Fix ZeroGPU handling for gr.Server#13210

Open
abidlabs wants to merge 9 commits into main from server-zerogpu

Conversation

@abidlabs
Member

@abidlabs abidlabs commented Apr 7, 2026

Fixes: #13209

@gradio-pr-bot
Collaborator

gradio-pr-bot commented Apr 7, 2026

🪼 branch checks and previews

| Name | Status | URL |
| --- | --- | --- |
| Spaces | ready! | Spaces preview |
| Website | ready! | Website preview |
| Storybook | ready! | Storybook preview |
| 🦄 | Changes detected! | Details |

Install Gradio from this PR

pip install https://huggingface.co/buckets/gradio/pypi-previews/resolve/1f0477a9c866650ee45775c1dda0c33bedf91cff/gradio-6.11.0-py3-none-any.whl

Install Gradio Python Client from this PR

pip install "gradio-client @ git+https://github.com/gradio-app/gradio@1f0477a9c866650ee45775c1dda0c33bedf91cff#subdirectory=client/python"

Install Gradio JS Client from this PR

npm install https://gradio-npm-previews.s3.amazonaws.com/1f0477a9c866650ee45775c1dda0c33bedf91cff/gradio-client-2.1.0.tgz

@gradio-pr-bot
Collaborator

gradio-pr-bot commented Apr 7, 2026

🦄 change detected

This Pull Request includes changes to the following packages.

| Package | Version |
| --- | --- |
| @gradio/client | patch |
| @self/app | patch |
| @self/spa | patch |
| gradio | patch |

  • Fix ZeroGPU handling for gr.Server

‼️ Changeset not approved. Ensure the version bump is appropriate for all packages before approving.

  • Maintainers can approve the changeset by checking this checkbox.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

@pngwn
Member

pngwn commented Apr 7, 2026

This is a bit complicated. Why don't we just move the handshake logic to the JS client? It's kinda weird that it is split up anyway.

@abidlabs
Member Author

abidlabs commented Apr 7, 2026

Yes I was just looking into that possibility @pngwn!

@pngwn
Member

pngwn commented Apr 7, 2026

@abidlabs
Member Author

abidlabs commented Apr 7, 2026

Does this approach look reasonable @pngwn? Going to test it to confirm

@pngwn
Member

pngwn commented Apr 7, 2026

Yep looks good! The module level variable is fine because we only need to go through this once, even if there are multiple client instances.
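
The once-only behaviour described in this comment can be sketched roughly as follows. This is an illustration only, not the code in this PR: `performHandshake`, `ensureHandshake`, and `FakeClient` are hypothetical names, and the real handshake talks to the parent page rather than returning a fixed token. The point is that a module-level promise lets any number of client instances share a single handshake run.

```typescript
// Hypothetical sketch: these names are illustrative, not the @gradio/client API.

// Counts how many times the simulated handshake actually runs.
let handshakeRuns = 0;

// Module-level cache: one promise shared by every client instance.
let handshakePromise: Promise<Record<string, string>> | null = null;

// Stand-in for the real exchange with the parent page (assumption).
async function performHandshake(): Promise<Record<string, string>> {
  handshakeRuns += 1;
  return { "x-ip-token": "demo-token" };
}

// Starts the handshake on the first call; later calls reuse the same promise.
function ensureHandshake(): Promise<Record<string, string>> {
  if (handshakePromise === null) {
    handshakePromise = performHandshake();
  }
  return handshakePromise;
}

class FakeClient {
  // Every instance asks for headers, but the handshake itself runs once.
  headers(): Promise<Record<string, string>> {
    return ensureHandshake();
  }
}
```

Creating two `FakeClient` instances and calling `headers()` on each returns the same promise, so the handshake counter stays at one.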

@freddyaboulton
Collaborator

Thanks @pngwn + @abidlabs 🙏

@abidlabs
Member Author

abidlabs commented Apr 7, 2026

Was trying to test this, but the npm package is published as a tarball:

npm install https://gradio-npm-previews.s3.amazonaws.com/0fcac1fdde618203b0d9c3cb7828163d52aab1ec/gradio-client-2.1.0.tgz

@pngwn is there a simple way I can test this in a Space? I want to do something like this:

from gradio import Server
from fastapi.responses import HTMLResponse

app = Server()

@app.mcp.tool(name="add")
@app.api(name="add")
def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b

@app.mcp.tool(name="multiply")
@app.api(name="multiply")
def multiply(a: int, b: int) -> int:
    """Multiply two numbers together."""
    return a * b

@app.get("/", response_class=HTMLResponse)
async def homepage():
    return """
<!DOCTYPE html>
<html>
<head><title>Calculator</title>
<style>
  * { margin: 0; box-sizing: border-box; font-family: 'Courier New', monospace; }
  body { min-height: 100vh; display: flex; align-items: center; justify-content: center; background: #1a1a2e; color: #fff;}
  .calc { background: #16213e; padding: 2rem; border-radius: 1rem; box-shadow: 0 8px 32px rgba(0,0,0,.4); width: 320px; }
  #out { background: #0f3460; color: #0f0; font-size: 2rem; text-align: right; padding: .75rem 1rem; border-radius: .5rem; min-height: 3rem; margin-bottom: 1rem; }
  .row { display: flex; gap: .5rem; margin-bottom: .5rem; }
  input { flex: 1; min-width: 0; padding: .6rem; font-size: 1.2rem; border: none; border-radius: .5rem; background: #e2e2e2; text-align: center; }
  button { flex: 1; padding: .6rem; font-size: 1rem; border: none; border-radius: .5rem; cursor: pointer; font-weight: bold; color: #fff; }
  .add { background: #e94560; } .mul { background: #533483; }
  button:hover { opacity: .85; }
</style></head>
<body>
  <div class="calc">
    <div id="out">0</div>
    Operands
    <div class="row"><input id="a" type="number" value="3"><input id="b" type="number" value="5"></div>
    Operation
    <div class="row"><button class="add" onclick="run('add')">+</button><button class="mul" onclick="run('multiply')">&times;</button></div>
  </div>
  <script type="module">
    import { client } from "https://cdn.jsdelivr.net/npm/@gradio/client/dist/index.min.js";
    const app = await client(location.origin);
    window.run = async (ep) => {
      const a = parseInt(document.getElementById("a").value), b = parseInt(document.getElementById("b").value);
      document.getElementById("out").textContent = (await app.predict("/" + ep, { a, b })).data;
    };
  </script>
</body>
</html>"""

if __name__ == "__main__":
    app.launch(mcp_server=True)

but using the Client from this .tgz URL

@abidlabs abidlabs marked this pull request as ready for review April 8, 2026 18:36
@abidlabs
Member Author

abidlabs commented Apr 8, 2026

Ok so I ended up just rebuilding it on the Space. There might be a better way, but this seems to work, as it uses up my ZeroGPU quota correctly:

image

https://huggingface.co/spaces/abidlabs/zgpt2

cc @gary149 @yvrjsharma for visibility

} from "./constants";
declare const BROWSER_BUILD: boolean;

initialize_zerogpu_handshake();
Collaborator

Why is this happening here?

Member Author

Where else could it go? I put it in module init because it needs to finish before the first user request. Otherwise that first request can race the iframe auth handshake and not include the forwarded headers.
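
The race mentioned in this comment can be illustrated with a small simulation. This is a sketch under stated assumptions, not the client's real code: `startHandshake`, `buildRequestHeaders`, and `forwardedHeaders` are hypothetical names, and the parent page's reply is simulated by calling `deliver` by hand. Only the `x-ip-token` header name comes from the linked issue.

```typescript
// Hypothetical sketch of the race; none of these names come from @gradio/client.

// Headers forwarded by the parent page; empty until the handshake completes.
let forwardedHeaders: Record<string, string> = {};

// Simulated handshake: the token only arrives when `deliver` is called,
// which stands in for the parent page's reply.
function startHandshake(): {
  done: Promise<void>;
  deliver: (token: string) => void;
} {
  let deliver: (token: string) => void = () => {};
  const done = new Promise<void>((resolve) => {
    // The executor runs synchronously, so `deliver` is set before we return.
    deliver = (token: string) => {
      forwardedHeaders = { "x-ip-token": token };
      resolve();
    };
  });
  return { done, deliver };
}

// Builds the headers a request would carry at this moment.
function buildRequestHeaders(): Record<string, string> {
  return { "content-type": "application/json", ...forwardedHeaders };
}

const handshake = startHandshake();

// A request built before the reply arrives carries no token.
const early = buildRequestHeaders();

// Simulate the parent page answering.
handshake.deliver("demo-token");

// A request built after the reply includes the forwarded header.
const late = buildRequestHeaders();
```

Here `early` lacks the `x-ip-token` header while `late` has it, which is why starting the handshake as early as possible (or awaiting `handshake.done` before the first request) matters.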

Collaborator

I was thinking it'd go in submit.ts or the client init. But let me know if I'm misunderstanding!

Member Author

I don't think we should put it in submit.ts because either we'd have to block until we have the headers or it might not get the headers quickly enough. We could do it in Client init I think (happy to move it there), but this is the closest to what we had previously (when we had it happen as part of the page initialization itself)

Collaborator

Ok let's keep as is then! But it's also currently being called in submit.ts. Is it intentional?

Member Author

Yeah that seems unnecessary, will remove

Collaborator

@freddyaboulton freddyaboulton left a comment

Thanks for the fix @abidlabs

Development

Successfully merging this pull request may close these issues.

gr.Server custom frontends don't get proper ZeroGPU quota — missing x-ip-token handshake

4 participants