Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre-proposal: websocket token authentication with subprotocols #119

Open
minrk opened this issue Mar 26, 2024 · 7 comments
Open

Pre-proposal: websocket token authentication with subprotocols #119

minrk opened this issue Mar 26, 2024 · 7 comments

Comments

@minrk
Copy link
Member

minrk commented Mar 26, 2024

Background

Combining these facts:

  • Jupyter Servers (often) use API tokens for authentication
  • auth tokens are typically passed in the Authorization header
  • websockets cannot pass tokens in Authorization headers because the browser implementation forbids it, though most implementations outside the browser do support it.
    As a result, websocket requests must pass tokens in a URL parameter.
  • Passing tokens in a URL parameter is generally frowned upon, but not strictly insecure and indeed explicitly recommended by Browser websocket implementers

motivates having a new mechanism by which to pass the auth token for websocket requests that's not in the URL.

Proposal

Add support for passing authentication tokens in the websocket subprotocol for Jupyter Server, to avoid tokens in URL parameters.

This is a scheme devised by Kubernetes, where the subprotocols API allows specifying the Sec-Websocket-Protocol header, and we can put the token in there.

Example implementation

(draft implementation of this at jupyter-server/jupyter_server#1407)

To authenticate websockets via new mechanism, clients should:

ws = new WebSocket(wss://..., ['v1.token.websocket.jupyter.org', `v1.token.websocket.jupyter.org.${token}`, ...])

which sets the header:

Sec-WebSocket-Protocol: v1.token.websocket.jupyter.org, v1.token.websocket.jupyter.org.abc123

The response will have the header:

Sec-WebSocket-Protocol: v1.token.websocket.jupyter.org

The reason for the double subprotocol is that if any subprotocol is requested, the response must include one of the requested subprotocols for the connection to be accepted by all browsers. Not all browser require this, but Chrome does. The v1.token.websocket.jupyter.org serves no purpose if there is already a subprotocol defined and required, and should be optional.

Remaining questions

The main implementation question is how should frontends negotiate/discover this feature. We have already done it once in jupyterlab/jupyterlab#11841 adding the kernel subprotocol, and the solution was to make no assumptions and try once with subprotocols, and then reconnect without them if it fails. This becomes a little bit more complicated when there are two subprotocols that may not be implemented, but the same strategy is valid and requires no prior knowledge of the server's support for the new scheme.

Why a JEP

This is a JEP because it affects a relatively low-level feature in the Jupyer Server that clients (JupyterLab) should implement. It is fully backward-compatible, so perhaps it doesn't need to be, but it does involve implementation in both jupyter-server and jupyterlab. I have a proposed implementation already. I'd be happy for this to just be a new feature in Jupyter Server that future frontends can implement as needed. There is no plan to deprecate the existing behavior, so there is relatively little pressure to implement by frontends, as nothing can be broken by this.

There are also multiple possible ways to go about this, such as a handshake message, which has a lot of disadvantages, as I see it (backward-compatiblity being much harder, for one).

Affected subprojects:

  • jupyter server implementations (jupyter-server, jupyverse)
  • jupyter server clients (JupyterLab, maybe some server proxies like enterprise gateway?)

Recommended reviewers:

  1. from Jupyter Server (e.g. @Zsailer, @davidbrochart)
  2. from frontends (@jtpio)
  3. from security (@rpwagner)
@minrk minrk changed the title Pre-proposal: websocket authentication with subprotocols Pre-proposal: websocket token authentication with subprotocols Mar 26, 2024
@gabalafou
Copy link
Contributor

@minrk thanks for this detailed, cross-referenced write-up.

This all makes sense but I was wondering if you could elaborate a bit more on why we're rejecting alternative approaches.

The reason I ask is that right now if someone looks at this proposal as written, it kinda looks like the justification for sending the token via WebSocket subprotocols is: "Kubernetes is doing it." But what Kubernetes is doing also looks like a hack and misuse of the subprotocol header. So I think it would be nice if we could expand our justification for adopting this approach, perhaps by going through the alternatives and explaining in detail why they are worse than the proposed approach.

For example, I guess we can't use cookies because we must support scenarios where the browser app is served from a server on one hostname but the WebSocket connection needs to go to a server at a different hostname.

I think it would be good to get all of this down in writing somewhere.

@minrk
Copy link
Member Author

minrk commented Apr 2, 2024

Fair points! I was planning to do that if/when this becomes an actual JEP, but I can do it here, too. I think the main reason is that the transition to message-based handshaking is quite a bit more complex to do in a graceful way.

I guess we can't use cookies because we must support scenarios where the browser app is served from a server on one hostname but the WebSocket connection needs to go to a server at a different hostname.

Correct, we can't use just cookies (we can and do use cookies in many cases). We have a requirement that delivering a token should be accepted in cases like:

  • CORS requests from a browser (e.g. thebe)
  • non-browser scripts

for which the only channels we have available to us to pass credentials are:

  1. the URL
  2. messages after the server accepts the connection
  3. the Sec-Websocket-Protocol header

An Authorization header would be standard and ideal and already accepted by Jupyter Server, but browser implementers have made the dubious decision to exclude this option from the javascript API.

Right now, we are doing 1. in these cases, which is frankly okay as long endpoints are over https or localhost, but not ideal. The only downside is that query parameters may appear in logs, which is why it's identified as a CWE.

Option 2. can also work, and does not require stuffing credentials in the subprotocols header. The downsides are mainly that it would be a more complex transition for clients and more complex implementation for servers.

For servers:

  • The server would need to accept all connections without authentication and implement timeout logic to close connections that never authenticate (a potential Denial-of-Service issue, but not insurmountable)
  • The 'on message' behavior on the server becomes stateful because the first message must be authentication and then enter the current state
  • If we want to accept both the old and new handshake (and we should, at least for a long transition period), it is even more complex. One way could be to accept them at different URLs.

For clients:

  • Sending the new handshake to a server that doesn't recognize it would result in errors
  • There is no clear mechanism to 'try' the new handshake and fallback on the old
  • The client Websocket also needs to be more stateful in terms of changing onmessage/etc. handlers based on phases

The subprotocols strategy is easier for both clients and servers because:

  • the negotiation happens in the connection stage, just like now; there is no 'open but not yet authenticated' state
  • backward-compatibility on the server side requires no real work at all
  • backward-compatibility on the client side only requires trying with the subprotocol, then retrying failed connections without, which we already do for the kernels subprotocol (it may become slightly more complex to negotiate both optional subprotocols).

Both approaches are definitely doable.

Another question to resolve is whether/how servers signal that they support the new handshake. For the kernels subprotocol, they simply don't and clients have to try connecting with and reconnecting without, which has worked fine, to my knowledge. If we declare a mechanism to unambiguously signal support on the server side, clients can be simplified slightly in that the discovery of which handshake to use can happen explicitly first, but both implementations must still be present for compatibility.

@minrk
Copy link
Member Author

minrk commented Apr 2, 2024

A shorter summary of the comparison is that I thought for years that the transition to message handshakes was the only way and it made me too tired to think about implementing properly, so I never proposed it. Only when I saw the kubernetes approach did I think it was sufficiently simple to actually write down this proposal for something that's been a known issue since before this project was called Jupyter.

@gabalafou
Copy link
Contributor

gabalafou commented Apr 15, 2024

Thank you! That's exactly what I was asking for and very helpful. Your last message about feasibility really captures for me why your proposed way forward is the only one that makes sense given the constraints.

One thing that came up in conversation at the SSC meeting today is a concern about auth token size and any potential size limitation on the header. Do we have constraints on token size? For example, is it possible for a Jupyter deployment to have tokens that somehow change based on the user's username and if the user has a very long username, could that token get long enough to hit up against some size limitation on the headers?

Should we save that question and others for a JEP?

@minrk
Copy link
Member Author

minrk commented Apr 16, 2024

we don't have our own constraints, and there will likely be deployments where the token is a JWT, which can technically get arbitrarily long, but generally not anywhere close to the general header limit, which is configurable but tends to be around 8kB.

But I don't think this introduces any related issues, because we are already passing this same token in the Authorization header for all non-websocket API requests. So if token size is an issue, it would be for all API requests, not unique to websockets. Passing this via Sec-WebSocket-Protocol is only a few more bytes than passing it via Authorization.

@JohanMabille
Copy link
Member

Sorry for this late reply, thanks @minrk for this very detailed pre-proposal. Even if it is backward compatible and does not break anything, I think it would be worth having a JEP, at least for documentation purpose for maintainers of the affected projects. I'll let other people confirm it, but it was the general feeling during the last SSC meeting.

@minrk
Copy link
Member Author

minrk commented May 1, 2024

JEP is open: #121

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants