Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Transmit Cell Metadata #70

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

jflam
Copy link

@jflam jflam commented Jun 12, 2021

Transmit Cell Metadata

Hello!

We (@willingc and @MSeal) would like to propose a new JEP that describes transmitting cell metadata to kernels. This PR contains the JEP proposal that we originally discussed in the Pre-Proposal: Transmit Cell Metadata Issue.

Based on the discussion in that issue, I've updated the JEP to reflect the desire to send the cell metadata as part of the message header vs. the original proposal which argued for sending it in the content.

@minrk has graciously volunteered to shepherd this through the JEP process. Looking forward to continuing the discussion here!

Thx!
-John, Carol, Matt

Resolve #68

@choldgraf
Copy link
Contributor

@jflam just FYI the failing test is because Jupyter Book updated its spec for the TOC, and we haven't pinned the versions in this repository. I'm updating the toc structure here: #71

Copy link
Member

@minrk minrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Based on #68, what I was thinking was to use the existing message.metadata field and add a metadata.cell_metadata (or metadata.cell) field there, rather than duplicating metadata into the header.

zz-cell-metadata/transmit-cell-metadata.md Show resolved Hide resolved
zz-cell-metadata/transmit-cell-metadata.md Outdated Show resolved Hide resolved
to transmit the cell metadata for the executed cell.

In cases where Jupyter extensions generate their own metadata, that keys for
the metadata should be namespaced using an extension-specific prefix. The
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be a good idea to recommend a sub-dictionary as well for this. There's no requirement (or even recommendation) that metadata be a one-level dict. If an extension has more than one or two settings, it probably makes sense to nest a dict.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea; do you think that existing implementations might make the incorrect assumption that there aren't nested dicts, i.e., this change would break existing implementations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in the python layer all of the library interactions within jupyter / nteract don't expect single layer dicts. I'd be surprised if the UIs did either since there's already multi-layer dicts present in nbformat fields.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be nothing that prevents transmission (and a space-limited journal; storage) of nested dicts i.e. with schema: JSONschema-validateable [nested] JSON and/or W3C SHACL-validateable JSON-LD with URIs for migrateable Linked Data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any risk associated with it. Dicts are generally nested throughout Jupyter and the only spec for metadata is that it's a JSONable dict, which can have aribtrary depth, just like is already true for header, content, etc. which can also have nested fields.

zz-cell-metadata/transmit-cell-metadata.md Outdated Show resolved Hide resolved
@willingc
Copy link
Member

willingc commented Jul 6, 2021

Hi folks,

This PEP has been open for 3 weeks. We'll be collecting feedback for another week. On Monday July 12, we will request a vote.

Thanks!

Carol

@meeseeksmachine
Copy link

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/jep-70-transmit-cell-metadata/9877/1

Copy link
Member

@jasongrout jasongrout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance, I thought this proposal was adding a new field to the message protocol, but a closer reading indicates this understanding was wrong - it is using the existing metadata field all messages have. Is that correct?

From my understanding, thinking as a person that will need to implement this JEP and exactly what would need to be done, what this proposal is proposing is the following. All Jupyter clients that are executing code from the notebook format (i.e., there is a notion of "cell" and "cell metadata" in the client), when they are working with a "cell", must send all cell metadata from the notebook format to the kernel in any execute_request, inspect_request, or complete_request messages. This cell metadata is merged into the top-level metadata dictionary of the above message. Is this a correct understanding? Is there anything else that would need to be implemented if this JEP is passed?

In other words, since the metadata field already exists in these messages, a frontend can already transmit this information if it wishes, so presumably a frontend and kernel can coordinate on this metadata already. Is this JEP really about mandating that the metadata is transmitted from the frontend?

"header" : {
"msg_id": "...",
"msg_type": "...",
"metadata": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Reference-level explanation below, the metadata is not in the header. I think it should not be in the header, since the header will be transmitted back and forth many times (since parent_header is a copy of the original message's header). Can this example be changed to conform to the reference below, to move the metadata to the top level of the message rather than in the header?

zz-cell-metadata/transmit-cell-metadata.md Outdated Show resolved Hide resolved
@jasongrout
Copy link
Member

Another few questions trying to understand the scope of this proposal:

  1. Is there anything that frontends without the notion of a "cell" or "cell metadata" do? For example, the ipython console or JupyterLab Code Console?

  2. What about comm messages? Often ipywidgets sends comm messages that are sort of triggered from a cell - do they need to send cell metadata? What about debug messages?

  3. Right now, as the JEP says:

    The attributes within the metadata today are: A) small in size and B) not harmful to send across the wire so keeping the solution simpler was the preferred pattern in the proposal.

    Of course, we don't know what custom extensions are putting in cell metadata today, and it seems limiting to make these assumptions about all cell metadata for the future as well. I would strongly suggest a specific place for cell metadata that is meant to be sent to the kernel, rather than always sending everything. For example, we could have a new top-level dictionary for cell metadata to be sent to the kernel. Or perhaps a white-list of fields to send can be in the notebook metadata, so extensions can add metadata and tell the frontend to send it?

  4. One of the motivation examples includes providing hints about client capabilities. It seems awkward if each cell has repeated metadata about the frontend capabilities. For that usecase, is it better to think about this metadata as "any data the frontend wants to send the kernel" rather than just "cell metadata"?

@jasongrout
Copy link
Member

jasongrout commented Jul 7, 2021

4. One of the motivation examples includes providing hints about client capabilities. It seems awkward if each cell has repeated metadata about the frontend capabilities. For that usecase, is it better to think about this metadata as "any data the frontend wants to send the kernel" rather than just "cell metadata"?

I agree with Min's thoughts here, that we should separate cell metadata from other types of metadata that is transmitted.

@jasongrout
Copy link
Member

2. What about comm messages? Often ipywidgets sends comm messages that are sort of triggered from a cell - do they need to send cell metadata? What about debug messages?

Min's thoughts here indicate we should think about this as a generic way of associating cell metadata with any message, so the answers to the above would generally be "yes", i.e., if the message is closely associated with a cell, the relevant metadata should be transmitted.

@jasongrout
Copy link
Member

In other words, since the metadata field already exists in these messages, a frontend can already transmit this information if it wishes, so presumably a frontend and kernel can coordinate on this metadata already. Is this JEP really about mandating that the metadata is transmitted from the frontend?

I think it would be very clarifying to see exactly what changes are going to be made to the message spec document as part of this JEP, e.g., are there new prescribed fields in the message metadata dict, what their purpose is, etc.

@minrk
Copy link
Member

minrk commented Jul 8, 2021

@jasongrout thanks for the review!

Yes, I think we should specify that cell metadata SHOULD be transmitted as-is in message.metadata.cell for any message reasonably associated with a cell (rather than directly at top-level message.metadata so there's room for e.g. client metadata or others without collisions).

Co-authored-by: Jason Grout <jasongrout@users.noreply.github.com>
@jflam
Copy link
Author

jflam commented Jul 9, 2021

Thanks for the thoughtful comments, @jasongrout!

One of the motivation examples includes providing hints about client capabilities. It seems awkward if each cell has repeated metadata about the frontend capabilities. For that usecase, is it better to think about this metadata as "any data the frontend wants to send the kernel" rather than just "cell metadata"?

I agree with Min's thoughts here, that we should separate cell metadata from other types of metadata that is transmitted.

I think you've done a great job at identifying a key tension in this JEP: cell metadata vs. client metadata. Looking at Min's comments from above, what do folks think about having two different fields in the message metadata, i.e., message.metadata.cell and message.metadata.client? This would make the intent clearer at the cost of additional complexity. There are other optimizations possible such as transmitting client metadata only if an actual change was initiated by the client, for example.

Is there anything that frontends without the notion of a "cell" or "cell metadata" do? For example, the ipython console or JupyterLab Code Console?

One of the primary scenarios for transmitting cell metadata is identifying the programming language in an execute_request message. In this case, I wonder if there isn't a list of reserved/standard messages that need to be there so that clients like the ones you describe above will know what message to transmit to the kernel, rather than that being an implicit contract between a front end and the kernel?

Right now, as the JEP says:

The attributes within the metadata today are: A) small in size and B) not harmful to send across the wire so keeping the solution simpler was the preferred pattern in the proposal.

Of course, we don't know what custom extensions are putting in cell metadata today, and it seems limiting to make these assumptions about all cell metadata for the future as well. I would strongly suggest a specific place for cell metadata that is meant to be sent to the kernel, rather than always sending everything. For example, we could have a new top-level dictionary for cell metadata to be sent to the kernel. Or perhaps a white-list of fields to send can be in the notebook metadata, so extensions can add metadata and tell the frontend to send it?

Creating an allow-list in this manner, allowing the notebook to specify it is a different take than what we originally rejected with the kernel being the source of truth. It does have the downside of additional complexity in implementation. Thinking out loud: if there is an explosion in the size of cell metadata down the road, we could mandate that requirement in a future JEP as an optimization that clients would be eager to implement. This would give us a simpler implementation for this JEP and give us time to gather evidence for the more complex implementation. What do you think?

@jasongrout
Copy link
Member

jasongrout commented Jul 10, 2021

One of the primary scenarios for transmitting cell metadata is identifying the programming language in an execute_request message.

Here's a thought: instead of transmitting all cell metadata, or even a language name, how about we take a cue from our existing mechanism for identifying rich output formats, and instead transmit a mimetype field that identifies the format of code being transmitted? The mimetype of a typical python execute request could be text/x-python, for example. This naturally opens the door for other kinds of input to even single-language kernels, for example a special mimetype that represents a JSON datastructure for setting values in a language runtime, or other mimetypes that represent a chunk of binary data you are sending to the kernel (like an image or parquet file). Perhaps a sql kernel usually has text sql input, but one of the inputs could be a JSON datastructure containing info about what database to connect to (with a mimetype specific to that format) - that db info is sent in an execute_request with a mimetype of application/vnd.sqlkernel.dbinfo+json.

I realize it's basically the same thing as a language field - a string that identifies to the kernel how to handle the execution/completion request. I usually think of a kernel as having a single "language" (this is implicit in the kernel spec, where the kernel has a "language" identified), but it seems okay to say that a given kernel can receive and act on multiple types of inputs.

@jflam
Copy link
Author

jflam commented Jul 12, 2021

Yes- using a MIME type to identify a language would make perfect sense! In the prior art section of the JEP, I do mention a few other "meta-kernels" that exist as well that would benefit from this mechanism.

I do like the examples that you give, and it seems like some of those would be metadata that is not likely to be persisted in the .ipynb itself (e.g., sending chunk of binary data) and some that are likely to be persisted, (e.g., setting values in the language runtime - of course depends on what those values represent ...). I wonder if there's a more general concept "runtime metadata?" where some of the metadata comes from cell metadata and others are synthesized by the front end/extensions at runtime?

@meeseeksmachine
Copy link

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/propagating-user-environment-information-to-ipython-kernels-and-more-generally/10931/1

@meeseeksmachine
Copy link

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/notebook-cell-type-generaliastion/10703/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pre-Proposal: Transmit Cell Metadata
9 participants