Proposal: Transmit Cell Metadata #70

jflam · 2021-06-12T04:20:58Z

Transmit Cell Metadata

Hello!

We (@willingc and @MSeal) would like to propose a new JEP that describes transmitting cell metadata to kernels. This PR contains the JEP proposal that we originally discussed in the Pre-Proposal: Transmit Cell Metadata Issue.

Based on the discussion in that issue, I've updated the JEP to reflect the desire to send the cell metadata as part of the message header vs. the original proposal which argued for sending it in the content.

@minrk has graciously volunteered to shepherd this through the JEP process. Looking forward to continuing the discussion here!

Thx!
-John, Carol, Matt

Resolve #68

choldgraf · 2021-06-12T18:20:00Z

@jflam just FYI the failing test is because Jupyter Book updated its spec for the TOC, and we haven't pinned the versions in this repository. I'm updating the toc structure here: #71

minrk

Great! Based on #68, what I was thinking was to use the existing message.metadata field and add a metadata.cell_metadata (or metadata.cell) field there, rather than duplicating metadata into the header.

zz-cell-metadata/transmit-cell-metadata.md

minrk · 2021-06-14T06:07:13Z

zz-cell-metadata/transmit-cell-metadata.md

+to transmit the cell metadata for the executed cell. 
+
+In cases where Jupyter extensions generate their own metadata, that keys for
+the metadata should be namespaced using an extension-specific prefix. The


It would probably be a good idea to recommend a sub-dictionary as well for this. There's no requirement (or even recommendation) that metadata be a one-level dict. If an extension has more than one or two settings, it probably makes sense to nest a dict.

I like this idea; do you think that existing implementations might make the incorrect assumption that there aren't nested dicts, i.e., this change would break existing implementations?

At least in the python layer all of the library interactions within jupyter / nteract don't expect single layer dicts. I'd be surprised if the UIs did either since there's already multi-layer dicts present in nbformat fields.

There should be nothing that prevents transmission (and a space-limited journal; storage) of nested dicts i.e. with schema: JSONschema-validateable [nested] JSON and/or W3C SHACL-validateable JSON-LD with URIs for migrateable Linked Data.

I don't think there's any risk associated with it. Dicts are generally nested throughout Jupyter and the only spec for metadata is that it's a JSONable dict, which can have aribtrary depth, just like is already true for header, content, etc. which can also have nested fields.

zz-cell-metadata/transmit-cell-metadata.md

willingc · 2021-07-06T16:00:19Z

Hi folks,

This PEP has been open for 3 weeks. We'll be collecting feedback for another week. On Monday July 12, we will request a vote.

Thanks!

Carol

meeseeksmachine · 2021-07-06T17:39:07Z

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/jep-70-transmit-cell-metadata/9877/1

jasongrout

At first glance, I thought this proposal was adding a new field to the message protocol, but a closer reading indicates this understanding was wrong - it is using the existing metadata field all messages have. Is that correct?

From my understanding, thinking as a person that will need to implement this JEP and exactly what would need to be done, what this proposal is proposing is the following. All Jupyter clients that are executing code from the notebook format (i.e., there is a notion of "cell" and "cell metadata" in the client), when they are working with a "cell", must send all cell metadata from the notebook format to the kernel in any execute_request, inspect_request, or complete_request messages. This cell metadata is merged into the top-level metadata dictionary of the above message. Is this a correct understanding? Is there anything else that would need to be implemented if this JEP is passed?

In other words, since the metadata field already exists in these messages, a frontend can already transmit this information if it wishes, so presumably a frontend and kernel can coordinate on this metadata already. Is this JEP really about mandating that the metadata is transmitted from the frontend?

jasongrout · 2021-07-06T22:34:01Z

zz-cell-metadata/transmit-cell-metadata.md

+  "header" : {
+    "msg_id": "...",
+    "msg_type": "...",
+    "metadata": {


In the Reference-level explanation below, the metadata is not in the header. I think it should not be in the header, since the header will be transmitted back and forth many times (since parent_header is a copy of the original message's header). Can this example be changed to conform to the reference below, to move the metadata to the top level of the message rather than in the header?

zz-cell-metadata/transmit-cell-metadata.md

jasongrout · 2021-07-07T00:36:37Z

Another few questions trying to understand the scope of this proposal:

Is there anything that frontends without the notion of a "cell" or "cell metadata" do? For example, the ipython console or JupyterLab Code Console?
What about comm messages? Often ipywidgets sends comm messages that are sort of triggered from a cell - do they need to send cell metadata? What about debug messages?
Right now, as the JEP says:

The attributes within the metadata today are: A) small in size and B) not harmful to send across the wire so keeping the solution simpler was the preferred pattern in the proposal.

Of course, we don't know what custom extensions are putting in cell metadata today, and it seems limiting to make these assumptions about all cell metadata for the future as well. I would strongly suggest a specific place for cell metadata that is meant to be sent to the kernel, rather than always sending everything. For example, we could have a new top-level dictionary for cell metadata to be sent to the kernel. Or perhaps a white-list of fields to send can be in the notebook metadata, so extensions can add metadata and tell the frontend to send it?
One of the motivation examples includes providing hints about client capabilities. It seems awkward if each cell has repeated metadata about the frontend capabilities. For that usecase, is it better to think about this metadata as "any data the frontend wants to send the kernel" rather than just "cell metadata"?

jasongrout · 2021-07-07T00:58:07Z

4. One of the motivation examples includes providing hints about client capabilities. It seems awkward if each cell has repeated metadata about the frontend capabilities. For that usecase, is it better to think about this metadata as "any data the frontend wants to send the kernel" rather than just "cell metadata"?

I agree with Min's thoughts here, that we should separate cell metadata from other types of metadata that is transmitted.

jasongrout · 2021-07-07T01:00:13Z

2. What about comm messages? Often ipywidgets sends comm messages that are sort of triggered from a cell - do they need to send cell metadata? What about debug messages?

Min's thoughts here indicate we should think about this as a generic way of associating cell metadata with any message, so the answers to the above would generally be "yes", i.e., if the message is closely associated with a cell, the relevant metadata should be transmitted.

jasongrout · 2021-07-07T01:07:16Z

In other words, since the metadata field already exists in these messages, a frontend can already transmit this information if it wishes, so presumably a frontend and kernel can coordinate on this metadata already. Is this JEP really about mandating that the metadata is transmitted from the frontend?

I think it would be very clarifying to see exactly what changes are going to be made to the message spec document as part of this JEP, e.g., are there new prescribed fields in the message metadata dict, what their purpose is, etc.

minrk · 2021-07-08T07:05:26Z

@jasongrout thanks for the review!

Yes, I think we should specify that cell metadata SHOULD be transmitted as-is in message.metadata.cell for any message reasonably associated with a cell (rather than directly at top-level message.metadata so there's room for e.g. client metadata or others without collisions).

Co-authored-by: Jason Grout <jasongrout@users.noreply.github.com>

jflam · 2021-07-09T21:47:12Z

Thanks for the thoughtful comments, @jasongrout!

One of the motivation examples includes providing hints about client capabilities. It seems awkward if each cell has repeated metadata about the frontend capabilities. For that usecase, is it better to think about this metadata as "any data the frontend wants to send the kernel" rather than just "cell metadata"?

I agree with Min's thoughts here, that we should separate cell metadata from other types of metadata that is transmitted.

I think you've done a great job at identifying a key tension in this JEP: cell metadata vs. client metadata. Looking at Min's comments from above, what do folks think about having two different fields in the message metadata, i.e., message.metadata.cell and message.metadata.client? This would make the intent clearer at the cost of additional complexity. There are other optimizations possible such as transmitting client metadata only if an actual change was initiated by the client, for example.

Is there anything that frontends without the notion of a "cell" or "cell metadata" do? For example, the ipython console or JupyterLab Code Console?

One of the primary scenarios for transmitting cell metadata is identifying the programming language in an execute_request message. In this case, I wonder if there isn't a list of reserved/standard messages that need to be there so that clients like the ones you describe above will know what message to transmit to the kernel, rather than that being an implicit contract between a front end and the kernel?

Right now, as the JEP says:

The attributes within the metadata today are: A) small in size and B) not harmful to send across the wire so keeping the solution simpler was the preferred pattern in the proposal.

Of course, we don't know what custom extensions are putting in cell metadata today, and it seems limiting to make these assumptions about all cell metadata for the future as well. I would strongly suggest a specific place for cell metadata that is meant to be sent to the kernel, rather than always sending everything. For example, we could have a new top-level dictionary for cell metadata to be sent to the kernel. Or perhaps a white-list of fields to send can be in the notebook metadata, so extensions can add metadata and tell the frontend to send it?

Creating an allow-list in this manner, allowing the notebook to specify it is a different take than what we originally rejected with the kernel being the source of truth. It does have the downside of additional complexity in implementation. Thinking out loud: if there is an explosion in the size of cell metadata down the road, we could mandate that requirement in a future JEP as an optimization that clients would be eager to implement. This would give us a simpler implementation for this JEP and give us time to gather evidence for the more complex implementation. What do you think?

jasongrout · 2021-07-10T06:38:41Z

One of the primary scenarios for transmitting cell metadata is identifying the programming language in an execute_request message.

Here's a thought: instead of transmitting all cell metadata, or even a language name, how about we take a cue from our existing mechanism for identifying rich output formats, and instead transmit a mimetype field that identifies the format of code being transmitted? The mimetype of a typical python execute request could be text/x-python, for example. This naturally opens the door for other kinds of input to even single-language kernels, for example a special mimetype that represents a JSON datastructure for setting values in a language runtime, or other mimetypes that represent a chunk of binary data you are sending to the kernel (like an image or parquet file). Perhaps a sql kernel usually has text sql input, but one of the inputs could be a JSON datastructure containing info about what database to connect to (with a mimetype specific to that format) - that db info is sent in an execute_request with a mimetype of application/vnd.sqlkernel.dbinfo+json.

I realize it's basically the same thing as a language field - a string that identifies to the kernel how to handle the execution/completion request. I usually think of a kernel as having a single "language" (this is implicit in the kernel spec, where the kernel has a "language" identified), but it seems okay to say that a given kernel can receive and act on multiple types of inputs.

jflam · 2021-07-12T20:11:20Z

Yes- using a MIME type to identify a language would make perfect sense! In the prior art section of the JEP, I do mention a few other "meta-kernels" that exist as well that would benefit from this mechanism.

I do like the examples that you give, and it seems like some of those would be metadata that is not likely to be persisted in the .ipynb itself (e.g., sending chunk of binary data) and some that are likely to be persisted, (e.g., setting values in the language runtime - of course depends on what those values represent ...). I wonder if there's a more general concept "runtime metadata?" where some of the metadata comes from cell metadata and others are synthesized by the front end/extensions at runtime?

meeseeksmachine · 2021-09-23T08:44:27Z

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/propagating-user-environment-information-to-ipython-kernels-and-more-generally/10931/1

meeseeksmachine · 2021-09-23T12:32:18Z

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/notebook-cell-type-generaliastion/10703/1

jflam added 2 commits June 11, 2021 21:12

importing jep files from the GH issue

aa1865e

updating to transmit metadata in message header

6fa0c93

minrk reviewed Jun 14, 2021

View reviewed changes

jflam added 2 commits June 14, 2021 11:53

updating to transmit metadata in message header

0bd5cd2

more general wording to increase scope of applicability

6d111ef

damianavila reviewed Jun 15, 2021

View reviewed changes

zz-cell-metadata/transmit-cell-metadata.md Show resolved Hide resolved

jasongrout reviewed Jul 7, 2021

View reviewed changes

jasongrout mentioned this pull request Jul 7, 2021

Weekly Dev Meetings: Jan-Jul 2021 jupyterlab/frontends-team-compass#117

Closed

Update zz-cell-metadata/transmit-cell-metadata.md

0e91d84

Co-authored-by: Jason Grout <jasongrout@users.noreply.github.com>

psychemedia mentioned this pull request Sep 21, 2021

Parse question and answer from cell metadata jmshea/jupyterquiz#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Transmit Cell Metadata #70

Proposal: Transmit Cell Metadata #70

jflam commented Jun 12, 2021 •

edited by fcollonval

choldgraf commented Jun 12, 2021

minrk left a comment

minrk Jun 14, 2021

jflam Jun 14, 2021

MSeal Jun 16, 2021

westurner Jun 16, 2021

minrk Jun 17, 2021

willingc commented Jul 6, 2021

meeseeksmachine commented Jul 6, 2021

jasongrout left a comment •

edited

jasongrout Jul 6, 2021

jasongrout commented Jul 7, 2021

jasongrout commented Jul 7, 2021 •

edited

jasongrout commented Jul 7, 2021

jasongrout commented Jul 7, 2021

minrk commented Jul 8, 2021

jflam commented Jul 9, 2021

jasongrout commented Jul 10, 2021 •

edited

jflam commented Jul 12, 2021

meeseeksmachine commented Sep 23, 2021

meeseeksmachine commented Sep 23, 2021

Proposal: Transmit Cell Metadata #70

Are you sure you want to change the base?

Proposal: Transmit Cell Metadata #70

Conversation

jflam commented Jun 12, 2021 • edited by fcollonval

Transmit Cell Metadata

choldgraf commented Jun 12, 2021

minrk left a comment

Choose a reason for hiding this comment

minrk Jun 14, 2021

Choose a reason for hiding this comment

jflam Jun 14, 2021

Choose a reason for hiding this comment

MSeal Jun 16, 2021

Choose a reason for hiding this comment

westurner Jun 16, 2021

Choose a reason for hiding this comment

minrk Jun 17, 2021

Choose a reason for hiding this comment

willingc commented Jul 6, 2021

meeseeksmachine commented Jul 6, 2021

jasongrout left a comment • edited

Choose a reason for hiding this comment

jasongrout Jul 6, 2021

Choose a reason for hiding this comment

jasongrout commented Jul 7, 2021

jasongrout commented Jul 7, 2021 • edited

jasongrout commented Jul 7, 2021

jasongrout commented Jul 7, 2021

minrk commented Jul 8, 2021

jflam commented Jul 9, 2021

jasongrout commented Jul 10, 2021 • edited

jflam commented Jul 12, 2021

meeseeksmachine commented Sep 23, 2021

meeseeksmachine commented Sep 23, 2021

jflam commented Jun 12, 2021 •

edited by fcollonval

jasongrout left a comment •

edited

jasongrout commented Jul 7, 2021 •

edited

jasongrout commented Jul 10, 2021 •

edited