Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre-proposal: Kernel messages to get and set variables #115

Open
fcollonval opened this issue Jun 26, 2023 · 12 comments
Open

Pre-proposal: Kernel messages to get and set variables #115

fcollonval opened this issue Jun 26, 2023 · 12 comments

Comments

@fcollonval
Copy link
Contributor

fcollonval commented Jun 26, 2023

Summary

Add two kernel messages to get and set variables fully or partially (e.g. slice of a data frame).

Motivation

We should aim at user experience close to MatLab/Spyder rather than an IDE regarding user / kernel variables interaction; aka high flexibility to introspect and modify variable values on the fly.

Looking for example at the Spyder variable explorer, we can see it is easy to edit simple variables, add new ones (from scratch or by duplication).

Requirements

  • This should be an optional feature.
  • The message contents and replies must be independent of the kernel language.
  • The get message reply should allow to display a variable meaningfully for the user.
  • The get and set messages should support partial introspection and modification of a variable.
  • The set message can be used to create and to update a variable.

[Draft] Implementation details

  • This feature will be flagged as optional building on top of provisional JEP 92.

  • New kernel message get_variables:

    • Request content:
{
  "type": "object",
  "properties": {
    "variables":  {
        "type": "array",
        "uniqueItems": true,
        "items": {
          "type":"object",
          "properties": {
            // Variable unique identifier
            "name": {"type": "string", "minLength": 1}
            // TODO how to specify slicing/pagination
            // ...
            // Variable type
            "mimetype": {"type": "string", "minLength": 1 }
          },
          "required": ["name", "mimetype"]
        }
      }
    },
    "page": {
      "type": "integer",
      "mininum": 1
    },
    "pre_page": {
      "type": "integer",
      "mininum": 1
    }
}

Notes:

  • if no variable are specified, the kernel must return all available variables.

  • pagination is optional

    • Reply content:
{
  "type": "object",
  "properties": {
    // Request status - 'ok' does not mean all set succeeded only that the request was processed
    // if status is 'error', the field `ename`, `evalue` and `traceback` will be present but no `variables`
    // if status is 'abort', nothing else will be provided.
    "status": {
      "const": "ok"
    },
    "variables": {
      "type": "array",
      "items": {
        "oneOf": [
          {
            "type": "object",
            "properties": {
              // Variable unique identifier
              "name": {"type": "string", "minLength": 1}
              // TODO how to specify slicing/pagination
              // ...
              // Status of set per variable
              "status": {
                "const": "ok"
              },
              // Variable type
              "mimetype": {"type": "string", "minLength": 1 }
              // TODO Variable value
              // ... -> JSON, binary, something else (?)
              "value": {},
              "schema": {
                "description": "Text/plain schema that could be used to validate the value."
                "type": "string" 
              }
            },
            "required": ["name", "status", "mimetype", "value"]
          },
          // if status is 'error', the field `ename`, `evalue` and `traceback` will be present
          // if status is 'abort', only the `name` will be provided.
        ]
      }
    },
    "page": {
      "type": "integer",
      "mininum": 1
    },
    "last_page": {
      "type": "integer",
      "mininum": 1
    }
  },
  "required": ["status"]
}
  • There is no execution order
  • New kernel message set_variables:

    • Request content:
{
  "type": "array",
  "uniqueItems": true
  "items": {
    "type":"object",
    "properties": {
      // Variable unique identifier
      "name": {"type": "string", "minLength": 1}
      // TODO how to specify slicing/pagination
      // ...
      // Variable type
      "mimetype": {"type": "string", "minLength": 1 }
      // Variable value
      "value": // TODO ...
      // Value text/plain schema
      "schema": { "type": "string" }
    }
  }
}
  • Reply content:
{
  "type": "object",
  "properties": {
    // Request status - 'ok' does not mean all set succeeded only that the request was processed
    // if status is 'error', the field `ename`, `evalue` and `traceback` will be present but no `variables`
    // if status is 'abort', nothing else will be provided.
    "status": {
      "const": "ok"
    },
    "variables": {
      "type": "array",
      "items": {
        "oneOf": [
          {
            "type": "object",
            "properties": {
              // Variable unique identifier
              "name": {"type": "string", "minLength": 1}
              // TODO how to specify slicing/pagination
              // ...
              // Status of set per variable
              "status": {
                "const": "ok"
              }
            }
          },
          // if status is 'error', the field `ename`, `evalue` and `traceback` will be present
          // if status is 'abort', only the `name` will be provided.
        ]
      }
    }
  }
}
  • There is no execution order
  • Those messages should be executed as quietly as possible (as a silent execute request) and should not populate the history.

Rationale and alternatives

  • We could expand on the debugger adapter protocol. But the debugger feature has the following disadvantages:

    • Implementing DAP for a new kernel is though
    • DAP offers variable introspection but is not meant to create or edit variables
    • Running code with active debugging capabilities reduce code execution performances, what we want here is to be able to get/set variables mainly between cell execution request - no need to execute the code in debug mode.
    • A counterpoint for the previous case is that we could reuse some DAP messages even if the debugger is not active. The trouble is that it will create confusion. And what should a kernel author do if the get/set feature is wanted but not the DAP.
  • We could use silent execute request:

    • This is the path used to get variables by the Classical notebook and JupyterLab extensions.
    • The problem is that the syntax is not languages agnostic

Prior art

Variable explorers are an important features in science exploratory softwares; here are some examples

Extensions for existing Jupyter UI:

Unresolved questions

  • How should this feature interact with the debugger?

    • Should it be turned off or available? - I would vote for the second possibility for consistency
  • How to express variable type and its serialization over the wire?

  • How to specify array and data frame slicing?

  • Should we support slicing for more data types?

Future possibilities

A nice side effect of these new messages will be the ability to introduce no-code/low-code cells like Colab forms, SQL cells (the result of the cell would be a set message to the kernel),... . It could also help in developing polyglot kernels.


Edited:

  • add optional value text/plain schema
  • rename type in mimetype
@fcollonval
Copy link
Contributor Author

fcollonval commented Jun 26, 2023

Friendly ping to some schemas experts 😉 @bollwyvl @tonyfast @agoose77 @JohanMabille

cc @afshin (as this originates from our discussions).

@bollwyvl
Copy link

I'll confess I haven't fully dug into the details of the proposed schema, but do think there is a fair amount of benefit in considering reuse of an existing wire format messages, such as those provided by LSP.

While LSP and DAP are document-based instead of namespace-based, there's already a great deal of work (and value) represented in them, and duplicating these features (except "Jupyter-flavored") seems to be heading back to the bikeshed, as there's already setVariable and friends. Perhaps the heavy "i am debugging" modal switch can be relaxed in some way so that these messages would work in a more general, albeit reduced, way.

Tying these together with LSP: this would likely include symbol-related structures, for which we already often need to declare a new temprary document (e.g. jupyter://__main__) to anchor dynamic symbols... in a different way than DAP.

As to how these actually get called: as an alternative to novel named messages, one approach might be for an active kernel to advertise a jupyter comm target for, e.g. application/lsp+json and/or application/dap+json, at some spec version, and then use the existing negotiation techniques (e.g. serverCapabilities and clientCapabilities) to coordinate any subset of the 140+ messages LSP defines, and whatever DAP does. Comm doesn't seem entirely out-of-line, as "a variable inspector" sounds like something that can be prototyped quickly with client-agnostic jupyter widgets on existing comm.

This is different than the approach demonstrated in the horribly-bitrotten LSP proxy kernel PR, which uses comm, but mostly benefited from moving multiple language servers' connections to a single, well-understood websocket (which is itself an interactive kernel, which is fun), but shows the general pattern.

@JohanMabille
Copy link
Member

+1 on the ability to inspect and modify the variables, this is definitely a mandatory feature. Here are some thoughts on this:

How should this feature interact with the debugger?

I think the variable explorer and the variable inspector form the debugger (when the debugger is supported by the kernel) should be the same visual entity to avoid confusion. However, that means that the variable explorer / variable inspector would show the local variables only when the debugger is active and the execution has stopped on a breakpoint.

DAP offers variable introspection but is not meant to create or edit variables

Actually the DAP offers the setVariable request, which allows to edit or create variables, but only in a suspendend state (i.e. breakpoint or pause). However we could extend this mechanism to suppor the feature in all the cases, just like we did with the inspectVariable / richInspectVariable, which extend the variables request

I agree with @bollwyvl that we should leverage on already existing messages. The DAP seems more adapted than the LSP to me (although I'm biased since I'm more educated to the DAP than the LSP), but it has some constraint: DAP messages must be sent over the control channel in order to work when the code is suspended (this is also true for "extended DAP" messages such as inpectVariables).

Also the kernel already advertises the features it supports through the kernel_info_request/kernel_info_reply messages, I don't think we need to duplicate this mechanism with an additional jupyter comm target.

A mean to avoid specification (and code) duplication could be to:

  • make inspectVariables / editVariable (or get/set variables) an optional feature independent from the DAP (and flag the current specification of inspectVariable and richInspectVariable as deprecated)
  • have additional attributes in these messages when the debugger feature is supported so that the messages can be forwarded as DAP messages to the debugger when the interpreter execution is suspended.

@fcollonval
Copy link
Contributor Author

Thanks both for your comments.

One point I still don't get how to achieve is the ability to handle partial get/set. I see that the DAP request as something to do pagination. But it looks like, it is pagination for the children of a container (like attributes of a class). But for Jupyter we need to go a step further and support array/dataframe pagination. And I don't see how the current DAP request address this exactly.

According to the DAP, a Variable can have the following attributes to support pagination:

  /**
   * The number of named child variables.
   * The client can use this information to present the children in a paged UI
   * and fetch them in chunks.
   */
  namedVariables?: number;

  /**
   * The number of indexed child variables.
   * The client can use this information to present the children in a paged UI
   * and fetch them in chunks.
   */
  indexedVariables?: number;

Does that means we should paginate nd-array and dataframe as flatten array?

And for setVariables it is not possible to only do a partial set; the specification is:

interface SetVariableArguments {
  /**
   * The reference of the variable container. The `variablesReference` must have
   * been obtained in the current suspended state. See 'Lifetime of Object
   * References' in the Overview section for details.
   */
  variablesReference: number;

  /**
   * The name of the variable in the container.
   */
  name: string;

  /**
   * The value of the variable.
   */
  value: string;

  /**
   * Specifies details on how to format the response value.
   */
  format?: ValueFormat;
}

What would be a good solution? extend that request?

@JohanMabille
Copy link
Member

Actually, each element of an array is a variable, so it is possible to directly use pagination for arrays; for partial assignment of arrays, the new setVariables / editVariables request could support additional attributes; when receiving such a request, the kernel could simply dispatch setVariable requests internally (but that would be an implementation detail). Or the editVariables request can simply be a list of simple setVariable requests (to avoid sending many messages but keep the request as simple as possible).

Regarding dataframes / nd arrays, the variable request would indeed expose their internal structures. I don't have a good answer for this, maybe an additional request for getting a way to paginate a variable depending on its type, but that would be really more involved. Or we could do something similar to what VIsual Studio does, let the possibility to write custom visualizers as Jupyter extensions (and Jupyter Lab would provide default fallbacks only for nd arrays / dataframes).

@davidbrochart
Copy link
Member

I understand the value in building on top of already-existing protocols, but if kernels have to implement the DAP or LSP protocols in order to support this feature, I'm afraid we are going to leave a lot of kernels behind.
I would rather prefer to use the Python data model, which I believe is general and simple enough to be applied to all kernels:

v[3]  # single element
v[1:5]  # range of elements
v[7:1:-2]  # step
v[1:7, 4:8]  # two dimensions

We could even support attribute access, either through the dot (.) or through the attribute syntax ([]), like Jinja does.

v.path.to.attribute[3]
v["path"]["to"]["attribute"][3]

@JohanMabille
Copy link
Member

I understand the value in building on top of already-existing protocols, but if kernels have to implement the DAP or LSP protocols in order to support this feature, I'm afraid we are going to leave a lot of kernels behind.

Kernels would not have to implement the full DAP to support the get/set variable feature:

  • if a kernel wants to support the optional egt/set variables, it only has a few messages to implement, not directly related to the DAP (since they are current extensions to the DAP); the only part from the DAP we would reuse is the Variable type.
  • if a kernel wants to support the debugger feature, it has to support additional fields in the messages, that can be forwarded as DAP messages.

I would rather prefer to use the Python data model, which I believe is general and simple enough to be applied to all kernels:

This works for simple variable types such as integers or string, where the value is easy to write in a message. What if you want to assign an array of complex objects? The current specification in the DAP with nested variable references is very flexible, although a bit more complicated to implement than the syntax you describe.

Also reusing the Variable type means that we would have the same mechanism for the following use cases:

  • you want to get / set a variable in a kernel that does not support the debugger feature
  • you want to get / set a variable in a kernel that does support the debugger feature, when the execution has not stopped on a breakpoint
  • you want to get / set a variable in a kernel that does support the debugger feature, when the execution has stopped on a breakpoint.

Otherwise, we have two mechanisms to implement (since the DAP already enforces its own).

@davidbrochart
Copy link
Member

This works for simple variable types such as integers or string, where the value is easy to write in a message. What if you want to assign an array of complex objects?

Correct me if I'm wrong, but we will always get/set variables as representations, right? Typically, we will get Python's __repr__ of an object, and set a variable from the UI, thus also a string representation. This can work for simple types such as integers and strings, but also lists, dictionaries... Am I missing something or do you envision more complex objects?

@JohanMabille
Copy link
Member

JohanMabille commented Jun 29, 2023

Correct me if I'm wrong, but we will always get/set variables as representations, right?

Not only, for a structured variable, you also get a variable reference that you can use in a new get request to get the nested variables; also the representation of a structured variable is usually quite short.

For the set request, you can evaluate a complex value expression and assign it to a lvalue expression; for instance, a SetExpressionArguments used in a SetExpressionRequest could be:

{
  "expression": "v[i]",
  "value": "new ComplicatedObject(a, 3)"
}

Therefore the values of the fields are language dependent, while the specification of the fields is language agnostic.

@fcollonval
Copy link
Contributor Author

For the set request, you can evaluate a complex value expression and assign it to a lvalue expression; for instance, a
SetExpressionArguments used in a SetExpressionRequest could be:

This is exactly what I want to avoid with this proposal. get and set should be language agnostic so that client don't need to know the language.

If the value is to be expressed in the kernel language, I don't see a value for a new JEP as basically we can do this with silence request to the kernel as done in extension such as variable inspector.

We could do something similar to the kernel reply using some mimetypes and a data buffer. If the specification restrain the mimetype to something like a application/json and some binary format for dataframe/array like arrow, we will bring very useful power to the protocol.

@fcollonval
Copy link
Contributor Author

fcollonval commented Jul 19, 2023

The goal would be to transfer serialized version of a variable (not necessary string although text/plain could be the fallback).

However that approach does not seem to fit well with the existing VariableType approach of the DAP. But it does seem to be a natural extension of the richInspectVariable (that is more jovyan 😉 )

@minrk
Copy link
Member

minrk commented Aug 1, 2023

I feel like this:

The get message reply should allow to display a variable meaningfully for the user.

suggests to me that the get reply should be (or include) a mimebundle representation, so it's consistent with the rest of the ways objects are represented in Jupyter, and Jupyter frontends can display variables with the tools already available to them. This would allow extensions like variable inspector to move to a more official/stable API without needing to reimplement representations for DAP Variable outputs.

We could add a specific mime-type (application/json+dap-variable or something) and recommend it's populated (or require it, but that seems unnecessary to me).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants