
Use variables in %%configure #834

Open
sunayansaikia opened this issue Sep 10, 2023 · 4 comments
Labels
kind:enhancement A new Sparkmagic feature

Comments

@sunayansaikia

sunayansaikia commented Sep 10, 2023

Is your feature request related to a problem? Please describe.

Currently, in the PySpark kernel wrapper, we can use %%configure in a notebook cell as described below -

%%configure
{
    "executorCores": 2,
    "conf": {
        "spark.sql.some.key": "a_predefined_fixed_value"
    }
}

However, it does not seem to allow passing a value dynamically at runtime; e.g., in the above configuration I'd like to be able to set 'executorCores' and 'spark.sql.some.key' dynamically, perhaps to something that a function would return.

Describe the solution you'd like
Would it be possible to enable something like the example below, or perhaps something better?

import json
import os

executor_cores = get_executor_cores()  # assume the value is derived from some function (hypothetical)
custom_value = os.environ["CUSTOM_VALUE"]  # assume derived from an environment variable

spark_conf = {
    "executorCores": int(executor_cores),
    "conf": {
        "spark.sql.some.key": str(custom_value)
    }
}
spark_conf_json = json.dumps(spark_conf)

%%configure
"$spark_conf_json"

Describe alternatives you've considered
No alternative solution found

Additional context
No additional context

@dosubot dosubot bot added the kind:enhancement A new Sparkmagic feature label Sep 10, 2023
@dosubot

dosubot bot commented Sep 10, 2023

🤖

Thank you for your feature request. I understand that you want to enhance the %%configure functionality in sparkmagic to allow dynamic assignment of values, such as 'executorCores' and 'spark.sql.some.key', which could be derived from a function or environment variable at runtime.

Currently, the %%configure magic function takes a JSON string as input, which is then parsed into a dictionary. This dictionary is used to override the session settings. To allow dynamic assignment of values, we could modify the configure method to evaluate Python expressions in the input JSON string before parsing it. This can be done using the eval function.

Here is an example of how you could modify the configure method:

@magic_arguments()
@cell_magic
@wrap_unexpected_exceptions
@handle_expected_exceptions
@_event
def configure(self, line, cell="", local_ns=None):
    try:
        # Evaluate the cell as an f-string so that {expression} placeholders
        # are filled in from the user's namespace. Note that literal JSON
        # braces must be doubled ({{ and }}) or they will be read as
        # placeholders.
        cell = eval(f'f"""{cell}"""', local_ns)
        dictionary = json.loads(cell)
    except ValueError:
        self.ipython_display.send_error(
            "Could not parse JSON object from input '{}'".format(cell)
        )
        return
    args = parse_argstring_or_throw(self.configure, line)
    if self.session_started:
        if not args.force:
            self.ipython_display.send_error(
                "A session has already been started. If you intend to recreate the "
                "session with new configurations, please include the -f argument."
            )
            return
        else:
            self._do_not_call_delete_session("")
            self._override_session_settings(dictionary)
            self._do_not_call_start_session("")
    else:
        self._override_session_settings(dictionary)
    self.info("")

With this modification, you can use Python expressions in the %%configure magic function, like this:

%%configure
{{
    "executorCores": {os.environ['EXECUTOR_CORES']},
    "spark.sql.some.key": "{get_some_key()}"
}}

In this example, os.environ['EXECUTOR_CORES'] is read from the environment and get_some_key() is a function that returns a value. These expressions are evaluated at runtime and interpolated into the cell before it is parsed as JSON. Note that the literal JSON braces are doubled ({{ and }}) so the f-string does not treat them as placeholders, and that string values must be quoted so the interpolated result is still valid JSON.

Please note that this modification could potentially introduce security risks, as it allows execution of arbitrary Python code. Make sure to only use it in a trusted environment.
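If eval is too permissive for your environment, a narrower sketch is to substitute only variable values with string.Template, which matches the "$name" syntax proposed above and executes no code (render_configure_cell and its wiring into the magic are illustrative, not existing sparkmagic API):

import json
from string import Template

def render_configure_cell(cell, local_ns):
    # Replace $name placeholders with values from the user's namespace;
    # unknown placeholders are left untouched by safe_substitute, and no
    # arbitrary expressions are ever evaluated.
    rendered = Template(cell).safe_substitute(local_ns or {})
    return json.loads(rendered)

A cell body like {"executorCores": $executor_cores} would then pick up executor_cores from the notebook namespace, at the cost of supporting only simple variable references rather than arbitrary expressions.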

I hope this helps! If you have any other questions or need further clarification, feel free to ask.


@sunayansaikia
Author

@dosu-bot

I think we would also need to enhance %%configure to support variables defined in %%local to make this possible?
Otherwise, the PySpark kernel wrapper would just start initializing the Spark context. Or is there any other way to do imports and define variables before %%configure?
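For example (assuming %%configure were extended to read variables from the local namespace; none of this works in released sparkmagic today), the flow I have in mind would be:

%%local
import os
# computed on the local IPython side, before any Spark session starts
executor_cores = int(os.environ.get("EXECUTOR_CORES", "2"))

and then, in a later cell:

%%configure -f
{"executorCores": executor_cores}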

@dosubot

dosubot bot commented Sep 10, 2023

🤖

Hello,

Thank you for your detailed feature request. I understand that you want to enhance the %%configure functionality to allow dynamic assignment of values, such as 'executorCores' and 'spark.sql.some.key', which could be derived from a function or environment variable at runtime.

To achieve this, you can modify the configure method in the KernelMagics class. Currently, the configure method reads the cell content as a JSON string and expects it to be a dictionary of configuration parameters. To allow dynamic assignment of values, you can parse the cell content as Python code instead of a JSON string. This way, you can use Python expressions, functions, or environment variables to compute the values of the configuration parameters.

Here is a modified version of the configure method that implements this idea:

@magic_arguments()
@cell_magic
@argument(
    "-f",
    "--force",
    type=bool,
    default=False,
    nargs="?",
    const=True,
    help="If present, user understands.",
)
@wrap_unexpected_exceptions
@handle_expected_exceptions
@_event
def configure(self, line, cell="", local_ns=None):
    try:
        # Evaluate the cell as a Python dict literal, resolving names
        # (functions, variables, os.environ) against the user's namespace
        dictionary = eval(cell, local_ns)
    except Exception as e:
        self.ipython_display.send_error(
            "Could not evaluate cell content as Python code: {}".format(e)
        )
        return
    args = parse_argstring_or_throw(self.configure, line)
    if self.session_started:
        if not args.force:
            self.ipython_display.send_error(
                "A session has already been started. If you intend to recreate the "
                "session with new configurations, please include the -f argument."
            )
            return
        else:
            self._do_not_call_delete_session("")
            self._override_session_settings(dictionary)
            self._do_not_call_start_session("")
    else:
        self._override_session_settings(dictionary)
    self.info("")

With this modification, you can use Python expressions in the %%configure cell. For example:

%%configure
{
    "executorCores": get_executor_cores(),
    "spark.sql.some.key": os.environ["SOME_ENV_VAR"]
}

In this example, get_executor_cores() is a function that computes the number of executor cores, and os.environ["SOME_ENV_VAR"] is an environment variable. These expressions are evaluated when the %%configure cell is executed, and their values are used as the configuration parameters.

Please note that this modification could potentially introduce security risks, as it allows execution of arbitrary Python code. Make sure to use it in a trusted environment.

I hope this helps! If you have any further questions, feel free to ask.

Best,
Dosu


@devstein
Collaborator

@sunayansaikia %%configure customization is a common feature request. I like the angle of using a function to programmatically fetch variables. Dosu's approach seems reasonable at a glance as long as we maintain backwards compatibility.

FWIW, one workaround is to write a function that modifies the global sparkmagic config.json before starting the kernel.
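As a sketch of that workaround (the path and the session_configs key are the sparkmagic defaults; adjust for your install, and treat set_session_config as a hypothetical helper name):

import json
import os

def set_session_config(executor_cores, custom_value):
    # Rewrite session defaults in the sparkmagic config before the kernel
    # (and thus the Livy session) is started; has no effect on live sessions.
    path = os.path.expanduser("~/.sparkmagic/config.json")
    with open(path) as f:
        config = json.load(f)
    session = config.setdefault("session_configs", {})
    session["executorCores"] = int(executor_cores)
    session.setdefault("conf", {})["spark.sql.some.key"] = str(custom_value)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)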
