
Does sparkmagic support dual scala/python spark session? #890

Open
shay1bz opened this issue Feb 28, 2024 · 3 comments
Labels
kind:enhancement A new Sparkmagic feature

Comments


shay1bz commented Feb 28, 2024

Since Livy supports Scala, Python, and SQL statements in the same session, is it possible to extend sparkmagic to support this as well?


dosubot bot commented Feb 28, 2024


Yes, it is possible to extend sparkmagic in this direction, given that Livy supports Scala, Python, and SQL statements in the same session. sparkmagic already includes mechanisms for handling different languages: the LANG_PYTHON, LANG_SCALA, and LANG_R constants, and the -l/--language argument in the @magic_arguments() decorator for the %spark magic command, which lets users specify the language of a Livy session as Python, Scala, or R.

To support dual Scala/Python Spark sessions more explicitly, the %spark magic could be enhanced to allow multiple languages within a single session, or to manage multiple sessions simultaneously, one per language. Note that the current implementation already supports creating separate Scala and Python sessions through the Livy REST API by specifying the language at session creation time; users can then switch between these sessions with the %spark magic and the appropriate session name.

Here is the critical part of the code that handles session creation with a specified language:

@magic_arguments()
@argument(
    "-l",
    "--language",
    type=str,
    default=None,
    help="Language for Livy session; one of {}".format(
        ", ".join([LANG_PYTHON, LANG_SCALA, LANG_R])
    ),
)

And the session creation logic in the elif subcommand == "add": block, which uses the language argument to determine the session's language:

language = args.language
properties = conf.get_session_properties(language)
self.spark_controller.add_session(name, endpoint, skip, properties)

This setup allows for the creation of separate sessions for Scala and Python, leveraging Livy's capability to handle different languages. To work with both Scala and Python code in the same notebook, users can create two sessions (one for Scala and one for Python) and switch between them as needed using the %spark magic command with the appropriate session name.
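For illustration, the two-session workflow described above might look like this in a notebook (the session names and the Livy URL are hypothetical; the flags match the add example shown further below):

%spark add -s scala_session -l scala -u https://sparkcluster.net/livy
%spark add -s python_session -l python -u https://sparkcluster.net/livy

%%spark -s scala_session
val rdd = sc.parallelize(1 to 4)

%%spark -s python_session
numbers = sc.parallelize([1, 2, 3, 4])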

Regarding session switching, sparkmagic handles this by allowing users to specify the session they want to use when executing a command with the %spark magic command. The mechanism to switch between different language sessions within the same notebook involves specifying the session name and the language for the session when adding a session or running commands.

When adding a session, users can specify the language of the session using the -l or --language argument, which accepts one of the predefined language constants (LANG_PYTHON, LANG_SCALA, LANG_R). This is done with the add subcommand, as shown in the following example:

%spark add -s my_session -l python -u https://sparkcluster.net/livy -t Kerberos -a my_user -p my_password

When executing Spark code or SparkSQL code, users can switch between sessions by specifying the session name with the -s or --session argument in the %spark magic command. This allows users to execute code in different language sessions within the same notebook. For example, to run code in a session named my_session, the user would use:

%%spark -s my_session

As for the number of sessions that can be created and managed by sparkmagic, the SessionManager class sets no explicit limit. It stores sessions in a plain Python dictionary (self._sessions), so the practical ceiling is the system's available memory rather than anything in sparkmagic itself.

The enforcement of session uniqueness is handled by checking if a session name already exists in the _sessions dictionary before adding a new session. If a session with the given name exists, a SessionManagementException is raised, preventing duplicate sessions from being created. However, this mechanism does not enforce a maximum number of sessions; it only ensures session name uniqueness.

def add_session(self, name, session):
    if name in self._sessions:
        raise SessionManagementException(
            "Session with name '{}' already exists. Please delete the session"
            " first if you intend to replace it.".format(name)
        )

    self._sessions[name] = session

Therefore, the limit to the number of sessions that can be created and managed by sparkmagic through the SessionManager class is not explicitly defined in the code and is subject to the limitations of the Python dictionary and the available system resources.
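The uniqueness check can be exercised in isolation. Here is a minimal sketch mirroring the quoted logic, not the actual sparkmagic source; the session objects are placeholders rather than real Livy sessions:

```python
# Minimal sketch of the name-uniqueness check described above.
# Placeholder session objects stand in for real Livy sessions.

class SessionManagementException(Exception):
    pass


class SessionManager:
    def __init__(self):
        # Sessions are kept in a plain dict keyed by name; no size cap.
        self._sessions = {}

    def add_session(self, name, session):
        if name in self._sessions:
            raise SessionManagementException(
                "Session with name '{}' already exists. Please delete the session"
                " first if you intend to replace it.".format(name)
            )
        self._sessions[name] = session


mgr = SessionManager()
mgr.add_session("scala_session", object())
try:
    mgr.add_session("scala_session", object())  # same name again
except SessionManagementException:
    print("duplicate rejected")
```

Deleting the entry from the dictionary (as the delete subcommand does) is what frees the name for reuse.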

I hope this information helps! If you have any other questions or need further clarification, feel free to ask.


About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot added the kind:enhancement label on Feb 28, 2024

shay1bz commented Feb 28, 2024

@dosu-bot please elaborate on a single session with multiple languages. Users could choose a default language, for example Scala, and also execute PySpark cells with %%pyspark/%%python magic in the same session.


dosubot bot commented Feb 28, 2024


To execute PySpark cells with %%pyspark/%%python magic in the same session when Scala is chosen as the default language in sparkmagic, you can follow these steps:

  1. Ensure that sparkmagic is loaded in your IPython notebook by running:
%load_ext sparkmagic.magics

  2. Use the %manage_spark line magic to add a Livy endpoint and create a session for PySpark, even if you have already created a session for Scala. When creating the session, specify a session configuration that uses PySpark.

  3. After creating the PySpark session, run PySpark code by starting the cell with the %%spark magic. If you have multiple sessions (e.g., one for Scala and one for PySpark), specify the session name with the -s option to indicate which session the code should run against.

For example, if you have created a PySpark session named my_pyspark_session, you can execute PySpark code as follows:

%%spark -s my_pyspark_session
numbers = sc.parallelize([1, 2, 3, 4])
print('First element of numbers is {} and its description is:\n{}'.format(numbers.first(), numbers.toDebugString()))

This approach allows you to execute PySpark code in the same IPython notebook session, even if Scala is chosen as the default language for sparkmagic sessions. You can refer to the examples/Magics in IPython Kernel.ipynb for more examples and details.

Regarding how sparkmagic handles multiple sessions, including sessions with different languages like Scala and Python, it allows the user to configure the session parameters and language through magics. The language for a session is set by calling a specific magic command with the language option. For instance, the %%configure magic allows users to set session creation parameters, including the language, by passing a JSON configuration. The language setting is part of the session configuration and is managed by the KernelMagics class, which interacts with the Livy server to start, manage, and execute code in these sessions.

The KernelMagics class contains methods like spark, sql, and configure, which are decorated with @cell_magic and other decorators to handle the execution of Spark code, SQL queries, and session configuration, respectively. The language for the session can be specified in the configuration passed to the %%configure magic. This configuration is then used to start a new session or reconfigure an existing one with the specified language.
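As an illustration of the %%configure magic mentioned above: the -f flag forces the current session to be dropped and recreated with the new settings, and the JSON body carries Livy session properties. The keys shown here are common Livy create-session parameters, used only as an example:

%%configure -f
{"executorMemory": "1000M", "executorCores": 4}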

The KernelMagics class does not handle multiple sessions within a single notebook cell; it provides the mechanisms to configure, start, and manage sessions, including their language. The sessions themselves live on the Livy server, and sparkmagic communicates with Livy to execute commands in their context, so users switch between sessions (including sessions with different languages) by adjusting the session parameters accordingly. See sparkmagic/sparkmagic/kernels/kernelmagics.py for details.

