[BUG] Default Docker container got broken #809

Open
ltregan opened this issue Mar 13, 2023 · 4 comments

Comments

@ltregan

ltregan commented Mar 13, 2023

Describe the bug
I believe datamechanics pushed a new version of the image (about 5 days ago?), and the sparkmagic Docker image no longer works. If you log in to the spark-1 container and run ../bin/pyspark, you get this error:

Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0
    at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
    at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
    at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
    at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
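
For context, a minimal sketch of why this message appears, assuming the image uses the Hive 1.2.x classes that Spark 2.4.x bundles by default. In that Hive line, the version check in ShimLoader only accepts Hadoop major versions 1 and 2, so a 3.x version string is rejected. The Python below is an illustrative paraphrase, not the actual Java source; the function name `hive12_shim_name` is hypothetical and the returned shim labels are quoted from memory.

```python
# Illustrative paraphrase only. The real check is Java code in
# org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion (see the trace above).

def hive12_shim_name(hadoop_version: str) -> str:
    """Map a Hadoop version string to a shim label, roughly as Hive 1.2.x does."""
    parts = hadoop_version.split(".")
    if len(parts) < 2:
        raise RuntimeError(
            f"Illegal Hadoop Version: {hadoop_version} (expected A.B.* format)"
        )
    major = int(parts[0])
    if major == 1:
        return "0.20S"  # assumed label for the Hadoop 1.x shims
    if major == 2:
        return "0.23"   # assumed label for the Hadoop 2.x shims
    # Any other major version, including 3, falls through to the error
    # reported in this issue.
    raise ValueError(
        f"Unrecognized Hadoop major version number: {hadoop_version}"
    )
```

Calling `hive12_shim_name("3.1.0")` raises with the same message as the trace, while a 2.x version string is accepted. If the assumption holds, the failure would come from the Spark 2.4.7 and Hadoop 3.1.0 combination in the image tag rather than from sparkmagic itself.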

To Reproduce
git clone https://github.com/jupyter-incubator/sparkmagic sparkmagic-dev
cd sparkmagic-dev
docker compose up

Then create a new PySpark notebook; even a simple command does not work, e.g. %data = [(1, 'John', 'Doe')]

The code failed because of a fatal error:
	Error sending http request and maximum retry encountered..

Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context.
b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.
c) Restart the kernel.
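
The "Error sending http request" message above suggests that sparkmagic could not get a response from Livy. A quick way to check that from the host is sketched below, assuming Livy is published on its default port 8998 at `http://localhost:8998` (an assumption about the compose setup, not something stated in this issue). The helper name `livy_reachable` is hypothetical; `GET /sessions` is the Livy REST endpoint that lists sessions.

```python
import http.client
import json
import urllib.error
import urllib.request


def livy_reachable(base_url: str = "http://localhost:8998",
                   timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/sessions answers with HTTP 200 and a JSON body."""
    try:
        with urllib.request.urlopen(f"{base_url}/sessions", timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON reply.
        return False
```

Running `print(livy_reachable())` after `docker compose up` gives a first hint: `False` points at the Livy/Spark container rather than at the notebook kernel.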

Expected behavior
PySpark kernel should work

Screenshots
If applicable, add screenshots to help explain your problem.

Versions:

  • SparkMagic 20.4 or master
  • Livy (if you know it)
  • Spark 2.4.7

Additional context
I believe there was a new push of the image by datamechanics (5 days ago ?)

ltregan changed the title from "[BUG] Not working with datamechanics/spark:2.4.7-hadoop-3.1.0-java-8-scala-2.11-python-3.7-latest" to "[BUG] Default Docker container got broken" on Mar 13, 2023

@devstein
Collaborator

devstein commented Mar 15, 2023

Hi @ltregan, thanks for opening an issue. I'm looking into this today.

@devstein
Collaborator

@ltregan I'm unable to reproduce. Can you check if you are still running into issues?

@ltregan
Author

ltregan commented Mar 15, 2023

Still the same issue, even after clearing the cache. The exact sequence is:

$ docker system prune -a -f
$ git clone https://github.com/jupyter-incubator/sparkmagic sparkmagic-dev
$ cd sparkmagic-dev
$ docker compose up

I am on a Mac M1. Something else that looks fishy: CPU usage starts at 20% (visible in the screenshots at the bottom), then climbs to 40% after a couple of minutes and stays there.

Full log first, then screenshots, below.


sh-5.1# ../bin/pyspark
Python 3.7.11 (default, Jul 27 2021, 14:32:16) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
23/03/15 18:45:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Exception in thread "Thread-4" java.lang.ExceptionInInitializerError
        at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
        at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
        at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
        at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
        at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0
        at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
        at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
        at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
        at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
        ... 10 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.7
      /_/

Using Python version 3.7.11 (default, Jul 27 2021 14:32:16)
SparkSession available as 'spark'.
>>> 

Screenshot 2023-03-15 at 7 45 27 PM

Screenshot 2023-03-15 at 7 45 36 PM

Screenshot 2023-03-15 at 7 45 45 PM

@devstein
Collaborator

devstein commented Apr 5, 2023

@ltregan Thanks for the screenshots. I'm able to reproduce.
