Spark master is set to `local[*]` by default. Here is the corresponding piece
from interpreter-settings.json for the Spark interpreter:
"master": {
"envName": "MASTER",
"propertyName": "spark.master",
"defaultValue": "local[*]",
"description": "Spark master uri. local | yarn-client | yarn-cluster |
spark master address of standalone mode, ex) spark://master_host:7077",
"type": "string"
},
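Since the setting falls back to the MASTER environment variable (the "envName" above), one way to force local[*] in a Docker setup is to set that variable for the container. A minimal sketch, not a tested configuration: it assumes the stock apache/zeppelin image as base and that the interpreter process inherits the container environment:

```Dockerfile
# Sketch only: assumes the Spark interpreter reads the MASTER environment
# variable at startup, per the "envName": "MASTER" entry above.
FROM apache/zeppelin:0.8.2

# Make the Spark interpreter use all local cores instead of plain "local".
ENV MASTER="local[*]"
```

Passing `-e MASTER='local[*]'` to `docker run`, or exporting MASTER in conf/zeppelin-env.sh, should have the same effect; setting the spark.master property directly in the interpreter settings (as done later in the thread) also works.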
Patrik Iselind at "Sun, 10 May 2020 20:31:08 +0200" wrote:
PI> Hi Jeff,
PI> I've tried the release from http://zeppelin.apache.org/download.html, both
PI> in a Docker container and without Docker. Both have the same issue as
PI> previously described.
PI> Can I somehow set spark.master to "local[*]" in Zeppelin, perhaps using
PI> some environment variable?
PI> When is the next Zeppelin 0.9.0 docker image planned to be released?
PI> Best Regards,
PI> Patrik Iselind
PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <[email protected]> wrote:
PI> Hi Patric,
PI>
PI> Do you mind trying the 0.9.0-preview? It might be an issue of the Docker
PI> container.
PI>
PI> http://zeppelin.apache.org/download.html
PI> Patrik Iselind <[email protected]> wrote on Sunday, May 10, 2020 at 2:30 AM:
PI>
PI> Hello Jeff,
PI>
PI> Thank you for looking into this for me.
PI>
PI> Using the latest pushed docker image for 0.9.0 (image ID
PI> 92890adfadfb, built 6 weeks ago), I still see the same issue. My image has
PI> the digest
PI> "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
PI>
PI> If it's not on the tip of master, could you guys please release a
PI> newer 0.9.0 image?
PI>
PI> Best Regards,
PI> Patrik Iselind
PI> On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <[email protected]> wrote:
PI>
PI> This might be a bug in 0.8; I tried it in 0.9 (master
PI> branch) and it works for me.
PI>
PI> print(sc.master)
PI> print(sc.defaultParallelism)
PI>
PI> ---
PI> local[*] 8
PI> Patrik Iselind <[email protected]> wrote on Saturday, May 9, 2020 at 8:34 PM:
PI>
PI> Hi,
PI>
PI> First comes some background, then I have some questions.
PI>
PI> Background
PI> I'm trying out Zeppelin 0.8.2 based on the Docker image.
PI> My Dockerfile looks like this:
PI>
PI> ```Dockerfile
PI> FROM apache/zeppelin:0.8.2
PI>
PI> # Install Java and some tools
PI> RUN apt-get -y update &&\
PI> DEBIAN_FRONTEND=noninteractive \
PI> apt -y install vim python3-pip
PI>
PI> RUN python3 -m pip install -U pyspark
PI>
PI> ENV PYSPARK_PYTHON python3
PI> ENV PYSPARK_DRIVER_PYTHON python3
PI> ```
PI>
PI> When I start a section like so
PI>
PI> ```Zeppelin paragraph
PI> %pyspark
PI>
PI> print(sc)
PI> print()
PI> print(dir(sc))
PI> print()
PI> print(sc.master)
PI> print()
PI> print(sc.defaultParallelism)
PI> ```
PI>
PI> I get the following output
PI>
PI> ```output
PI> <SparkContext master=local appName=Zeppelin>
PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
PI> '__doc__', '__enter__', '__eq__', '__exit__',
PI> '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__',
PI> '__hash__', '__init__', '__le__', '__lt__', '__module__',
PI> '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
PI> '__setattr__', '__sizeof__', '__str__',
PI> '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context',
PI> '_batchSize', '_callsite', '_checkpointFile', '_conf',
PI> '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
PI> '_getJavaStorageLevel', '_initialize_context',
PI> '_javaAccumulator', '_jsc', '_jvm', '_lock', '_next_accum_id',
PI> '_pickled_broadcast_vars', '_python_includes',
PI> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator', 'addFile',
PI> 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
PI> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
PI> 'defaultMinPartitions', 'defaultParallelism',
PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master',
PI> 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize', 'pickleFile',
PI> 'profiler_collector', 'pythonExec', 'pythonVer', 'range',
PI> 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir',
PI> 'setJobGroup', 'setLocalProperty', 'setLogLevel',
PI> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime',
PI> 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
PI> 'version', 'wholeTextFiles'] local 1
PI> ```
PI>
PI> This happens even though the "master" property in the interpreter
PI> is set to "local[*]". I'd like to use all cores on my machine. To
PI> do that I have to explicitly create the "spark.master"
PI> property in the Spark interpreter with the value "local[*]", then I
PI> get
PI>
PI> ```new output
PI> <SparkContext master=local[*] appName=Zeppelin>
PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
PI> '__doc__', '__enter__', '__eq__', '__exit__',
PI> '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__',
PI> '__hash__', '__init__', '__le__', '__lt__', '__module__',
PI> '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
PI> '__setattr__', '__sizeof__', '__str__',
PI> '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context',
PI> '_batchSize', '_callsite', '_checkpointFile', '_conf',
PI> '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
PI> '_getJavaStorageLevel', '_initialize_context',
PI> '_javaAccumulator', '_jsc', '_jvm', '_lock', '_next_accum_id',
PI> '_pickled_broadcast_vars', '_python_includes',
PI> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator', 'addFile',
PI> 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
PI> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
PI> 'defaultMinPartitions', 'defaultParallelism',
PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master',
PI> 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize', 'pickleFile',
PI> 'profiler_collector', 'pythonExec', 'pythonVer', 'range',
PI> 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir',
PI> 'setJobGroup', 'setLocalProperty', 'setLogLevel',
PI> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime',
PI> 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
PI> 'version', 'wholeTextFiles'] local[*] 8
PI> ```
PI> This is what I want.
PI>
PI> The Questions
PI> 1. Why is the "master" property not used in the created
PI> SparkContext?
PI> 2. How do I add the spark.master property to the Docker
PI> image?
PI>
PI> Any hint or support you can provide would be greatly
PI> appreciated.
PI>
PI> Yours Sincerely,
PI> Patrik Iselind
PI> --
PI> Best Regards
PI>
PI> Jeff Zhang
PI> --
PI> Best Regards
PI>
PI> Jeff Zhang
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)