Thank you for the clarification, Patrik.
Could you create a JIRA issue to track and fix this?
Thanks
Patrik Iselind at "Sat, 16 May 2020 15:45:07 +0200" wrote:
PI> Hi Alex,
PI> Thanks a lot for helping out with this.
PI> You're correct, but it doesn't seem that it's the interpreter-settings.json
PI> for the Spark interpreter that is being used; it's conf/interpreter.json.
PI> In this file, both 0.8.2 and 0.9.0 have
PI> ```partial-json
PI> "spark": {
PI>   "id": "spark",
PI>   "name": "spark",
PI>   "group": "spark",
PI>   "properties": {
PI>     "SPARK_HOME": {
PI>       "name": "SPARK_HOME",
PI>       "value": "",
PI>       "type": "string",
PI>       "description": "Location of spark distribution"
PI>     },
PI>     "master": {
PI>       "name": "master",
PI>       "value": "local[*]",
PI>       "type": "string",
PI>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
PI>     },
PI> ```
PI> That "master" should be "spark.master".
PI> By adding an explicit spark.master with the value "local[*]" I can use all
cores as expected. Without this and printing sc.master I get
PI> "local". With the addition of the spark.master property set to "local[*]"
and printing sc.master I get "local[*]". My conclusion is that conf/
PI> interpreter.json isn't in sync with the interpreter-settings.json for
Spark interpreter.
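PI>
PI> For illustration, the corrected entry in conf/interpreter.json would
PI> presumably look like this (a sketch; only the property key and name
PI> change, everything else stays as above):
PI>
PI> ```partial-json
PI> "spark.master": {
PI>   "name": "spark.master",
PI>   "value": "local[*]",
PI>   "type": "string",
PI>   "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
PI> },
PI> ```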
PI> Best regards,
PI> Patrik Iselind
PI> On Sat, May 16, 2020 at 11:22 AM Alex Ott <[email protected]> wrote:
PI> Spark master is set to `local[*]` by default. Here is the corresponding
PI> piece from interpreter-settings.json for the Spark interpreter:
PI>
PI> "master": {
PI> "envName": "MASTER",
PI> "propertyName": "spark.master",
PI> "defaultValue": "local[*]",
PI> "description": "Spark master uri. local | yarn-client |
yarn-cluster | spark master address of standalone mode, ex) spark://
PI> master_host:7077",
PI> "type": "string"
PI> },
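PI>
PI> Since that setting has envName "MASTER", you should presumably also be
PI> able to set it through the MASTER environment variable, e.g. in your
PI> Dockerfile (an untested sketch):
PI>
PI> ```Dockerfile
PI> # Untested assumption: Zeppelin maps the MASTER environment variable
PI> # to spark.master via the envName shown in the snippet above.
PI> ENV MASTER local[*]
PI> ```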
PI> Patrik Iselind at "Sun, 10 May 2020 20:31:08 +0200" wrote:
PI> PI> Hi Jeff,
PI>
PI> PI> I've tried the release from http://zeppelin.apache.org/download.html,
PI> PI> both in a Docker container and without one. Both have the same issue
PI> PI> as previously described.
PI>
PI> PI> Can I somehow set spark.master to "local[*]" in Zeppelin, perhaps
PI> PI> using some environment variable?
PI>
PI> PI> When is the next Zeppelin 0.9.0 Docker image planned to be released?
PI>
PI> PI> Best Regards,
PI> PI> Patrik Iselind
PI>
PI> PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <[email protected]> wrote:
PI>
PI> PI> Hi Patrik,
PI> PI>
PI> PI> Do you mind trying the 0.9.0-preview? It might be an issue with the
PI> PI> Docker container.
PI> PI>
PI> PI> http://zeppelin.apache.org/download.html
PI>
PI> PI> Patrik Iselind <[email protected]> 于2020年5月10日周日上午2:30写道:
PI> PI>
PI> PI> Hello Jeff,
PI> PI>
PI> PI> Thank you for looking into this for me.
PI> PI>
PI> PI> Using the latest pushed Docker image for 0.9.0 (image ID 92890adfadfb,
PI> PI> built 6 weeks ago), I still see the same issue. My image has the digest
PI> PI> "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
PI> PI>
PI> PI> If it's not on the tip of master, could you guys please release a
PI> PI> newer 0.9.0 image?
PI> PI>
PI> PI> Best Regards,
PI> PI> Patrik Iselind
PI>
PI> PI> On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <[email protected]> wrote:
PI> PI>
PI> PI> This might be a bug in 0.8. I tried it in 0.9 (master branch), and
PI> PI> it works for me.
PI> PI>
PI> PI> print(sc.master)
PI> PI> print(sc.defaultParallelism)
PI> PI>
PI> PI> ---
PI> PI> local[*]
PI> PI> 8
PI>
PI> PI> Patrik Iselind <[email protected]>
于2020年5月9日周六下午8:34写道:
PI> PI>
PI> PI> Hi,
PI> PI>
PI> PI> First comes some background, then I have some questions.
PI> PI>
PI> PI> Background
PI> PI> I'm trying out Zeppelin 0.8.2 based on the Docker image. My
PI> PI> Dockerfile looks like this:
PI> PI>
PI> PI> ```Dockerfile
PI> PI> FROM apache/zeppelin:0.8.2
PI> PI>
PI> PI> # Install some extra tools (vim, pip)
PI> PI> RUN apt-get -y update && \
PI> PI>     DEBIAN_FRONTEND=noninteractive \
PI> PI>     apt-get -y install vim python3-pip
PI> PI>
PI> PI> RUN python3 -m pip install -U pyspark
PI> PI>
PI> PI> ENV PYSPARK_PYTHON python3
PI> PI> ENV PYSPARK_DRIVER_PYTHON python3
PI> PI> ```
PI> PI>
PI> PI> When I run a paragraph like so
PI> PI>
PI> PI> ```Zeppelin paragraph
PI> PI> %pyspark
PI> PI>
PI> PI> print(sc)
PI> PI> print()
PI> PI> print(dir(sc))
PI> PI> print()
PI> PI> print(sc.master)
PI> PI> print()
PI> PI> print(sc.defaultParallelism)
PI> PI> ```
PI> PI>
PI> PI> I get the following output
PI> PI>
PI> PI> ```output
PI> PI> <SparkContext master=local appName=Zeppelin>
PI> PI>
PI> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
PI> PI> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
PI> PI> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
PI> PI> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
PI> PI> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
PI> PI> '__subclasshook__', '__weakref__', '_accumulatorServer',
PI> PI> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
PI> PI> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
PI> PI> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc',
PI> PI> '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
PI> PI> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
PI> PI> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
PI> PI> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
PI> PI> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
PI> PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
PI> PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
PI> PI> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
PI> PI> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
PI> PI> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
PI> PI> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime',
PI> PI> 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union', 'version',
PI> PI> 'wholeTextFiles']
PI> PI>
PI> PI> local
PI> PI>
PI> PI> 1
PI> PI> ```
PI> PI>
PI> PI> This even though the "master" property in the interpreter is set to
PI> PI> "local[*]". I'd like to use all cores on my machine. To do that I
PI> PI> have to explicitly create the "spark.master" property in the Spark
PI> PI> interpreter with the value "local[*]"; then I get
PI> PI>
PI> PI> ```new output
PI> PI> <SparkContext master=local[*] appName=Zeppelin>
PI> PI>
PI> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
PI> PI> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
PI> PI> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
PI> PI> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
PI> PI> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
PI> PI> '__subclasshook__', '__weakref__', '_accumulatorServer',
PI> PI> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
PI> PI> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
PI> PI> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc',
PI> PI> '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
PI> PI> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
PI> PI> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
PI> PI> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
PI> PI> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
PI> PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
PI> PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
PI> PI> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
PI> PI> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
PI> PI> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
PI> PI> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime',
PI> PI> 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union', 'version',
PI> PI> 'wholeTextFiles']
PI> PI>
PI> PI> local[*]
PI> PI>
PI> PI> 8
PI> PI> ```
PI> PI> This is what I want.
PI> PI>
PI> PI> The Questions
PI> PI> 1. Why is the "master" property not used in the created SparkContext?
PI> PI> 2. How do I add the spark.master property to the Docker image?
PI> PI>
PI> PI> Any hint or support you can provide would be greatly appreciated.
PI> PI>
PI> PI> Yours Sincerely,
PI> PI> Patrik Iselind
PI>
PI> PI> --
PI> PI> Best Regards
PI> PI>
PI> PI> Jeff Zhang
PI>
PI> PI> --
PI> PI> Best Regards
PI> PI>
PI> PI> Jeff Zhang
PI> --
PI> With best wishes, Alex Ott
PI> http://alexott.net/
PI> Twitter: alexott_en (English), alexott (Russian)
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)