Spark master is set to `local[*]` by default. Here is the corresponding piece
from interpreter-settings.json for the Spark interpreter:
"master": {
"envName": "MASTER",
"propertyName": "spark.master",
"defaultValue": "local[*]",
"description": "Spark master uri. local | yarn-client | yarn-cluster |
spark master address of standalone mode, ex) spark://master_host:7077",
"type": "string"
},
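Since the setting falls back to the MASTER environment variable (the "envName" above), one way to force local[*] in a Docker setup is to set that variable for the container. A minimal sketch, not a tested configuration: it assumes the stock apache/zeppelin image as base and that the interpreter process inherits the container environment:

```Dockerfile
# Sketch only: assumes the Spark interpreter reads the MASTER environment
# variable at startup, per the "envName": "MASTER" entry above.
FROM apache/zeppelin:0.8.2

# Make the Spark interpreter use all local cores instead of plain "local".
ENV MASTER="local[*]"
```

Passing `-e MASTER='local[*]'` to `docker run`, or exporting MASTER in conf/zeppelin-env.sh, should have the same effect; setting the spark.master property directly in the interpreter settings (as done later in the thread) also works.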
Patrik Iselind at "Sun, 10 May 2020 20:31:08 +0200" wrote:
PI> Hi Jeff,
PI> I've tried the release from http://zeppelin.apache.org/download.html, both
PI> in a Docker container and without Docker. Both have the same issue as
PI> previously described.
PI> Can I somehow set spark.master to "local[*]" in Zeppelin, perhaps using
PI> some environment variable?
PI> When is the next Zeppelin 0.9.0 docker image planned to be released?
PI> Best Regards,
PI> Patrik Iselind
PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <[email protected]> wrote:
PI> Hi Patric,
PI>
PI> Do you mind trying the 0.9.0-preview? It might be an issue of the Docker
PI> container.
PI>
PI> http://zeppelin.apache.org/download.html
PI> Patrik Iselind <[email protected]> wrote on Sunday, May 10, 2020 at 2:30 AM:
PI>
PI> Hello Jeff,
PI>
PI> Thank you for looking into this for me.
PI>
PI> Using the latest pushed docker image for 0.9.0 (image ID
PI> 92890adfadfb, built 6 weeks ago), I still see the same issue. My image has
PI> the digest
PI> "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
PI>
PI> If it's not on the tip of master, could you guys please release a
PI> newer 0.9.0 image?
PI>
PI> Best Regards,
PI> Patrik Iselind
PI> On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <[email protected]> wrote:
PI>
PI> This might be a bug in 0.8; I tried it in 0.9 (master
PI> branch) and it works for me.
PI>
PI> print(sc.master)
PI> print(sc.defaultParallelism)
PI>
PI> ---
PI> local[*] 8
PI> Patrik Iselind <[email protected]> wrote on Saturday, May 9, 2020 at 8:34 PM:
PI>
PI> Hi,
PI>
PI> First comes some background, then I have some questions.
PI>
PI> Background
PI> I'm trying out Zeppelin 0.8.2 based on the Docker image.
PI> My Dockerfile looks like this:
PI>
PI> ```Dockerfile
PI> FROM apache/zeppelin:0.8.2
PI>
PI> # Install Java and some tools
PI> RUN apt-get -y update &&\
PI> DEBIAN_FRONTEND=noninteractive \
PI> apt -y install vim python3-pip
PI>
PI> RUN python3 -m pip install -U pyspark
PI>
PI> ENV PYSPARK_PYTHON python3
PI> ENV PYSPARK_DRIVER_PYTHON python3
PI> ```
PI>
PI> When I start a section like so
PI>
PI> ```Zeppelin paragraph
PI> %pyspark
PI>
PI> print(sc)
PI> print()
PI> print(dir(sc))
PI> print()
PI> print(sc.master)
PI> print()
PI> print(sc.defaultParallelism)
PI> ```
PI>
PI> I get the following output
PI>
PI> ```output
PI> <SparkContext master=local appName=Zeppelin>
PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
PI> '__doc__', '__enter__', '__eq__', '__exit__',
PI> '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__',
PI> '__hash__', '__init__', '__le__', '__lt__', '__module__',
PI> '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
PI> '__setattr__', '__sizeof__', '__str__',
PI> '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context',
PI> '_batchSize', '_callsite', '_checkpointFile', '_conf',
PI> '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
PI> '_getJavaStorageLevel', '_initialize_context',
PI> '_javaAccumulator', '_jsc', '_jvm', '_lock', '_next_accum_id',
PI> '_pickled_broadcast_vars', '_python_includes',
PI> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator', 'addFile',
PI> 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
PI> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
PI> 'defaultMinPartitions', 'defaultParallelism',
PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master',
PI> 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize', 'pickleFile',
PI> 'profiler_collector', 'pythonExec', 'pythonVer', 'range',
PI> 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir',
PI> 'setJobGroup', 'setLocalProperty', 'setLogLevel',
PI> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime',
PI> 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
PI> 'version', 'wholeTextFiles'] local 1
PI> ```
PI>
PI> This happens even though the "master" property in the interpreter
PI> is set to "local[*]". I'd like to use all cores on my machine. To
PI> do that I have to explicitly create the "spark.master"
PI> property in the Spark interpreter with the value "local[*]", then I
PI> get
PI>
PI> ```new output
PI> <SparkContext master=local[*] appName=Zeppelin>
PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
PI> '__doc__', '__enter__', '__eq__', '__exit__',
PI> '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__',
PI> '__hash__', '__init__', '__le__', '__lt__', '__module__',
PI> '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
PI> '__setattr__', '__sizeof__', '__str__',
PI> '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context',
PI> '_batchSize', '_callsite', '_checkpointFile', '_conf',
PI> '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
PI> '_getJavaStorageLevel', '_initialize_context',
PI> '_javaAccumulator', '_jsc', '_jvm', '_lock', '_next_accum_id',
PI> '_pickled_broadcast_vars', '_python_includes',
PI> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator', 'addFile',
PI> 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
PI> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
PI> 'defaultMinPartitions', 'defaultParallelism',
PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master',
PI> 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize', 'pickleFile',
PI> 'profiler_collector', 'pythonExec', 'pythonVer', 'range',
PI> 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir',
PI> 'setJobGroup', 'setLocalProperty', 'setLogLevel',
PI> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime',
PI> 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
PI> 'version', 'wholeTextFiles'] local[*] 8
PI> ```
PI> This is what I want.
PI>
PI> The Questions
PI> 1. Why is the "master" property not used in the created
PI> SparkContext?
PI> 2. How do I add the spark.master property to the Docker
PI> image?
PI>
PI> Any hint or support you can provide would be greatly
PI> appreciated.
PI>
PI> Yours Sincerely,
PI> Patrik Iselind
PI> --
PI> Best Regards
PI>
PI> Jeff Zhang
PI> --
PI> Best Regards
PI>
PI> Jeff Zhang
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)