Hi,

First comes some background, then I have some questions.

*Background*
I'm trying out Zeppelin 0.8.2 based on the Docker image. My Dockerfile
looks like this:

```Dockerfile
FROM apache/zeppelin:0.8.2


# Install a couple of extra tools (vim, pip for Python 3)
RUN apt-get -y update &&\
    DEBIAN_FRONTEND=noninteractive \
        apt-get -y install vim python3-pip

RUN python3 -m pip install -U pyspark

ENV PYSPARK_PYTHON python3
ENV PYSPARK_DRIVER_PYTHON python3
```
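
As a quick sanity check that the pip-installed pyspark and the
PYSPARK_PYTHON setting are actually picked up, a paragraph along these
lines can be run first (sys.version and sc.pythonVer are standard; nothing
here is specific to the problem below):

```Zeppelin paragraph
%pyspark

import sys

# Python used by the driver process
print(sys.version)

# Python version the SparkContext was created with
print(sc.pythonVer)
```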

When I run a paragraph like this

```Zeppelin paragraph
%pyspark

print(sc)
print()
print(dir(sc))
print()
print(sc.master)
print()
print(sc.defaultParallelism)
```

I get the following output

```output
<SparkContext master=local appName=Zeppelin> ['PACKAGE_EXTENSIONS',
'__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
'__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
'__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
'_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
'_ensure_initialized', '_gateway', '_getJavaStorageLevel',
'_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
'_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
'_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
'version', 'wholeTextFiles'] local 1
```

This happens even though the "master" property in the Spark interpreter
settings is set to "local[*]". I'd like to use all cores on my machine. To
achieve that I have to explicitly create a "spark.master" property in the
Spark interpreter with the value "local[*]"; then I get

```new output
<SparkContext master=local[*] appName=Zeppelin> ['PACKAGE_EXTENSIONS',
'__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
'__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
'__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
'_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
'_ensure_initialized', '_gateway', '_getJavaStorageLevel',
'_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
'_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
'_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
'version', 'wholeTextFiles'] local[*] 8
```
This is what I want.
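
To see exactly which configuration the running context ended up with, a
paragraph like the following can be used to dump the effective Spark conf
(getConf()/getAll() are standard pyspark methods; this is just a
convenience sketch):

```Zeppelin paragraph
%pyspark

# Print the effective Spark configuration as the SparkContext sees it;
# "spark.master" shows which master value actually won.
for key, value in sorted(sc.getConf().getAll()):
    print(key, "=", value)
```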

*The Questions*

   - Why is the interpreter's "master" property not reflected in the created SparkContext?
   - How do I add the spark.master property to the Docker image? (A sketch of what I have in mind is below.)
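
For the second question, this is the kind of Dockerfile addition I have in
mind. It is an untested sketch: it assumes the image keeps its configuration
under /zeppelin/conf (adjust if ZEPPELIN_HOME points elsewhere) and that the
Spark interpreter honors the MASTER setting from conf/zeppelin-env.sh, as
the zeppelin-env.sh template suggests.

```Dockerfile
# Untested sketch: ask the Spark interpreter to use all local cores.
# Assumes ZEPPELIN_HOME is /zeppelin and that conf/zeppelin-env.sh is
# sourced at startup; adjust the path if the image differs.
RUN echo 'export MASTER=local[*]' >> /zeppelin/conf/zeppelin-env.sh
```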


Any hint or support you can provide would be greatly appreciated.

Yours Sincerely,
Patrik Iselind
