I think there are two options for you. First, you can set `--conf spark.mesos.executor.docker.image=adolphlwq/mesos-for-spark-exector-image:1.6.0.beta2` in your spark-submit args, so that Mesos launches the executors with your custom image. Second, you can remove the `local:` prefix from the `--jars` flag; that way the executors download the jars from your Spark driver instead of expecting them to already exist on each agent.
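For example, a submit command applying both changes at once might look like this (an untested sketch assembled from the image name and jar path in this thread; adjust the jar path to wherever the jar actually lives on your driver):

    dcos spark run \
      --submit-args='--conf spark.mesos.executor.docker.image=adolphlwq/mesos-for-spark-exector-image:1.6.0.beta2 \
      --jars /linker/jars/spark-streaming-kafka_2.10-1.6.0.jar \
      spark2cassandra.py 10.140.0.14:2181 wlu_spark2cassandra'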
On Wed, Jul 13, 2016 at 9:08 PM, Luke Adolph <kenan3...@gmail.com> wrote:

> Update:
> I rebuilt my Mesos executor image and downloaded
> *spark-streaming-kafka_2.10-1.6.0.jar* into */linker/jars*.
>
> I changed my submit command to:
>
>> dcos spark run \
>>   --submit-args='--jars local:/linker/jars/spark-streaming-kafka_2.10-1.6.0.jar \
>>   spark2cassandra.py 10.140.0.14:2181 wlu_spark2cassandra' \
>>   --docker-image adolphlwq/mesos-for-spark-exector-image:1.6.0.beta2
>
> Here is the new stderr output I get on Mesos:
>
>
> My only problem is shipping the dependency
> spark-streaming-kafka_2.10-1.6.0.jar to the workers.
>
> Thanks.
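For the `local:` scheme to work, the jar must already exist at that exact path inside the executor image, since `local:` URIs are resolved on each node rather than fetched. A minimal sketch of Dockerfile lines that would bake it in (hypothetical; the Maven Central URL is an assumption, not taken from the thread):

    # Hypothetical: put the Kafka connector jar into the image so that
    # 'local:/linker/jars/...' resolves inside every executor container.
    RUN mkdir -p /linker/jars && \
        curl -fL -o /linker/jars/spark-streaming-kafka_2.10-1.6.0.jar \
          https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka_2.10/1.6.0/spark-streaming-kafka_2.10-1.6.0.jar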
> 2016-07-13 18:57 GMT+08:00 Luke Adolph <kenan3...@gmail.com>:
>
>> Hi all:
>> My Spark runs on Mesos. I wrote a Spark Streaming app in Python; the code
>> is on GitHub <https://github.com/adolphlwq/linkerProcessorSample>.
>>
>> The app has the dependency
>> *org.apache.spark:spark-streaming-kafka_2.10:1.6.1*.
>>
>> Spark on Mesos has two important concepts: the Spark framework and the
>> Spark executor.
>>
>> I run my executors in a Docker image. The image's Dockerfile
>> <https://github.com/adolphlwq/linkerProcessorSample/blob/master/docker/Dockerfile>
>> is below:
>>
>>> # refer to 'http://spark.apache.org/docs/latest/running-on-mesos.html#spark-properties',
>>> # section 'spark.mesos.executor.docker.image'
>>> FROM ubuntu:14.04
>>> WORKDIR /linker
>>> RUN ln -f -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
>>> # download mesos
>>> RUN echo "deb http://repos.mesosphere.io/ubuntu/ trusty main" > /etc/apt/sources.list.d/mesosphere.list && \
>>>     apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF && \
>>>     apt-get update && \
>>>     apt-get -y install mesos=0.28.1-2.0.20.ubuntu1404 openjdk-7-jre python-pip git vim curl
>>> RUN git clone https://github.com/adolphlwq/linkerProcessorSample.git && \
>>>     pip install -r linkerProcessorSample/docker/requirements.txt
>>> RUN curl -fL http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz | tar xzf - -C /usr/local && \
>>>     apt-get clean
>>> ENV MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos.so \
>>>     JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64 \
>>>     SPARK_HOME=/usr/local/spark-1.6.0-bin-hadoop2.6
>>> ENV PATH=$JAVA_HOME/bin:$PATH
>>> WORKDIR $SPARK_HOME
>>
>> When I submit my app with the command below:
>>
>>> dcos spark run --submit-args='--packages org.apache.spark:spark-streaming-kafka_2.10:1.6.1 \
>>>     spark2cassandra.py zk topic' \
>>>     --docker-image=adolphlwq/mesos-for-spark-exector-image:1.6.0.beta
>>
>> the executor Docker container runs successfully, but it has no package for
>> *org.apache.spark:spark-streaming-kafka_2.10:1.6.1*.
>>
>> The stderr on Mesos is:
>>
>>> I0713 09:34:52.715551 18124 logging.cpp:188] INFO level logging started!
>>> I0713 09:34:52.717797 18124 fetcher.cpp:424] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/6097419e-c2d0-4e5f-9a91-e5815de640c4-S4","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/home\/ubuntu\/spark2cassandra.py"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/com.101tec_zkclient-0.3.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.apache.kafka_kafka_2.10-0.8.2.1.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.slf4j_slf4j-api-1.7.10.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.spark-project.spark_unused-1.0.0.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/net.jpountz.lz4_lz4-1.3.0.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/log4j_log4j-1.2.17.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/com.yammer.metrics_metrics-core-2.2.0.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.apache.kafka_kafka-clients-0.8.2.1.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/root\/.ivy2\/jars\/org.xerial.snappy_snappy-java-1.1.2.jar"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/6097419e-c2d0-4e5f-9a91-e5815de640c4-S4\/frameworks\/7399b6f7-5dcd-4a9b-9846-e7948d5ffd11-0024\/executors\/driver-20160713093451-0015\/runs\/84419372-9482-4c58-8f87-4ba528b6885c"}
>>> I0713 09:34:52.719846 18124 fetcher.cpp:379] Fetching URI '/home/ubuntu/spark2cassandra.py'
>>> I0713 09:34:52.719866 18124 fetcher.cpp:250] Fetching directly into the sandbox directory
>>> I0713 09:34:52.719925 18124 fetcher.cpp:187] Fetching URI '/home/ubuntu/spark2cassandra.py'
>>> I0713 09:34:52.719945 18124 fetcher.cpp:167] Copying resource with command:cp '/home/ubuntu/spark2cassandra.py' '/tmp/mesos/slaves/6097419e-c2d0-4e5f-9a91-e5815de640c4-S4/frameworks/7399b6f7-5dcd-4a9b-9846-e7948d5ffd11-0024/executors/driver-20160713093451-0015/runs/84419372-9482-4c58-8f87-4ba528b6885c/spark2cassandra.py'
>>> W0713 09:34:52.722587 18124 fetcher.cpp:272] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: /home/ubuntu/spark2cassandra.py
>>> I0713 09:34:52.724138 18124 fetcher.cpp:456] Fetched '/home/ubuntu/spark2cassandra.py' to '/tmp/mesos/slaves/6097419e-c2d0-4e5f-9a91-e5815de640c4-S4/frameworks/7399b6f7-5dcd-4a9b-9846-e7948d5ffd11-0024/executors/driver-20160713093451-0015/runs/84419372-9482-4c58-8f87-4ba528b6885c/spark2cassandra.py'
>>> I0713 09:34:52.724148 18124 fetcher.cpp:379] Fetching URI '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar'
>>> I0713 09:34:52.724153 18124 fetcher.cpp:250] Fetching directly into the sandbox directory
>>> I0713 09:34:52.724162 18124 fetcher.cpp:187] Fetching URI '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar'
>>> I0713 09:34:52.724171 18124 fetcher.cpp:167] Copying resource with command:cp '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar' '/tmp/mesos/slaves/6097419e-c2d0-4e5f-9a91-e5815de640c4-S4/frameworks/7399b6f7-5dcd-4a9b-9846-e7948d5ffd11-0024/executors/driver-20160713093451-0015/runs/84419372-9482-4c58-8f87-4ba528b6885c/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar'
>>> cp: cannot stat '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar': No such file or directory
>>> Failed to fetch '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar': Failed to copy with command 'cp '/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka_2.10-1.6.0.jar'
>>
>> Can somebody help me solve this dependency problem?
>>
>> Thanks.
>> --
>> Thanks & Best Regards
>> 卢文泉 | Adolph Lu
>> TEL: +86 15651006559
>> Linker Networks (http://www.linkernetworks.com/)
>
>
> --
> Thanks & Best Regards
> 卢文泉 | Adolph Lu
> TEL: +86 15651006559
> Linker Networks (http://www.linkernetworks.com/)