Yes, I mean dump it into HDFS as a file, via yarn cluster mode.
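
Something like this (a rough sketch, assuming your stream is still called
inputDS; the HDFS path is only a placeholder) writes each batch out as text
files instead of printing it:

    // Write every batch of the stream into HDFS as text files.
    // saveAsTextFiles creates one directory per batch interval,
    // named <prefix>-<batch time in ms>.<suffix>.
    inputDS.dstream().saveAsTextFiles("hdfs:///tmp/mqtt-output/batch", "txt");

You can then check with "hdfs dfs -ls /tmp/mqtt-output" whether the receiver
is actually getting data in cluster mode.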
On Jul 8, 2016 3:10 PM, "Yu Wei" <yu20...@hotmail.com> wrote:

> How could I dump data into text file? Writing to HDFS or other approach?
>
>
> Thanks,
>
> Jared
> ------------------------------
> *From:* Rabin Banerjee <dev.rabin.baner...@gmail.com>
> *Sent:* Thursday, July 7, 2016 7:04:29 PM
> *To:* Yu Wei
> *Cc:* Mich Talebzadeh; user; Deng Ching-Mallete
> *Subject:* Re: Is that possible to launch spark streaming application on
> yarn with only one machine?
>
>
> In that case, I suspect that MQTT is not getting data while you are
> submitting in yarn cluster mode.
>
> Can you please try dumping the data into a text file instead of printing it
> while submitting in yarn cluster mode?
> On Jul 7, 2016 12:46 PM, "Yu Wei" <yu20...@hotmail.com> wrote:
>
>> Yes. Thanks for your clarification.
>>
>> The problem I encountered is that in yarn cluster mode, there is no output
>> for "DStream.print()" in the yarn logs.
>>
>>
>> In the Spark implementation org/apache/spark/streaming/dstream/DStream.scala,
>> the log lines related to "Time" were printed out. However, the output of
>> firstNum.take(num).foreach(println) was not printed in the logs.
>>
>> What's the root cause for the behavior difference?
>>
>>
>> /**
>>  * Print the first ten elements of each RDD generated in this DStream.
>>  * This is an output operator, so this DStream will be registered as an
>>  * output stream and there materialized.
>>  */
>> def print(): Unit = ssc.withScope {
>>   print(10)
>> }
>>
>> /**
>>  * Print the first num elements of each RDD generated in this DStream.
>>  * This is an output operator, so this DStream will be registered as an
>>  * output stream and there materialized.
>>  */
>> def print(num: Int): Unit = ssc.withScope {
>>   def foreachFunc: (RDD[T], Time) => Unit = {
>>     (rdd: RDD[T], time: Time) => {
>>       val firstNum = rdd.take(num + 1)
>>       // scalastyle:off println
>>       println("-------------------------------------------")
>>       println("Time: " + time)
>>       println("-------------------------------------------")
>>       firstNum.take(num).foreach(println)
>>       if (firstNum.length > num) println("...")
>>       println()
>>       // scalastyle:on println
>>     }
>>   }
>>
>> Thanks,
>>
>> Jared
>>
>>
>> ------------------------------
>> *From:* Rabin Banerjee <dev.rabin.baner...@gmail.com>
>> *Sent:* Thursday, July 7, 2016 1:04 PM
>> *To:* Yu Wei
>> *Cc:* Mich Talebzadeh; Deng Ching-Mallete; user@spark.apache.org
>> *Subject:* Re: Is that possible to launch spark streaming application on
>> yarn with only one machine?
>>
>> In yarn cluster mode, the driver runs in the AM, so you can find the logs
>> in that AM's log. Open the ResourceManager UI and check for the job and its
>> logs, or run: yarn logs -applicationId <appId>
>>
>> In yarn client mode, the driver is the same JVM from which you are
>> launching, so you see its output in your local log.
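>>
>> If you do not know the application id, something like this should work (the
>> id below is only an example):
>>
>>     yarn application -list -appStates ALL
>>     yarn logs -applicationId application_1467900000000_0001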
>>
>> On Thu, Jul 7, 2016 at 7:56 AM, Yu Wei <yu20...@hotmail.com> wrote:
>>
>>> Launching via client deploy mode, it works again.
>>>
>>> I'm still a little confused about the behavior difference for cluster
>>> and client mode on a single machine.
>>>
>>>
>>> Thanks,
>>>
>>> Jared
>>> ------------------------------
>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>> *Sent:* Wednesday, July 6, 2016 9:46:11 PM
>>> *To:* Yu Wei
>>> *Cc:* Deng Ching-Mallete; user@spark.apache.org
>>>
>>> *Subject:* Re: Is that possible to launch spark streaming application
>>> on yarn with only one machine?
>>>
>>> I don't think deploy-mode cluster will work.
>>>
>>> Try --master yarn --deploy-mode client
>>>
>>> FYI
>>>
>>>
>>>    * Spark Local - Spark runs on the local host. This is the simplest
>>>      setup and best suited for learners who want to understand different
>>>      concepts of Spark and those performing unit testing.
>>>
>>>    * Spark Standalone - a simple cluster manager included with Spark
>>>      that makes it easy to set up a cluster.
>>>
>>>    * YARN Cluster Mode - the Spark driver runs inside an application
>>>      master process which is managed by YARN on the cluster, and the
>>>      client can go away after initiating the application. This is
>>>      invoked with --master yarn and --deploy-mode cluster.
>>>
>>>    * YARN Client Mode - the driver runs in the client process, and the
>>>      application master is only used for requesting resources from YARN.
>>>      Unlike Spark standalone mode, in which the master's address is
>>>      specified in the --master parameter, in YARN mode the
>>>      ResourceManager's address is picked up from the Hadoop
>>>      configuration. Thus, the --master parameter is yarn. This is
>>>      invoked with --deploy-mode client (see the example command below).
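>>>
>>> For example, with your jar and the memory settings you used earlier
>>> (adjust as needed), a client-mode submit would look like:
>>>
>>>     spark-submit --master yarn --deploy-mode client \
>>>       --driver-memory 4g --executor-memory 2g \
>>>       target/CollAna-1.0-SNAPSHOT.jar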
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn:
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>> On 6 July 2016 at 12:31, Yu Wei <yu20...@hotmail.com> wrote:
>>>
>>>> Hi Deng,
>>>>
>>>> I tried the same code again.
>>>>
>>>> It seemed that when launching the application via yarn on a single node,
>>>> JavaDStream.print() did not work. However, occasionally it worked.
>>>>
>>>> If I launched the same application in local mode, it always worked.
>>>>
>>>>
>>>> The code is as below:
>>>>
>>>>     SparkConf conf = new SparkConf().setAppName("Monitor&Control");
>>>>     JavaStreamingContext jssc =
>>>>         new JavaStreamingContext(conf, Durations.seconds(1));
>>>>     JavaReceiverInputDStream<String> inputDS =
>>>>         MQTTUtils.createStream(jssc, "tcp://114.55.145.185:1883", "Control");
>>>>     inputDS.print();
>>>>     jssc.start();
>>>>     jssc.awaitTermination();
>>>>
>>>>
>>>> Command for launching via yarn (did not work):
>>>>
>>>>     spark-submit --master yarn --deploy-mode cluster --driver-memory 4g \
>>>>       --executor-memory 2g target/CollAna-1.0-SNAPSHOT.jar
>>>>
>>>> Command for launching via local mode (works):
>>>>
>>>>     spark-submit --master local[4] --driver-memory 4g --executor-memory 2g \
>>>>       --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar
>>>>
>>>>
>>>>
>>>> Any advice?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Jared
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Yu Wei <yu20...@hotmail.com>
>>>> *Sent:* Tuesday, July 5, 2016 4:41 PM
>>>> *To:* Deng Ching-Mallete
>>>>
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Is that possible to launch spark streaming application
>>>> on yarn with only one machine?
>>>>
>>>>
>>>> Hi Deng,
>>>>
>>>>
>>>> Thanks for the help. Actually I need to pay more attention to memory usage.
>>>>
>>>> I found the root cause of my problem. It seemed to be in the Spark
>>>> Streaming MQTTUtils module.
>>>>
>>>> When I use "localhost" in the brokerUrl, it doesn't work.
>>>>
>>>> After changing it to "127.0.0.1", it works now.
>>>>
>>>>
>>>> Thanks again,
>>>>
>>>> Jared
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* odeach...@gmail.com <odeach...@gmail.com> on behalf of Deng
>>>> Ching-Mallete <och...@apache.org>
>>>> *Sent:* Tuesday, July 5, 2016 4:03:28 PM
>>>> *To:* Yu Wei
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Is that possible to launch spark streaming application
>>>> on yarn with only one machine?
>>>>
>>>> Hi Jared,
>>>>
>>>> You can launch a Spark application even with just a single node in
>>>> YARN, provided that the node has enough resources to run the job.
>>>>
>>>> It might also be good to note that when YARN calculates the memory
>>>> allocation for the driver and the executors, an additional memory overhead
>>>> is added for each container, and the result is then rounded up to the
>>>> nearest GB, IIRC. So the 4G driver memory + 4x2G executor memory do not
>>>> necessarily translate to a total of 12G of memory allocation. It would be
>>>> more than that, so the node would need more than 12G of memory for the job
>>>> to execute in YARN. If that is the case, you should see something like "No
>>>> resources available in cluster.." in the application master logs in YARN.
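>>>>
>>>> As rough arithmetic (assuming the default spark.yarn.executor.memoryOverhead
>>>> of max(384MB, 10% of the executor memory) and a 1GB YARN minimum
>>>> allocation, both of which may differ on your setup): each 2G executor
>>>> requests 2048MB + 384MB = 2432MB, which YARN rounds up to 3GB, so four
>>>> executors need about 12GB; the 4G driver requests 4096MB plus roughly
>>>> 410MB of overhead, rounded up to 5GB, for a total of roughly 17GB.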
>>>>
>>>> HTH,
>>>> Deng
>>>>
>>>> On Tue, Jul 5, 2016 at 4:31 PM, Yu Wei <yu20...@hotmail.com> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I set up a pseudo hadoop/yarn cluster on my laptop.
>>>>>
>>>>> I wrote a simple spark streaming program as below to receive messages
>>>>> with MQTTUtils:
>>>>>
>>>>>     SparkConf conf = new SparkConf().setAppName("Monitor&Control");
>>>>>     JavaStreamingContext jssc =
>>>>>         new JavaStreamingContext(conf, Durations.seconds(1));
>>>>>     JavaReceiverInputDStream<String> inputDS =
>>>>>         MQTTUtils.createStream(jssc, brokerUrl, topic);
>>>>>
>>>>>     inputDS.print();
>>>>>     jssc.start();
>>>>>     jssc.awaitTermination();
>>>>>
>>>>> If I submit the app with "--master local[2]", it works well:
>>>>>
>>>>>     spark-submit --master local[4] --driver-memory 4g --executor-memory 2g \
>>>>>       --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar
>>>>>
>>>>> If I submit with "--master yarn", there is no output for "inputDS.print()":
>>>>>
>>>>>     spark-submit --master yarn --deploy-mode cluster --driver-memory 4g \
>>>>>       --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar
>>>>>
>>>>> Is it possible to launch a spark application on yarn with only a
>>>>> single node?
>>>>>
>>>>>
>>>>> Thanks for your advice.
>>>>>
>>>>>
>>>>> Jared
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
