Yes, I mean dump it into HDFS as a file, via yarn cluster mode.

On Jul 8, 2016 3:10 PM, "Yu Wei" <yu20...@hotmail.com> wrote:
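Rabin's suggestion here, writing the stream to a text file in HDFS instead of relying on print(), can be sketched roughly as below. This is a minimal sketch, not the poster's actual code: the output path hdfs:///tmp/collana/out and the broker/topic values are assumptions, to be replaced with the values from the thread.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.mqtt.MQTTUtils;

public class DumpToHdfs {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("Monitor&Control");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        // Broker URL and topic are placeholders; use the same values as in the thread.
        JavaReceiverInputDStream<String> inputDS =
                MQTTUtils.createStream(jssc, "tcp://127.0.0.1:1883", "Control");
        // Instead of inputDS.print(), persist each batch to HDFS so the output
        // is visible regardless of where the driver runs in yarn cluster mode.
        // saveAsTextFiles writes each batch as a directory named
        // "<prefix>-<batch time in ms>.<suffix>".
        inputDS.dstream().saveAsTextFiles("hdfs:///tmp/collana/out", "txt");
        jssc.start();
        jssc.awaitTermination();
    }
}
```

After submitting with the same yarn cluster command as in the thread, the batches can be inspected with `hdfs dfs -ls /tmp/collana` and `hdfs dfs -cat`.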
> How could I dump the data into a text file? By writing to HDFS, or some other approach?
>
> Thanks,
> Jared
> ------------------------------
> *From:* Rabin Banerjee <dev.rabin.baner...@gmail.com>
> *Sent:* Thursday, July 7, 2016 7:04:29 PM
> *To:* Yu Wei
> *Cc:* Mich Talebzadeh; user; Deng Ching-Mallete
> *Subject:* Re: Is that possible to launch spark streaming application on yarn with only one machine?
>
> In that case, I suspect that MQTT is not getting data while you are submitting in yarn cluster mode.
>
> Can you please try dumping the data into a text file instead of printing it while submitting in yarn cluster mode?
>
> On Jul 7, 2016 12:46 PM, "Yu Wei" <yu20...@hotmail.com> wrote:
>
>> Yes. Thanks for your clarification.
>>
>> The problem I encountered is that in yarn cluster mode, there is no output for "DStream.print()" in the yarn logs.
>>
>> In the Spark implementation, org/apache/spark/streaming/dstream/DStream.scala, the log lines with "Time" were printed out. However, the output of firstNum.take(num).foreach(println) was not printed in the logs.
>>
>> What's the root cause of the behavior difference?
>>
>> /**
>>  * Print the first ten elements of each RDD generated in this DStream. This is an output
>>  * operator, so this DStream will be registered as an output stream and there materialized.
>>  */
>> def print(): Unit = ssc.withScope {
>>   print(10)
>> }
>>
>> /**
>>  * Print the first num elements of each RDD generated in this DStream. This is an output
>>  * operator, so this DStream will be registered as an output stream and there materialized.
>>  */
>> def print(num: Int): Unit = ssc.withScope {
>>   def foreachFunc: (RDD[T], Time) => Unit = {
>>     (rdd: RDD[T], time: Time) => {
>>       val firstNum = rdd.take(num + 1)
>>       // scalastyle:off println
>>       println("-------------------------------------------")
>>       println("Time: " + time)
>>       println("-------------------------------------------")
>>       firstNum.take(num).foreach(println)
>>       if (firstNum.length > num) println("...")
>>       println()
>>       // scalastyle:on println
>>     }
>>   }
>>
>> Thanks,
>> Jared
>>
>> ------------------------------
>> *From:* Rabin Banerjee <dev.rabin.baner...@gmail.com>
>> *Sent:* Thursday, July 7, 2016 1:04 PM
>> *To:* Yu Wei
>> *Cc:* Mich Talebzadeh; Deng Ching-Mallete; user@spark.apache.org
>> *Subject:* Re: Is that possible to launch spark streaming application on yarn with only one machine?
>>
>> In yarn cluster mode, the driver runs in the AM, so you can find the logs in that AM's log. Open the ResourceManager UI and check for the job and its logs, or run:
>>
>>     yarn logs -applicationId <appId>
>>
>> In yarn client mode, the driver is the same JVM from which you are launching, so you get the output in that log.
>>
>> On Thu, Jul 7, 2016 at 7:56 AM, Yu Wei <yu20...@hotmail.com> wrote:
>>
>>> Launching via client deploy mode, it works again.
>>>
>>> I'm still a little confused about the behavior difference between cluster and client mode on a single machine.
>>>
>>> Thanks,
>>> Jared
>>> ------------------------------
>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>> *Sent:* Wednesday, July 6, 2016 9:46:11 PM
>>> *To:* Yu Wei
>>> *Cc:* Deng Ching-Mallete; user@spark.apache.org
>>> *Subject:* Re: Is that possible to launch spark streaming application on yarn with only one machine?
>>>
>>> Deploy-mode cluster: I don't think that will work.
>>>
>>> Try --master yarn --deploy-mode client
>>>
>>> FYI:
>>>
>>> - *Spark Local* - Spark runs on the local host.
>>> This is the simplest setup, best suited for learners who want to understand the different concepts of Spark and for those performing unit testing.
>>>
>>> - *Spark Standalone* – a simple cluster manager included with Spark that makes it easy to set up a cluster.
>>>
>>> - *YARN Cluster Mode* – the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. This is invoked with --master yarn and --deploy-mode cluster.
>>>
>>> - *YARN Client Mode* – the driver runs in the client process, and the application master is only used for requesting resources from YARN. Unlike Spark standalone mode, in which the master's address is specified in the --master parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the --master parameter is yarn. This is invoked with --deploy-mode client.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>>>
>>> On 6 July 2016 at 12:31, Yu Wei <yu20...@hotmail.com> wrote:
>>>
>>>> Hi Deng,
>>>>
>>>> I tried the same code again.
>>>>
>>>> It seems that when launching the application via yarn on a single node, JavaDStream.print() did not work. However, occasionally it worked.
>>>>
>>>> If I launch the same application in local mode, it always works.
>>>> The code is as below:
>>>>
>>>> SparkConf conf = new SparkConf().setAppName("Monitor&Control");
>>>> JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
>>>> JavaReceiverInputDStream<String> inputDS =
>>>>     MQTTUtils.createStream(jssc, "tcp://114.55.145.185:1883", "Control");
>>>> inputDS.print();
>>>> jssc.start();
>>>> jssc.awaitTermination();
>>>>
>>>> Command for launching via yarn (did not work):
>>>>
>>>> spark-submit --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g target/CollAna-1.0-SNAPSHOT.jar
>>>>
>>>> Command for launching via local mode (works):
>>>>
>>>> spark-submit --master local[4] --driver-memory 4g --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar
>>>>
>>>> Any advice?
>>>>
>>>> Thanks,
>>>> Jared
>>>>
>>>> ------------------------------
>>>> *From:* Yu Wei <yu20...@hotmail.com>
>>>> *Sent:* Tuesday, July 5, 2016 4:41 PM
>>>> *To:* Deng Ching-Mallete
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Is that possible to launch spark streaming application on yarn with only one machine?
>>>>
>>>> Hi Deng,
>>>>
>>>> Thanks for the help. Actually I need to pay more attention to memory usage.
>>>>
>>>> I found the root cause of my problem. It seems to be in the spark streaming MQTTUtils module.
>>>>
>>>> When I use "localhost" in the brokerURL, it doesn't work.
>>>>
>>>> After changing it to "127.0.0.1", it works now.
>>>>
>>>> Thanks again,
>>>> Jared
>>>>
>>>> ------------------------------
>>>> *From:* odeach...@gmail.com <odeach...@gmail.com> on behalf of Deng Ching-Mallete <och...@apache.org>
>>>> *Sent:* Tuesday, July 5, 2016 4:03:28 PM
>>>> *To:* Yu Wei
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Is that possible to launch spark streaming application on yarn with only one machine?
>>>> Hi Jared,
>>>>
>>>> You can launch a Spark application even with just a single node in YARN, provided that the node has enough resources to run the job.
>>>>
>>>> It might also be good to note that when YARN calculates the memory allocation for the driver and the executors, there is an additional memory overhead added for each executor, and then the total gets rounded up to the nearest GB, IIRC. So 4G of driver memory + 4x2G of executor memory does not necessarily translate to a total allocation of 12G. It would be more than that, so the node would need more than 12G of memory for the job to execute in YARN. You should see something like "No resources available in cluster.." in the application master logs in YARN if that is the case.
>>>>
>>>> HTH,
>>>> Deng
>>>>
>>>> On Tue, Jul 5, 2016 at 4:31 PM, Yu Wei <yu20...@hotmail.com> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I set up a pseudo hadoop/yarn cluster on my laptop.
>>>>>
>>>>> I wrote a simple spark streaming program as below to receive messages with MQTTUtils:
>>>>>
>>>>> conf = new SparkConf().setAppName("Monitor&Control");
>>>>> jssc = new JavaStreamingContext(conf, Durations.seconds(1));
>>>>> JavaReceiverInputDStream<String> inputDS =
>>>>>     MQTTUtils.createStream(jssc, brokerUrl, topic);
>>>>> inputDS.print();
>>>>> jssc.start();
>>>>> jssc.awaitTermination();
>>>>>
>>>>> If I submit the app with "--master local[4]", it works well:
>>>>>
>>>>> spark-submit --master local[4] --driver-memory 4g --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar
>>>>>
>>>>> If I submit with "--master yarn", there is no output for "inputDS.print()":
>>>>>
>>>>> spark-submit --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar
>>>>>
>>>>> Is it possible to launch a spark application on yarn with only one single node?
>>>>> Thanks for your advice.
>>>>>
>>>>> Jared
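Deng's point about per-container overhead and rounding can be made concrete with a back-of-the-envelope calculation. This is a sketch under assumptions: the overhead formula max(384 MB, 10% of heap) that Spark's YARN allocator used around that era, and an assumed yarn.scheduler.minimum-allocation-mb of 1024 MB; the actual figures depend on the cluster configuration.

```java
// Rough estimate of YARN container sizes for the job in this thread:
// --driver-memory 4g, --executor-memory 2g, --num-executors 4.
public class YarnMemoryEstimate {
    // Assumed overhead formula: max(384 MB, 10% of the requested heap).
    public static long overheadMb(long heapMb) {
        return Math.max(384, (long) (heapMb * 0.10));
    }

    // YARN rounds each container request up to a multiple of the
    // scheduler's minimum allocation increment.
    public static long container(long heapMb, long incrementMb) {
        long requested = heapMb + overheadMb(heapMb);
        return ((requested + incrementMb - 1) / incrementMb) * incrementMb;
    }

    public static void main(String[] args) {
        long inc = 1024;                      // assumed minimum-allocation-mb
        long driver = container(4096, inc);   // --driver-memory 4g
        long executor = container(2048, inc); // --executor-memory 2g
        long total = driver + 4 * executor;   // --num-executors 4
        // prints "driver=5120 executor=3072 total=17408" (MB)
        System.out.println("driver=" + driver + " executor=" + executor + " total=" + total);
    }
}
```

Under these assumptions the nominal "12G" job actually asks YARN for about 17 GB, which illustrates why a single node can run out of resources well before the naive sum suggests.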