Dear Andy,

As far as I understand, the transformations are applied to the RDDs, not to the data itself, and I need to send the actual data to Kafka. So I think I have to perform at least one action to make Spark load and process the data.
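To illustrate what I mean about laziness (a rough plain-Python analogy, not Spark itself): a generator pipeline behaves like a chain of transformations, and nothing runs until something consumes it, the way an action forces an RDD to be evaluated.

```python
# Rough analogy to Spark laziness using a plain Python generator:
# building the pipeline does no work; consuming it ("the action") does.
processed = []

def publish(record):
    # Stand-in for the side effect of sending a record to Kafka.
    processed.append(record)
    return record

data = range(3)
pipeline = (publish(x) for x in data)  # "transformation": nothing runs yet
assert processed == []                 # no records published so far
result = list(pipeline)                # "action": forces evaluation
assert processed == [0, 1, 2]          # side effects happened only now
```

This is why a map() that publishes to Kafka does nothing on its own until some action consumes the RDD.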
Kindly correct me if I do not understand this correctly.

Best regards

On 29 March 2016 at 19:40, Andy Davidson <a...@santacruzintegration.com> wrote:
> Hi Fanoos
>
> I would be careful about using collect(). You need to make sure your local
> computer has enough memory to hold your entire data set.
>
> Eventually I will need to do something similar. I have not written the code
> yet. My plan is to load the data into a data frame and then write a UDF
> that actually publishes to Kafka.
>
> If you are using RDDs, you could use map() or some other transform to
> cause the data to be published.
>
> Andy
>
> From: fanooos <dev.fano...@gmail.com>
> Date: Tuesday, March 29, 2016 at 4:26 AM
> To: "user @spark" <user@spark.apache.org>
> Subject: Re: Sending events to Kafka from spark job
>
> I think I found a solution, but I have no idea how this affects the execution
> of the application.
>
> At the end of the script I added a sleep statement:
>
> import time
> time.sleep(1)
>
> This solved the problem.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Sending-events-to-Kafka-from-spark-job-tp26622p26624.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info
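Rather than relying on a sleep, a pattern often suggested for publishing from Spark is rdd.foreachPartition(send_partition), which is itself an action: it forces evaluation and creates one producer per partition instead of one per record. The sketch below simulates that pattern in plain Python; FakeProducer is a hypothetical stand-in for a real Kafka producer (e.g. kafka-python's KafkaProducer), and the partition loop stands in for what foreachPartition would do on the workers.

```python
# Sketch of the per-partition publishing pattern (an assumption, not
# from this thread). FakeProducer is a hypothetical stand-in for a
# real Kafka producer; in Spark you would call
# rdd.foreachPartition(send_partition) instead of the loop below.

class FakeProducer:
    """Records sends in memory instead of talking to a broker."""
    sent = []

    def send(self, topic, value):
        FakeProducer.sent.append((topic, value))

    def flush(self):
        # A real producer would block here until buffered records are sent.
        pass

def send_partition(records):
    # One producer per partition: the producer is constructed on the
    # worker, so nothing non-serializable is shipped from the driver,
    # and setup cost is paid once per partition rather than per record.
    producer = FakeProducer()
    for record in records:
        producer.send("events", record)
    producer.flush()

# Simulate an RDD with two partitions.
partitions = [[1, 2], [3]]
for part in partitions:
    send_partition(iter(part))

assert FakeProducer.sent == [("events", 1), ("events", 2), ("events", 3)]
```

Because foreachPartition is an action, no extra sleep or collect() is needed to trigger the publishing, and the full data set never has to fit on the driver.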