You might not have enough cores to process data from Kafka
> When running a Spark Streaming program locally, do not use “local” or
> “local[1]” as the master URL. Either of these means that only one thread
> will be used for running tasks locally. If you are using an input DStream
> based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single
> thread will be used to run the receiver, leaving no thread for processing
> the received data. *Hence, when running locally, always use “local[n]” as
> the master URL, where n > number of receivers to run (see Spark
> Properties for information on how to set the master).*

https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers

2015-12-01 7:13 GMT+01:00 Cassa L <lcas...@gmail.com>:

> Hi,
> I am reading data from Kafka into Spark. It runs fine for some time but
> then hangs forever with the following output. I don't see any errors in
> the logs. How do I debug this?
>
> 2015-12-01 06:04:30,697 [dag-scheduler-event-loop] INFO (Logging.scala:59) - Adding task set 19.0 with 4 tasks
> 2015-12-01 06:04:30,872 [pool-13-thread-1] INFO (Logging.scala:59) - Disconnected from Cassandra cluster: APG DEV Cluster
> 2015-12-01 06:04:35,060 [JobGenerator] INFO (Logging.scala:59) - Added jobs for time 1448949875000 ms
> 2015-12-01 06:04:40,054 [JobGenerator] INFO (Logging.scala:59) - Added jobs for time 1448949880000 ms
> 2015-12-01 06:04:45,034 [JobGenerator] INFO (Logging.scala:59) - Added jobs for time 1448949885000 ms
> 2015-12-01 06:04:50,100 [JobGenerator] INFO (Logging.scala:59) - Added jobs for time 1448949890000 ms
> 2015-12-01 06:04:55,064 [JobGenerator] INFO (Logging.scala:59) - Added jobs for time 1448949895000 ms
> 2015-12-01 06:05:00,125 [JobGenerator] INFO (Logging.scala:59) - Added jobs for time 1448949900000 ms
>
> Thanks,
> LCassa

--
Paul Leclercq | Data engineer
paul.lecle...@tabmo.io | http://www.tabmo.fr/
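
To make the fix concrete, here is a minimal sketch of a receiver-based Kafka
job that reserves a thread for processing. It assumes the Spark 1.x
spark-streaming-kafka API (KafkaUtils.createStream); the app name, ZooKeeper
address, group id, topic name, and batch interval are placeholders, not taken
from the original poster's job. With “local[1]”, the single thread is consumed
by the receiver and batches queue up unprocessed, which matches the repeated
“Added jobs for time ...” lines above with no task progress.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaLocalDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("KafkaLocalDemo")
          // At least (number of receivers) + 1 threads: one for the Kafka
          // receiver, at least one for processing the received batches.
          // "local" or "local[1]" would starve the job as described above.
          .setMaster("local[2]")

        val ssc = new StreamingContext(conf, Seconds(5))

        // One receiver-based input DStream; ZooKeeper quorum, group id and
        // topic map are hypothetical values for illustration.
        val lines = KafkaUtils
          .createStream(ssc, "localhost:2181", "demo-group", Map("mytopic" -> 1))
          .map(_._2) // keep the message value, drop the key

        lines.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

With one receiver, "local[2]" is the minimum; a job running several receivers
(or doing heavier per-batch work) needs correspondingly more local threads.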