Re: Low Level Kafka Consumer for Spark

2014-08-04 Thread Yan Fang
Another suggestion that may help: consider using Kafka to store the latest offset instead of ZooKeeper. There are at least two benefits: 1) it lowers the workload on ZK, and 2) it supports replay from a certain offset. This is how Samza deals with the Kafka offse

Re: Does RDD checkpointing store the entire state in HDFS?

2014-07-17 Thread Yan Fang
etup with the streaming context is also stored into HDFS (the > whole DAG of DStream objects is serialized and saved). > > TD > > > On Wed, Jul 16, 2014 at 5:38 PM, Yan Fang wrote: > > > Hi guys, > > > > am wondering how the RDD checkpointing

Does RDD checkpointing store the entire state in HDFS?

2014-07-16 Thread Yan Fang
Hi guys, I am wondering how RDD checkpointing works in Spark Streaming. When I use updateStateByKey, does Spark store the entire state (at one point in time) in HDFS, or only put the transformation in