Some of this is discussed in https://github.com/dibbhatt/kafka-spark-consumer, and some is mentioned in https://issues.apache.org/jira/browse/SPARK-11045
Let me know if those answer your question.

In short, the Direct Stream is a good choice if you need exactly-once semantics and message ordering, but many use cases do not need either. With the Direct Stream, RDD processing parallelism is limited by the number of Kafka partitions, and you need to store offset details in an external store, since the checkpoint location is not reliable if you modify the driver code. In receiver-based mode, on the other hand, you need to enable the WAL to avoid data loss. The Spark receiver-based consumer from KafkaUtils, which uses the Kafka high-level API, has serious issues, so if you do need to switch to receiver-based mode, this low-level consumer is a better choice.

Performance-wise I have not published any numbers yet, but from the internal testing and benchmarking I did (and validated by folks who use this consumer), it performs much better than any existing consumer in Spark.

Regards,
Dibyendu

On Thu, Jan 7, 2016 at 4:28 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> On Thu, Jan 7, 2016 at 11:39 AM, Dibyendu Bhattacharya
> <dibyendu.bhattach...@gmail.com> wrote:
> > You are using low level spark kafka consumer . I am the author of the
> > same.
>
> If I may ask, what are the differences between this and the direct
> version shipped with spark? I've just started toying with it, and
> would appreciate some guidance. Thanks.
>
> Jacek
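
P.S. To illustrate the point above about keeping Direct Stream offsets in an external store rather than relying on checkpoints: below is a minimal in-memory sketch of what such a store might look like. The `OffsetStore` class and its `commit`/`fetch` methods are hypothetical names for this illustration, not part of Spark, Kafka, or this consumer; a real deployment would back it with ZooKeeper, HBase, or a database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical external offset store for a Kafka Direct Stream,
// keyed by (topic, partition). In-memory only for illustration;
// a real store would persist offsets durably outside the driver.
public class OffsetStore {
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    // Record the last processed offset for a topic partition,
    // typically called after each batch completes successfully.
    public void commit(String topic, int partition, long offset) {
        offsets.put(key(topic, partition), offset);
    }

    // Return the offset to resume from on driver restart,
    // or 0 if nothing was committed yet.
    public long fetch(String topic, int partition) {
        return offsets.getOrDefault(key(topic, partition), 0L);
    }
}
```

On driver restart you would read the saved offsets with `fetch` and pass them as the starting offsets when creating the direct stream, which is what makes the pipeline survive driver-code changes that invalidate the checkpoint.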