I'll let TD chime in on this one, but I'm guessing this would be a welcome addition. It's great to see community effort on adding new streams/receivers; adding a Java API for receivers was something we did specifically to allow this :)
- Patrick

On Sat, Aug 2, 2014 at 10:09 AM, Dibyendu Bhattacharya
<dibyendu.bhattach...@gmail.com> wrote:

> Hi,
>
> I have implemented a low-level Kafka consumer for Spark Streaming using the
> Kafka SimpleConsumer API. This API gives better control over Kafka offset
> management and recovery from failures. Since the present Spark KafkaUtils
> uses the high-level Kafka consumer API, I wanted finer control over offset
> management, which is not possible with the high-level consumer.
>
> The project is available in the repo below:
>
> https://github.com/dibbhatt/kafka-spark-consumer
>
> I have implemented a custom receiver, consumer.kafka.client.KafkaReceiver.
> The KafkaReceiver uses the low-level Kafka consumer API (implemented in the
> consumer.kafka packages) to fetch messages from Kafka and 'store' them in
> Spark.
>
> The logic detects the number of partitions for a topic and spawns that many
> threads (individual consumer instances). The Kafka consumer uses ZooKeeper
> to store the latest offset for each partition, which helps recovery in case
> of failure. The consumer logic is tolerant to ZooKeeper failures, Kafka
> partition-leader changes, Kafka broker failures, recovery from offset
> errors, and other fail-over aspects.
>
> consumer.kafka.client.Consumer is a sample consumer that uses these Kafka
> receivers to generate DStreams from Kafka and apply an output operation to
> every message of the RDD.
>
> We are planning to use this Kafka Spark consumer for near-real-time indexing
> of Kafka messages into a target search cluster, and for near-real-time
> aggregation into a target NoSQL store.
>
> Kindly let me know your view. Also, if this looks good, may I contribute it
> to the Spark Streaming project?
>
> Regards,
> Dibyendu
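
For context, a receiver of the kind described above might look roughly like
this. It is a minimal sketch against the Spark Streaming Java Receiver API
(the one mentioned at the top of this thread); the class name, constructor
parameters, and the fetchBatch helper are illustrative assumptions, not
taken from the kafka-spark-consumer repo:

    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.receiver.Receiver;

    // Hypothetical sketch of a low-level Kafka receiver; names are
    // illustrative, not the actual consumer.kafka.client.KafkaReceiver.
    public class LowLevelKafkaReceiver extends Receiver<byte[]> {

        private final String topic;
        private final int numPartitions;

        public LowLevelKafkaReceiver(String topic, int numPartitions) {
            // Replicated storage so received blocks survive an executor failure.
            super(StorageLevel.MEMORY_AND_DISK_2());
            this.topic = topic;
            this.numPartitions = numPartitions;
        }

        @Override
        public void onStart() {
            // One fetcher thread per Kafka partition, as the post describes.
            for (int partition = 0; partition < numPartitions; partition++) {
                final int p = partition;
                new Thread("kafka-fetcher-" + topic + "-" + p) {
                    @Override
                    public void run() {
                        fetchLoop(p);
                    }
                }.start();
            }
        }

        @Override
        public void onStop() {
            // Fetcher threads check isStopped() and exit on their own.
        }

        private void fetchLoop(int partition) {
            // Skeleton of the per-partition fetch loop. A real implementation
            // would create a kafka.javaapi.consumer.SimpleConsumer against the
            // partition leader, fetch from the last offset stored in ZooKeeper,
            // and handle leader changes and offset-out-of-range errors.
            while (!isStopped()) {
                for (byte[] payload : fetchBatch(partition)) {
                    store(payload); // hand each message to Spark for block storage
                }
                // ... periodically commit the latest consumed offset to ZooKeeper
            }
        }

        private Iterable<byte[]> fetchBatch(int partition) {
            // Placeholder for the actual SimpleConsumer fetch; returns no data.
            return java.util.Collections.<byte[]>emptyList();
        }
    }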
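
On the driver side, such a receiver plugs into Spark Streaming through
receiverStream(), and an output operation like foreachRDD then runs against
every batch. This uses the Spark 1.x Java API signatures; the app name,
topic, partition count, and batch interval are made up for illustration:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class ConsumerSketch {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("KafkaReceiverSketch");
            JavaStreamingContext jssc =
                new JavaStreamingContext(conf, new Duration(2000));

            // Plug in the custom receiver; receiverStream() is the generic
            // entry point for user-defined receivers.
            JavaDStream<byte[]> messages =
                jssc.receiverStream(new LowLevelKafkaReceiver("my-topic", 4));

            // An output operation applied to each batch RDD, standing in for
            // near-real-time indexing or writes to a NoSQL store.
            messages.foreachRDD(new Function<JavaRDD<byte[]>, Void>() {
                @Override
                public Void call(JavaRDD<byte[]> rdd) {
                    System.out.println("Messages in batch: " + rdd.count());
                    return null;
                }
            });

            jssc.start();
            jssc.awaitTermination();
        }
    }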