Streaming Receiverless Kafka API + Offset Management

2015-11-16 Thread Nick Evans
I really like the Streaming receiverless API for Kafka streaming jobs, but I'm finding that manual offset management adds a fair bit of complexity. I'm sure that others feel the same way, so I'm proposing that we add the ability to have consumer offsets managed via an easy-to-use API. This would be

Re: Streaming Receiverless Kafka API + Offset Management

2015-11-16 Thread Cody Koeninger
There are already private methods in the code for interacting with Kafka's offset management API. There's a JIRA for making those methods public, but TD has been reluctant to merge it: https://issues.apache.org/jira/browse/SPARK-10963 . I think adding any ZK-specific behavior to Spark is a bad

Re: Streaming Receiverless Kafka API + Offset Management

2015-11-16 Thread Saisai Shao
Kafka now has built-in support for managing metadata itself, in addition to ZK; it is easy to use, and easy to switch to from the current ZK implementation. I think the question here is whether we need to manage offsets at the Spark Streaming level or leave this to the user. If you want to manage offsets at the user level, letting Spark

Re: Streaming Receiverless Kafka API + Offset Management

2015-11-16 Thread Nick Evans
The only dependency on ZooKeeper I see is here: https://github.com/apache/spark/blob/1c5475f1401d2233f4c61f213d1e2c2ee9673067/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala#L244-L247 . If that's the only line that depends on ZooKeeper, we could probably