Hey folks,

I have a very large number of Kafka topics (many thousands of partitions in 
total) that I want to consume, filter with topic-specific filters, and then 
produce back to corresponding filtered topics in Kafka.

Using the receiver-less (direct) approach with Spark 1.4.1 (described here: 
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md) 
I am able to use either KafkaUtils.createDirectStream or KafkaUtils.createRDD 
to consume from many topics and filter them all with the same filters, but I 
can't wrap my head around how to apply topic-specific filters, or how to 
produce each filtered stream to its own topic-specific destination topic.
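
For concreteness, here's a rough sketch of the shape I'm imagining (Scala; 
filtersByTopic and destinationTopicFor are hypothetical stand-ins for my own 
per-topic config, and the broker/producer settings are made up):

import kafka.serializer.StringDecoder
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val conf = new SparkConf().setAppName("topic-filter")
val ssc = new StreamingContext(conf, Seconds(5))

// Hypothetical per-topic config, stand-ins for my real lookup tables.
val filtersByTopic: Map[String, String => Boolean] =
  Map("topicA" -> (_.contains("keep")), "topicB" -> (_.nonEmpty))
val destinationTopicFor: Map[String, String] =
  Map("topicA" -> "topicA-filtered", "topicB" -> "topicB-filtered")

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val topics = filtersByTopic.keySet // really many thousands of topics

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.foreachRDD { rdd =>
  // Partitions of a direct-stream RDD line up 1:1 with its offsetRanges,
  // so each Spark partition can recover which Kafka topic it came from.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { iter =>
    val osr = offsetRanges(TaskContext.get.partitionId)
    val keep = filtersByTopic(osr.topic)
    val dest = destinationTopicFor(osr.topic)
    // Made-up producer settings; in practice I'd cache one producer per executor
    // rather than opening one per partition per batch.
    val props = new java.util.Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    try {
      iter.filter { case (_, v) => keep(v) }
          .foreach { case (k, v) => producer.send(new ProducerRecord(dest, k, v)) }
    } finally {
      producer.close()
    }
  }
}

Is that roughly the right shape, or is there a better way to route per-topic 
logic without looking up the topic inside every partition?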

Another point: I will need to checkpoint the metadata after each successful 
batch by writing each partition's offsets back to ZK. I expect I can get the 
per-partition offsets by casting the RDDs to HasOffsetRanges, but if anyone 
has expertise/guidance doing that and is willing to share, I'd be pretty 
grateful.
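
For the offset bookkeeping, this is roughly what I had in mind, assuming 
Kafka 0.8.2's ZkUtils / ZKStringSerializer and the standard 
/consumers/<group>/offsets/<topic>/<partition> path layout (the group id and 
ZK quorum below are made up); corrections welcome:

import kafka.utils.{ZKStringSerializer, ZkUtils}
import org.I0Itec.zkclient.ZkClient
import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  // Grab the ranges before any transformation: only the direct-stream RDD
  // itself can be cast to HasOffsetRanges.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // ... filter and produce the batch as above ...

  // This part runs on the driver, after the batch's output has succeeded:
  // record each partition's untilOffset so a restart resumes from there.
  val zkClient = new ZkClient("zk1:2181", 30000, 30000, ZKStringSerializer)
  val group = "filter-app" // hypothetical consumer group id
  try {
    offsetRanges.foreach { osr =>
      val path = s"/consumers/$group/offsets/${osr.topic}/${osr.partition}"
      ZkUtils.updatePersistentPath(zkClient, path, osr.untilOffset.toString)
    }
  } finally {
    zkClient.close()
  }
}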
