Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
Ah right, I see. Thank you very much.

On May 25, 2016 11:11 AM, "Cody Koeninger" wrote:
> There's an overloaded createDirectStream method that takes a map from
> topicpartition to offset for the starting point of the stream.
>
> On Wed, May 25, 2016 at 9:59 AM, trung kien

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread Cody Koeninger
There's an overloaded createDirectStream method that takes a map from topicpartition to offset for the starting point of the stream.

On Wed, May 25, 2016 at 9:59 AM, trung kien wrote:
> Thank Cody.
>
> I can build the mapping from time ->offset. However how can i pass this
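A rough sketch of how the starting offsets might be passed, written in Python for illustration. It assumes the Kafka 0.8 integration that shipped with Spark 1.x/2.x, where `KafkaUtils.createDirectStream` in `pyspark.streaming.kafka` accepts a `fromOffsets` mapping keyed by `TopicAndPartition`. The helper names `to_from_offsets` and `create_stream_from`, and the input shape `{(topic, partition): offset}`, are hypothetical and not part of Spark.

```python
def to_from_offsets(offsets_by_partition, topic_and_partition):
    """Convert {(topic, partition): offset} into a fromOffsets-style mapping.

    topic_and_partition is the key constructor; with Spark this would be
    pyspark.streaming.kafka.TopicAndPartition.
    """
    return {
        topic_and_partition(topic, partition): int(offset)
        for (topic, partition), offset in offsets_by_partition.items()
    }


def create_stream_from(ssc, brokers, offsets_by_partition):
    """Start a direct stream at the given offsets (hypothetical wrapper)."""
    # Imported lazily so the helper above can be used without Spark installed.
    # Assumption: the Kafka 0.8 integration module is on the classpath.
    from pyspark.streaming.kafka import KafkaUtils, TopicAndPartition

    from_offsets = to_from_offsets(offsets_by_partition, TopicAndPartition)
    topics = sorted({topic for topic, _ in offsets_by_partition})
    return KafkaUtils.createDirectStream(
        ssc,
        topics,
        {"metadata.broker.list": brokers},
        fromOffsets=from_offsets,
    )
```

The stream then begins at the given offset for each listed partition, so a job restarted with offsets looked up for "beginning of day" would re-process everything from that point onward.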

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
Thanks, Cody.

I can build the mapping from time to offset. However, how can I pass this offset to a Spark Streaming job so that it starts from that offset (using the Direct Approach)?

On May 25, 2016 9:42 AM, "Cody Koeninger" wrote:
> Kafka does not yet have meaningful time indexing, there's a kafka

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread Cody Koeninger
Kafka does not yet have meaningful time indexing. There's a Kafka improvement proposal for it, but it has gotten pushed back to at least 0.10.1.

If you want to do this kind of thing, you will need to maintain your own index from time to offset.

On Wed, May 25, 2016 at 8:15 AM, trung kien
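A minimal sketch of such a time-to-offset index, in Python for illustration. The class `TimeOffsetIndex` and its methods are hypothetical and not part of Kafka or Spark. It assumes some separate process records a (timestamp, offset) checkpoint per partition as it consumes, and that each recorded offset is the first one observed at that timestamp.

```python
import bisect
from collections import defaultdict


class TimeOffsetIndex:
    """Sparse per-partition index from timestamp (epoch ms) to Kafka offset."""

    def __init__(self):
        # (topic, partition) -> parallel lists, sorted by timestamp
        self._times = defaultdict(list)
        self._offsets = defaultdict(list)

    def record(self, topic, partition, timestamp_ms, offset):
        """Append a checkpoint; timestamps must be strictly increasing."""
        key = (topic, partition)
        times = self._times[key]
        if times and timestamp_ms <= times[-1]:
            raise ValueError("checkpoints must be recorded in increasing time order")
        times.append(timestamp_ms)
        self._offsets[key].append(offset)

    def offsets_at(self, timestamp_ms):
        """Return {(topic, partition): offset} for the last checkpoint at or
        before timestamp_ms. Partitions with no such checkpoint are omitted,
        so the caller has to choose a fallback for them."""
        result = {}
        for key, times in self._times.items():
            i = bisect.bisect_right(times, timestamp_ms) - 1
            if i >= 0:
                result[key] = self._offsets[key][i]
        return result
```

Because the index is sparse, the returned offset can point slightly before the requested time; a replaying job may still need to filter out records older than the target timestamp.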

Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
Hi all,

Is there any way to re-compute from a specific time using the Spark Streaming Kafka Direct Approach? In some cases, I want to re-compute from a specific time (e.g. the beginning of the day). Is that possible?

--
Thanks,
Kien