If this is related to https://issues.apache.org/jira/browse/SPARK-14105, are you windowing before doing any transformations at all? Try using map() to extract only the data you care about before windowing.
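A minimal sketch of that suggestion, using the Kafka 0.8 direct-stream Java API that was current at the time. The `jssc`, `kafkaParams`, and `topics` variables are illustrative, not from the original thread:

```java
// Extract only the fields you need with map() BEFORE applying window(),
// so the windowed RDDs hold small records rather than full Kafka payloads.
JavaPairInputDStream<String, String> stream =
    KafkaUtils.createDirectStream(
        jssc,
        String.class, String.class,
        StringDecoder.class, StringDecoder.class,
        kafkaParams, topics);

// Keep only the message value, then window over the slimmed-down stream.
JavaDStream<String> values = stream.map(record -> record._2());
values.window(Durations.seconds(30), Durations.seconds(10))
      .foreachRDD(rdd -> rdd.foreach(v -> System.out.println(v)));
```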
On Tue, Mar 22, 2016 at 12:24 PM, Cody Koeninger <c...@koeninger.org> wrote:
> I definitely have direct stream jobs that use window() without
> problems... Can you post a minimal code example that reproduces the
> problem?
>
> Using print() will confuse the issue, since print() will only try to
> use the first partition.
>
> Use foreachRDD { rdd => rdd.foreach(println) }
>
> or something comparable.
>
> On Tue, Mar 22, 2016 at 10:14 AM, Martin Soch <martin.s...@oracle.com> wrote:
>> Hi all,
>>
>> I am using a direct Kafka input stream in my Spark app. When I use the
>> window(...) function in the chain, the processing pipeline stalls: in
>> the Spark UI I can see the streaming batches being queued while the
>> pipeline reports that it is still processing one of the first batches.
>>
>> To be precise: the issue happens only when the windows overlap
>> (sliding_interval < window_length). Otherwise the system behaves as
>> expected.
>>
>> Derivatives of window(...), such as reduceByKeyAndWindow(...), also
>> work as expected - the pipeline doesn't stall. The same applies when
>> using a different type of stream.
>>
>> Is this a known limitation of the window(...) function when used with
>> a direct Kafka input stream?
>>
>> Java pseudo code:
>> org.apache.spark.streaming.kafka.DirectKafkaInputDStream s;
>> s.window(Durations.seconds(10)).print(); // the pipeline will stall
>>
>> Thanks
>> Martin
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
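Following Cody's suggestion, Martin's pseudo code could be adapted like this (a sketch, assuming `s` is the direct stream from the original message):

```java
// Instead of s.window(Durations.seconds(10)).print(), force every
// partition of each windowed RDD to be computed. print() only evaluates
// enough partitions to show the first few elements, which can hide
// whether the rest of the windowed batch is processed at all.
s.window(Durations.seconds(10))
 .foreachRDD(rdd -> rdd.foreach(record -> System.out.println(record)));
```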