Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-13 Thread Ivan von Nagy
> …k.apache.org/docs/latest/streaming-kafka-0-10-integration.html#locationstrategies
> gives instructions on the default size of the cache and how to increase it.
>
> On Sat, Nov 12, 2016 at 2:15 PM, Ivan von Nagy <i...@vadio.com> wrote:
> > Hi Sean,
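The LocationStrategies section referenced above describes the per-executor cached-consumer pool. As a hedged sketch, raising its capacity might look like the following; the property name comes from the spark-streaming-kafka-0-10 docs, while the value 128 is an arbitrary illustration, not a recommendation:

```scala
import org.apache.spark.SparkConf

// Increase the per-executor Kafka consumer cache beyond its default of 64.
// Useful when an executor handles more topic-partitions than the default
// capacity, which otherwise causes consumer eviction and churn.
val conf = new SparkConf()
  .setAppName("kafka-cache-example") // hypothetical app name
  .set("spark.streaming.kafka.consumer.cache.maxCapacity", "128")
```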

Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-12 Thread Ivan von Nagy
> …ngs like GC pauses or coordination latency.
> It could also be that the number of consumers being simultaneously created
> on each executor causes a thundering herd problem during initial phases
> (which then causes job retries, which then causes more consumer churn,
> etc.).

Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-12 Thread Ivan von Nagy
> …id, which as
> far as I can tell you are still not doing.
>
> On Nov 10, 2016 7:43 PM, "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
> wrote:
>
>> Yeah, the KafkaRDD cannot be reused. It's better to document it.
>>
>> On Thu, Nov 10, 2016 at 8:26 A

Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-10 Thread Ivan von Nagy
> …is written.
>
> Do you want to keep arguing with me, or follow my advice and proceed
> with debugging any remaining issues after you make the changes I
> suggested?
>
> On Mon, Nov 7, 2016 at 1:35 PM, Ivan von Nagy <i...@vadio.com> wrote:
> > With our stream version, we

Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-07 Thread Ivan von Nagy
> >> …ose as possible causes, and then we can
> >> see if there are still issues.
> >>
> >> As far as 0.8 vs 0.10, Spark doesn't require you to assign or
> >> subscribe to a topic in order to update offsets, Kafka does. If you
> >> don't like the new Kafka c
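The point about offsets in the quote above matches the documented pattern in the 0-10 integration: the stream's own consumer group commits offsets, with no extra assign or subscribe call. A minimal sketch, assuming `stream` is an `InputDStream[ConsumerRecord[String, String]]` created elsewhere via `KafkaUtils.createDirectStream`:

```scala
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

// Commit offsets back to Kafka through the stream itself. The offset ranges
// are read from each batch's RDD; commitAsync reuses the stream's consumer
// group, so no separate consumer needs to assign or subscribe.
stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process the batch here, then commit only after output succeeds ...
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```

Committing after the output action keeps the failure semantics at-least-once rather than at-most-once.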

Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-04 Thread Ivan von Nagy
Yes, the parallel KafkaRDDs use the same consumer group, but each RDD uses a single distinct topic. For example, the group would be something like "storage-group", and the topics would be "storage-channel1" and "storage-channel2". In each thread a KafkaConsumer is started, assigned the partitions
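The setup described above can be sketched roughly as follows, assuming the raw Kafka consumer API: one shared `group.id` ("storage-group") and one consumer per thread, each manually assigned every partition of its own topic. The broker address is a placeholder; this is an illustration of the pattern, not the poster's actual code.

```scala
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// Build a consumer for one topic, sharing the group id across threads.
def consumerFor(topic: String): KafkaConsumer[String, String] = {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // placeholder broker
  props.put("group.id", "storage-group")           // same group in every thread
  props.put("enable.auto.commit", "false")
  props.put("key.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)
  // assign() (rather than subscribe()) pins the consumer to explicit
  // partitions, so no group rebalancing is triggered between threads.
  val partitions = consumer.partitionsFor(topic).asScala
    .map(info => new TopicPartition(topic, info.partition))
  consumer.assign(partitions.asJava)
  consumer
}
```

Because assignment is manual, the shared group id only scopes offset storage; it does not coordinate partition ownership between the threads.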