> spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#locationstrategies
>
> gives instructions on the default size of the cache and how to increase it.
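For reference, the cache setting that page describes can also be raised from application code. A minimal sketch, assuming spark-streaming-kafka-0-10 on the classpath; the app name is a placeholder:

```scala
// Sketch only: raising the per-executor cached-consumer capacity.
// The default capacity is 64; the key below is the one documented on the
// linked integration page.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kafka-streaming-example") // placeholder name
  .set("spark.streaming.kafka.consumer.cache.maxCapacity", "128")
```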
>
> On Sat, Nov 12, 2016 at 2:15 PM, Ivan von Nagy <i...@vadio.com> wrote:
> > Hi Sean,
> >
>
> It could be things like GC pauses or coordination latency.
> It could also be that the number of consumers being simultaneously created
> on each executor causes a thundering herd problem during initial phases
> (which then causes job retries, which then causes more consumer churn,
> etc.).
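One common way to soften that kind of thundering herd (a generic illustration, not something proposed in this thread) is jittered exponential backoff before each consumer's retry, so simultaneously started consumers do not hammer the broker in lockstep. The names below are hypothetical:

```scala
// Illustrative sketch: full-jitter exponential backoff. Each retry waits a
// random duration in [0, min(cap, base * 2^attempt)), which spreads out
// retries from many consumers that started at the same moment.
import scala.util.Random

def backoffMs(attempt: Int, baseMs: Long = 100L, capMs: Long = 30000L,
              rnd: Random = new Random()): Long = {
  // exponential growth, capped so late attempts never exceed capMs
  val exp = math.min(capMs, baseMs * (1L << math.min(attempt, 20)))
  // full jitter: uniform draw over the whole window
  (rnd.nextDouble() * exp).toLong
}
```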
>
> use a distinct group id, which as
> far as I can tell you are still not doing.
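A sketch of what distinct group ids per parallel consumer could look like, using the standard Kafka consumer property names; the base group name and broker address are assumptions:

```scala
// Illustrative only: build per-consumer properties where each parallel
// consumer gets its own group.id, so many consumers are not rebalancing
// inside a single group.
import java.util.Properties

def consumerProps(baseGroup: String, index: Int): Properties = {
  val p = new Properties()
  p.setProperty("bootstrap.servers", "localhost:9092") // assumed address
  // distinct group id per parallel consumer, e.g. "storage-group-1"
  p.setProperty("group.id", s"$baseGroup-$index")
  p
}
```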
>
> On Nov 10, 2016 7:43 PM, "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
> wrote:
>
>> Yeah, the KafkaRDD cannot be reused. It's better to document it.
>>
>> On Thu, Nov 10, 2016 at 8:26 AM,
> is written.
>
> Do you want to keep arguing with me, or follow my advice and proceed
> with debugging any remaining issues after you make the changes I
> suggested?
>
> On Mon, Nov 7, 2016 at 1:35 PM, Ivan von Nagy <i...@vadio.com> wrote:
> > With our stream version, we
> >> those as possible causes, and then we can
> >> see if there are still issues.
> >>
> >> As far as 0.8 vs 0.10, Spark doesn't require you to assign or
> >> subscribe to a topic in order to update offsets, Kafka does. If you
> >> don't like the new Kafka consumer
Yes, the parallel KafkaRDD uses the same consumer group, but each RDD uses
a single distinct topic. For example, the group would be something like
"storage-group", and the topics would be "storage-channel1", and
"storage-channel2". In each thread a KafkaConsumer is started, assigned the
partitions
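The layout described above (one shared group name, a distinct topic per thread, partitions assigned explicitly) can be modeled without any Kafka dependency. `TopicPartition` here is a stand-in for Kafka's class of the same name, and the partition count is made up for illustration:

```scala
// Pure-Scala model of the described setup: each thread owns every
// partition of exactly one topic, while all threads share one group name.
case class TopicPartition(topic: String, partition: Int)

def assignments(topics: Seq[String], partitionsPerTopic: Int)
    : Map[String, Seq[TopicPartition]] =
  topics.map { t =>
    // the thread handling topic t is assigned all of t's partitions
    t -> (0 until partitionsPerTopic).map(TopicPartition(t, _))
  }.toMap

val byTopic = assignments(
  Seq("storage-channel1", "storage-channel2"), partitionsPerTopic = 4)
```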