Another issue:

> Spark Streaming creates new RDDs every batch duration, always, even if the
> previous computation did not finish.
>
> But with Kafka we could consume more RDDs later, after we finish the
> previous ones. That way it would be much simpler not to get OOM'ed when
> starting from the beginning, because right now we can consume so much data
> from Kafka during one batch duration that we then get an OOM.
>
> But we just cannot start slowly; we cannot limit how much to consume per
> batch.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-mapping-of-partitions-to-tasks-tp3360p3379.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
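
On the "cannot limit how much to consume per batch" part: Spark has a
receiver-side throttle, spark.streaming.receiver.maxRate (records per second
per receiver), though it may postdate this thread. A minimal sketch of using
it; the app name, batch interval and rate below are placeholders, not
recommendations:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("kafka-throttled")                    // placeholder name
      // cap each receiver at 10,000 records/sec, so a 2s batch holds
      // at most ~20,000 records per receiver
      .set("spark.streaming.receiver.maxRate", "10000")
    val ssc = new StreamingContext(conf, Seconds(2))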
Kafka gives consumers a short window to join a group before it rebalances.
But for a Spark cluster I think this time is not enough. If there were a way
to wait for every Spark executor to start and rebalance, and only then start
consuming, this issue would be less visible.
broke :|
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-mapping-of-partitions-to-tasks-tp3360p3391.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
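
There is no built-in "wait for the rebalance" hook, but task scheduling can
at least be delayed until executors have registered, which gets close to the
behaviour wished for above. A rough sketch; the config keys are real Spark
settings (the exact value format varies by Spark version), and the values
are guesses:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("kafka-wait-for-executors")  // placeholder name
      // do not schedule tasks until all requested executors are registered,
      // giving the Kafka consumers time to finish rebalancing first
      .set("spark.scheduler.minRegisteredResourcesRatio", "1.0")
      // ...but give up waiting after 30 seconds
      .set("spark.scheduler.maxRegisteredResourcesWaitingTime", "30s")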
...1 task doing work. I would have expected that each task took a subset of
the partitions.

Is there a way to make more than one task share the work here? Are my
expectations off here?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-mapping-of
Sent from the Apache Spark User List mailing list archive at Nabble.com.
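
On the original question: the receiver-based KafkaInputDStream is a single
receiver, so the downstream tasks see little parallelism unless you spread
the data yourself. The commonly suggested workaround at the time was to
create several input streams, union them, and optionally repartition before
the heavy processing. A sketch; the Zookeeper address, group id, topic name
and all the counts are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-parallel"), Seconds(2))

    val numReceivers = 4                  // e.g. one per Kafka partition
    val streams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(ssc, "zkhost:2181", "mygroup", Map("mytopic" -> 1))
    }
    val unified = ssc.union(streams)      // one logical stream from 4 receivers
    val spread  = unified.repartition(16) // spread records over more tasks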