"dependent" I mean this batch's job relies on the previous batch's result.
So this batch should wait for the finish of previous batch, if you set "
spark.streaming.concurrentJobs" larger than 1, then the current batch could
start without waiting for the previous batch (if it is delayed), which will
lead to unexpected results.
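
For illustration, here is a minimal sketch of how this property is usually set
on the driver. The key is undocumented, so treat the comments as assumptions
(the app name and batch interval are just placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("concurrent-jobs-sketch")  // illustrative name
      // Assumed behavior: allows the driver to run up to 2 streaming jobs
      // at once, so a delayed batch no longer blocks the next one. Only
      // consider this when batches do not depend on each other's results.
      .set("spark.streaming.concurrentJobs", "2")

    val ssc = new StreamingContext(conf, Seconds(10))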


thomas lavocat <thomas.lavo...@univ-grenoble-alpes.fr> wrote on Tue, Jun 5,
2018 at 7:48 PM:

>
> On 05/06/2018 13:44, Saisai Shao wrote:
>
> You need to read the code; this is an undocumented configuration.
>
> I'm on it right now, but Spark is a big piece of software.
>
> Basically this will break the ordering of Streaming jobs; AFAIK you may get
> unexpected results if your streaming jobs are not independent.
>
> What do you mean exactly by not independent?
> Are several sources joined together dependent?
>
> Thanks,
> Thomas
>
>
> thomas lavocat <thomas.lavo...@univ-grenoble-alpes.fr> wrote on Tue, Jun 5,
> 2018 at 7:17 PM:
>
>> Hello,
>>
>> Thanks for your answer.
>>
>> On 05/06/2018 11:24, Saisai Shao wrote:
>>
>> spark.streaming.concurrentJobs is a driver-side internal configuration; it
>> controls how many streaming jobs can be submitted concurrently in one
>> batch. Usually this should not be configured by the user, unless you're
>> familiar with Spark Streaming internals and know the implications of this
>> configuration.
>>
>>
>> How can I find some documentation about those implications?
>>
>> I've experimented with some settings of this parameter and found that my
>> overall throughput increases in correlation with this property.
>> But I'm experiencing scalability issues: with more than 16 receivers
>> spread over 8 executors, my executors no longer receive work from the
>> driver and fall idle.
>> Is there an explanation?
>>
>> Thanks,
>> Thomas
>>
>>
>