"dependent" I mean this batch's job relies on the previous batch's result. So this batch should wait for the finish of previous batch, if you set " spark.streaming.concurrentJobs" larger than 1, then the current batch could start without waiting for the previous batch (if it is delayed), which will lead to unexpected results.
thomas lavocat <thomas.lavo...@univ-grenoble-alpes.fr> wrote on Tue, Jun 5, 2018 at 7:48 PM:
>
> On 05/06/2018 13:44, Saisai Shao wrote:
>
> You need to read the code, this is an undocumented configuration.
>
> I'm on it right now, but Spark is a big piece of software.
>
> Basically this will break the ordering of streaming jobs. AFAIK it may get
> unexpected results if your streaming jobs are not independent.
>
> What do you mean exactly by not independent?
> Are several sources joined together dependent?
>
> Thanks,
> Thomas
>
>
> thomas lavocat <thomas.lavo...@univ-grenoble-alpes.fr> wrote on Tue, Jun 5,
> 2018 at 7:17 PM:
>
>> Hello,
>>
>> Thanks for your answer.
>>
>> On 05/06/2018 11:24, Saisai Shao wrote:
>>
>> spark.streaming.concurrentJobs is a driver-side internal configuration;
>> it controls how many streaming jobs can be submitted concurrently in
>> one batch. Usually this should not be configured by the user, unless
>> you're familiar with Spark Streaming internals and know the implications
>> of this configuration.
>>
>>
>> How can I find some documentation about those implications?
>>
>> I've experimented with some settings of this parameter and found that
>> my overall throughput increases in correlation with this property.
>> But I'm experiencing scalability issues. With more than 16 receivers
>> spread over 8 executors, my executors no longer receive work from the
>> driver and fall idle.
>> Is there an explanation?
>>
>> Thanks,
>> Thomas
>>
>>