Thanks Michael, TD for the quick reply. It was helpful. I will let you know the numbers (limit) based on my experiments.

On Wed, Jan 31, 2018 at 3:10 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> Just to clarify a subtle difference between DStreams and Structured
> Streaming. Multiple input streams in a DStreamGraph is likely to mean they
> are all being processed/computed in the same way as there can be only one
> streaming query / context active in the StreamingContext. However, in the
> case of Structured Streaming, there can be any number of independent
> streaming queries (i.e. different computations), and each streaming query
> with any number of separate input sources. So Michael's comment of "each
> stream will have a thread on the driver" is correct when there are many
> independent queries with different computations simultaneously running.
> However if all your streams need to be processed in the same way, then it's
> one streaming query with many inputs, and will require one thread.
>
> Hope this helps.
>
> TD
>
> On Wed, Jan 31, 2018 at 12:39 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> -dev +user
>>
>>> Similarly for structured streaming, Would there be any limit on number
>>> of streaming sources I can have ?
>>
>> There is no fundamental limit, but each stream will have a thread on the
>> driver that is doing coordination of execution. We comfortably run 20+
>> streams on a single cluster in production, but I have not pushed the
>> limits. You'd want to test with your specific application.
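
For reference, here is a rough Scala sketch of the distinction TD describes: one streaming query fed by several inputs versus several independent queries. It is only an illustration under assumptions, not code from the thread. The built-in "rate" source stands in for real inputs, the console sink stands in for a real sink, and the object name, the helper rateSource, and the query names are all made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; not taken from the thread.
object StreamQueriesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stream-queries-sketch")
      .master("local[2]") // assumption: local run for experimentation
      .getOrCreate()

    // Hypothetical helper: the built-in "rate" source stands in for a real input.
    def rateSource(rowsPerSecond: Long): DataFrame =
      spark.readStream
        .format("rate")
        .option("rowsPerSecond", rowsPerSecond)
        .load()

    // Case 1: many inputs processed the same way.
    // The inputs are unioned and start() is called once, so this is ONE streaming query.
    val merged = rateSource(1L).union(rateSource(2L)).union(rateSource(3L))
    merged.writeStream
      .format("console")
      .queryName("merged_inputs")
      .start()

    // Case 2: different computations.
    // Each start() call creates a separate, independent streaming query.
    rateSource(1L)
      .groupBy()
      .count()
      .writeStream
      .outputMode("complete") // the aggregation here is written out in complete mode
      .format("console")
      .queryName("running_count")
      .start()

    rateSource(1L)
      .filter("value % 2 = 0")
      .writeStream
      .format("console")
      .queryName("even_values")
      .start()

    // Number of active queries; per the thread, each one has its own
    // coordination thread on the driver.
    println(s"Active streaming queries: ${spark.streams.active.length}")

    spark.streams.awaitAnyTermination()
  }
}
```

When experimenting with limits, the count from spark.streams.active is the number to watch, since per the explanation above it is the number of queries, not the number of input sources, that determines how many coordination threads run on the driver.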