Structured Streaming's underlying engine also uses a micro-batch mechanism. The first batch often seems slower than the later ones; I frequently run into this problem too, and it appears to be related to how the batches are divided. On the other hand, Spark's batch size is usually bigger than Flume's transaction batch size.
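One way to keep that first micro-batch from swallowing the whole Kafka backlog is to combine `startingOffsets` with `maxOffsetsPerTrigger` on the Kafka source. A minimal sketch of the relevant options (the broker address and topic name are placeholders, not from this thread):

```python
# Kafka source options that bound the size of each micro-batch.
kafka_options = {
    "kafka.bootstrap.servers": "broker:9092",  # assumption: placeholder broker
    "subscribe": "events",                     # assumption: placeholder topic
    "startingOffsets": "latest",               # skip the historical backlog
    "maxOffsetsPerTrigger": "10000",           # cap records read per micro-batch
}

# In a real job these would be passed to the reader, e.g.:
#   df = (spark.readStream
#              .format("kafka")
#              .options(**kafka_options)
#              .load())
```

With `startingOffsets` at `latest` the job ignores old data, and `maxOffsetsPerTrigger` spreads any remaining catch-up across several batches instead of one huge query 0.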
KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote on Wed, Oct 21, 2020, 12:19 PM:

> Yes. Changing back to latest worked but I still see the slowness compared
> to Flume.
>
> On Oct 20, 2020, at 10:21 PM, lec ssmi <shicheng31...@gmail.com> wrote:
>
> Do you start your application by chasing the early Kafka data?
>
> Lalwani, Jayesh <jlalw...@amazon.com.invalid> wrote on Wed, Oct 21, 2020, 2:19 AM:
>
>> Are you getting any output? Streaming jobs typically run forever, and
>> keep processing data as it comes in. If a streaming job is working well,
>> it will typically generate output at a certain cadence.
>>
>> *From:* KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
>> *Date:* Tuesday, October 20, 2020 at 1:23 PM
>> *To:* "user @spark" <user@spark.apache.org>
>> *Subject:* [EXTERNAL] Spark Structured Streaming - Kafka - slowness with query 0
>>
>> Hi,
>>
>> I have started using Spark Structured Streaming for reading data from
>> Kafka, and the job is very slow. The number of output rows keeps increasing
>> in query 0 and the job runs forever. Any suggestions for this, please?
>>
>> <image001.png>
>>
>> Thanks,
>> Asmath