Structured Streaming's bottom layer also uses a micro-batch
mechanism. The first batch often seems slower than the later ones; I run
into this problem too, and it feels related to how the batches are divided.
   On the other hand, Spark's batch size is usually bigger than Flume's
transaction batch size.
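A minimal configuration sketch of the two settings discussed in this thread: `startingOffsets` (which decides whether the first micro-batch has to replay the Kafka backlog) and `maxOffsetsPerTrigger` (which caps records per micro-batch, giving batch sizes closer to a Flume-style transaction batch). The broker address and topic name here are hypothetical placeholders, not values from this thread.

```python
# Sketch only -- assumes a Kafka broker at localhost:9092 and a topic
# named "events" (both hypothetical), and the spark-sql-kafka package
# on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    # "latest" skips the existing backlog; "earliest" replays it, which
    # makes the first micro-batch much larger (and slower) than the rest.
    .option("startingOffsets", "latest")
    # Cap the number of records read per micro-batch so batch sizes
    # stay bounded and predictable instead of growing with the backlog.
    .option("maxOffsetsPerTrigger", 10000)
    .load()
)
```

If the job must process old data, keeping `startingOffsets` at `earliest` but adding `maxOffsetsPerTrigger` spreads the catch-up work over many bounded micro-batches instead of one huge first batch.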


KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> 于2020年10月21日周三 下午12:19写道:

> Yes. Changing back to latest worked, but I still see the slowness
> compared to Flume.
>
> Sent from my iPhone
>
> On Oct 20, 2020, at 10:21 PM, lec ssmi <shicheng31...@gmail.com> wrote:
>
> 
> Did you start your application by catching up on early Kafka data (i.e.,
> reading from the earliest offsets)?
>
> Lalwani, Jayesh <jlalw...@amazon.com.invalid> 于2020年10月21日周三 上午2:19写道:
>
>> Are you getting any output? Streaming jobs typically run forever, and
>> keep processing data as it comes in the input. If a streaming job is
>> working well, it will typically generate output at a certain cadence
>>
>>
>>
>> *From: *KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
>> *Date: *Tuesday, October 20, 2020 at 1:23 PM
>> *To: *"user @spark" <user@spark.apache.org>
>> *Subject: *[EXTERNAL] Spark Structured Streaming - Kafka - slowness with
>> query 0
>>
>>
>>
>>
>>
>>
>> Hi,
>>
>>
>>
>> I have started using Spark Structured Streaming to read data from
>> Kafka, and the job is very slow. The number of output rows keeps
>> increasing in query 0 and the job runs forever. Any suggestions, please?
>>
>>
>>
>> [attached screenshot: image001.png]
>>
>>
>>
>> Thanks,
>>
>> Asmath
>>
>
