Arpit,

`segment.flush.threshold.size` and `segment.flush.threshold.rows` are the
same parameter; the former is the deprecated name. If you specify
threshold.rows, the value of threshold.size is ignored. That means your
effective flush threshold parameters are:
segment.flush.threshold.rows: "0"
segment.flush.threshold.time: "6h"
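For reference, here is a sketch of how those effective values would sit in a
table config's streamConfigs, written as a Python dict for illustration. The
keys are the ones discussed in this thread; any surrounding fields are
omitted and the comments reflect the behavior described below:

```python
# Illustrative sketch only: the flush-related streamConfigs entries, using
# the parameter names from this thread. Other streamConfigs fields omitted.
stream_configs = {
    "segment.flush.threshold.rows": "0",            # 0 => auto-tune the row count
    "segment.flush.threshold.time": "6h",           # flush at least every 6 hours
    "segment.flush.threshold.segment.size": "150M", # desired on-disk segment size
}
```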
Separately, setting threshold.rows to zero kicks off the auto-tuning
process: Pinot initially consumes 100K rows, then compares the in-memory
size of that 100K-row segment against the value of
`segment.flush.threshold.segment.size`, which indicates the desired segment
size. If the consumed segment is smaller than the desired size, the row
threshold is raised above 100K so that subsequent segments come out bigger.
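The adjustment described above can be illustrated with a small sketch. Note
this is only the idea, not Pinot's actual implementation (which is more
involved and damps changes between runs); the function name and the
proportional-scaling rule are my own simplification:

```python
# Illustrative sketch of the auto-tuning idea: scale the row threshold so
# the next segment lands near the desired size. Not Pinot's real algorithm.
def next_row_threshold(current_rows: int, segment_bytes: int, desired_bytes: int) -> int:
    """Return an adjusted row threshold based on the last segment's size."""
    if segment_bytes <= 0:
        return current_rows  # nothing measured yet; keep the current threshold
    ratio = desired_bytes / segment_bytes
    return max(1, int(current_rows * ratio))

# Example: 100K rows produced a 50 MB segment, but 150 MB is desired,
# so the threshold grows roughly 3x.
print(next_row_threshold(100_000, 50_000_000, 150_000_000))  # -> 300000
```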
Hope that answers your question.
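On Mayank's earlier point about confirming flushes: here is a minimal sketch
that counts segment states from an external view document such as the one
the controller returns from `GET /tables/{tableName}/externalview`. The
sample segment names, server name, and JSON shape are placeholders for
illustration; fetching the JSON from your own controller is left out since
the URL is deployment-specific:

```python
# Minimal sketch: summarize segment states from a real-time table's external
# view. ONLINE segments have been flushed to disk; CONSUMING ones are still
# in memory. The sample data below is hypothetical.
def segment_state_counts(external_view: dict) -> dict:
    """Count segment replica states (e.g. ONLINE vs CONSUMING)."""
    counts = {}
    for segment, replica_states in external_view.get("REALTIME", {}).items():
        for state in replica_states.values():
            counts[state] = counts.get(state, 0) + 1
    return counts

sample = {
    "REALTIME": {
        "myTable__0__0__20211020T0000Z": {"Server_1": "ONLINE"},
        "myTable__0__1__20211020T0600Z": {"Server_1": "CONSUMING"},
    }
}
print(segment_state_counts(sample))  # -> {'ONLINE': 1, 'CONSUMING': 1}
```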


On Wed, Oct 20, 2021 at 6:07 AM Arpit Jain <[email protected]> wrote:

> I have checked the logs and can't find any obvious errors. Both segments
> are in "consuming" state.
> I do see the line below in the logs, and I am not sure how it picks this
> number, or whether it has stopped consuming because of this limit:
> Stopping consumption due to row limit nRows=100000,
> numRowsIndexed=100000, numRowsConsumed=100000
>
> Thanks
>
> On Wed, Oct 20, 2021, 1:39 PM Mayank Shrivastava <[email protected]>
> wrote:
>
>> Hi Arpit,
>>
>> 1. You can check the external view of the real-time table (via the
>> Swagger API or the ZK browser in the console). Segments showing as ONLINE
>> have been flushed to disk; ones showing as CONSUMING are still in memory
>> and not yet committed to disk.
>> 2. Can you run the debug API from Swagger to see if there are any errors
>> on the server?
>>
>> Also, for faster turnaround, please join the Apache Pinot slack community
>> as well.
>>
>> Thanks
>> Mayank
>>
>> > On Oct 20, 2021, at 3:10 AM, Arpit Jain <[email protected]> wrote:
>> >
>> > 
>> > Hi,
>> >
>> > I have set up a Pinot 0.8.0 cluster for real-time data ingestion from
>> Kafka. It is able to consume data, but I believe it consumes just 100000
>> docs and then stops.
>> > Reading the docs, it should flush after a certain period of time or
>> number of rows, but I think that's not happening.
>> > I have the questions below:
>> > 1. How do I confirm whether it is flushing to disk?
>> > 2. Why is it only consuming 100K docs?
>> > My settings are:
>> > segment.flush.threshold.rows:"0"
>> > segment.flush.threshold.size:"10000000"
>> > segment.flush.threshold.time:"6h"
>> > segment.flush.segment.size:"150M"
>> >
>> > Any inputs welcome.
>> >
>> > Regards,
>> > Arpit
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>