Arpit, `segment.flush.threshold.size` and `segment.flush.threshold.rows` control the same threshold; the former is deprecated. If you specify `segment.flush.threshold.rows`, the value of `segment.flush.threshold.size` is ignored. That means your effective flush threshold parameters are:

segment.flush.threshold.rows: "0"
segment.flush.threshold.time: "6h"

Setting threshold.rows to zero kicks off the auto-tuning process: initially 100K rows are consumed, then the size of the consumed 100K-row segment is compared against the value of `segment.flush.threshold.segment.size`, which indicates the desired segment size. If the consumed segment is smaller than the desired size, the row threshold is increased above 100K to generate a bigger segment. Hope that answers your question.
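To make the auto-tuning behavior concrete, here is a minimal sketch in Python of the idea described above. This is not Pinot's actual implementation (the real logic lives in `SegmentSizeBasedFlushThresholdUpdater` and is more involved); the names `INITIAL_ROWS` and `next_row_threshold` are illustrative. It only shows the core idea: scale the row threshold by the ratio of the desired segment size to the observed one.

```python
# Simplified, illustrative sketch of size-based flush-threshold auto-tuning
# (active when segment.flush.threshold.rows = 0). Not Pinot's real code.

INITIAL_ROWS = 100_000  # the first consuming segment starts at 100K rows


def next_row_threshold(current_rows: int,
                       observed_segment_bytes: int,
                       desired_segment_bytes: int) -> int:
    """Estimate the row threshold for the next consuming segment.

    If the segment that just committed came out smaller than the desired
    size, the threshold grows proportionally; if larger, it shrinks.
    """
    ratio = desired_segment_bytes / observed_segment_bytes
    return max(1, int(current_rows * ratio))


# Example: a 100K-row segment committed at ~50 MB, but the desired size
# is 150 MB, so the next threshold roughly triples.
print(next_row_threshold(INITIAL_ROWS, 50_000_000, 150_000_000))  # → 300000
```

In the real updater the adjustment is damped rather than applied as a raw ratio, but the direction of the change is the same.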
On Wed, Oct 20, 2021 at 6:07 AM Arpit Jain <[email protected]> wrote:

> I have checked the logs and can't find any obvious errors. Both segments are in
> "consuming" state.
> I do see the below line in the logs, and I am not sure how it picks this number
> and whether it has stopped consuming because of this limit:
> Stopping consumption due to row limit nRows=100000,
> numRowsIndexed=100000, numRowsConsumed=100000
>
> Thanks
>
> On Wed, Oct 20, 2021, 1:39 PM Mayank Shrivastava <[email protected]>
> wrote:
>
>> Hi Arpit,
>>
>> 1. You can check the external view of the real-time table (via the swagger
>> API or the ZK browser in the console). Segments showing as ONLINE are flushed
>> to disk, and ones showing as CONSUMING are still in memory and not committed
>> to disk yet.
>> 2. Can you run the debug API from swagger to see if there are any errors on
>> the server?
>>
>> Also, for faster turnaround, please join the Apache Pinot Slack community
>> as well.
>>
>> Thanks
>> Mayank
>>
>> > On Oct 20, 2021, at 3:10 AM, Arpit Jain <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I have set up a Pinot 0.8.0 cluster for real-time data ingestion from
>> > Kafka. It is able to consume data, but I believe it consumes just 100000
>> > docs and stops.
>> > Reading the docs, it should flush after a certain period of time / number
>> > of rows, but I think that's not happening.
>> > I have the below questions:
>> > 1. How do I confirm if it's flushing to disk?
>> > 2. Why is it only consuming 100k docs?
>> > My settings are:
>> > segment.flush.threshold.rows: "0"
>> > segment.flush.threshold.size: "10000000"
>> > segment.flush.threshold.time: "6h"
>> > segment.flush.segment.size: "150M"
>> >
>> > Any inputs welcome.
>> >
>> > Regards,
>> > Arpit
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
