Re: Question about index segment search order

2023-05-13 Thread Uwe Schindler
Hi, in reference to previous code references and discussions from other Lucene committers I have to clarify: * If you run the query multithreaded (per segment), that is, when you add an Executor to IndexSearcher, the order is not predictable, plain and simple * If you use Solr, a

Re: Question about index segment search order

2023-05-11 Thread Wei
Hi Michael, Yes the collector counts hits across all segments. Thanks for the suggestion, I'm also asking the question on solr-dev. Wei On Thu, May 11, 2023 at 11:57 AM Michael Sokolov wrote: > Maybe ask this issue on solr-dev then? I'm not familiar with how that > collector works. Does it

Re: Question about index segment search order

2023-05-11 Thread Michael Sokolov
Maybe ask this question on solr-dev then? I'm not familiar with how that collector works. Does it count hits across all segments, or only within a single segment? On Tue, May 9, 2023 at 1:36 PM Wei wrote: > > Hi Michael, > > I am applying early termination with Solr's EarlyTerminatingCollector >

Re: Question about index segment search order

2023-05-09 Thread Wei
Hi Michael, I am applying early termination with Solr's EarlyTerminatingCollector https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java , which triggers EarlyTerminatingCollectorException in

Re: Question about index segment search order

2023-05-04 Thread Michael Sokolov
Yes, sorry I didn't mean to imply you couldn't control this if you want to. I guess in the typical setup it is not predictable. How are you applying early termination? Are you using a standard Lucene Collector or do you have your own? On Thu, May 4, 2023 at 2:03 PM Patrick Zhai wrote: > > Hi

Re: Question about index segment search order

2023-05-04 Thread Patrick Zhai
Hi Mike, Just want to mention that if the user chooses to index with a single thread and use LogXXMergePolicy, then the document order will be preserved as index order. On Thu, May 4, 2023 at 10:04 AM Wei wrote: > Hi Michael, > > We are interested in the segment sequence for early termination. In

Re: Question about index segment search order

2023-05-04 Thread Wei
Hi Michael, We are interested in the segment sequence for early termination. In our case there is always a large dominant segment after index rebuild, then many small segments are generated with continuous updates as time goes by. When early termination is applied, the limit could be reached
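To make the concern about visit order concrete, here is a toy model in plain Java. It is only a sketch: `EarlyTerminationSketch`, `collect`, and the segment names are hypothetical and are not Lucene or Solr APIs. It assumes a single query thread and a hit limit counted across all segments, and shows how the visit order decides which segments contribute hits before the limit is reached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EarlyTerminationSketch {

    /**
     * Toy model, not Lucene/Solr code.
     * hitsPerSegment maps a (hypothetical) segment name to its number of matching
     * documents; the map's iteration order is the order in which segments are visited.
     * Returns how many hits each segment contributed before maxHits was reached.
     */
    static Map<String, Integer> collect(Map<String, Integer> hitsPerSegment, int maxHits) {
        Map<String, Integer> collected = new LinkedHashMap<>();
        int remaining = maxHits;
        for (Map.Entry<String, Integer> e : hitsPerSegment.entrySet()) {
            if (remaining <= 0) {
                break; // limit reached: later segments are never visited
            }
            int taken = Math.min(remaining, e.getValue());
            if (taken > 0) {
                collected.put(e.getKey(), taken);
            }
            remaining -= taken;
        }
        return collected;
    }

    public static void main(String[] args) {
        // Large, older segment visited first: it alone exhausts the limit.
        Map<String, Integer> largeFirst = new LinkedHashMap<>();
        largeFirst.put("big", 1000);
        largeFirst.put("small1", 3);
        largeFirst.put("small2", 2);
        System.out.println(collect(largeFirst, 10));

        // Small, recent segments visited first: they contribute before the large one.
        Map<String, Integer> smallFirst = new LinkedHashMap<>();
        smallFirst.put("small2", 2);
        smallFirst.put("small1", 3);
        smallFirst.put("big", 1000);
        System.out.println(collect(smallFirst, 10));
    }
}
```

In this model, the recently updated small segments only contribute when they are visited before the dominant segment, which is why the traversal order matters once a hit limit is applied.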

Re: Question about index segment search order

2023-05-04 Thread Michael Sokolov
There is no meaning to the sequence. The segments are created concurrently by many threads and the merge process will merge them without regard to any ordering. On Wed, May 3, 2023, 1:09 PM Patrick Zhai wrote: > For that part I'm not entirely sure, if other folks know it please chime in > :)

Re: Question about index segment search order

2023-05-03 Thread Patrick Zhai
For that part I'm not entirely sure, if other folks know it please chime in :) On Wed, May 3, 2023 at 8:48 AM Wei wrote: > Thanks Patrick! In the default case when no LeafSorter is provided, are the > segments traversed in the order of creation time, i.e. the oldest segment > is always visited

Re: Question about index segment search order

2023-05-03 Thread Wei
Thanks Patrick! In the default case when no LeafSorter is provided, are the segments traversed in the order of creation time, i.e. the oldest segment is always visited first? Wei On Tue, May 2, 2023 at 7:22 PM Patrick Zhai wrote: > Hi Wei, > Lucene in general iterate through the index in the

Re: Question about index segment search order

2023-05-02 Thread Patrick Zhai
Hi Wei, Lucene in general iterates through the index in the order recorded in the SegmentInfos. At search time, you can specify the order using LeafSorter
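As a rough illustration of the idea behind a leaf sorter, the sketch below orders segments newest-first with an ordinary `Comparator`. It uses plain Java only: the `Segment` class and its `generation` field are hypothetical stand-ins, not Lucene classes, and the assumption that a larger generation means a later-created segment is made purely for the example. In Lucene itself the comparator would operate on leaf readers; check the API of your Lucene version for where it is supplied.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SegmentOrderSketch {

    /** Hypothetical stand-in for a segment; not a Lucene class. */
    static final class Segment {
        final String name;
        final long generation; // assumption: larger value = created later

        Segment(String name, long generation) {
            this.name = name;
            this.generation = generation;
        }
    }

    /** Returns a copy of the given segments ordered newest-first. */
    static List<Segment> newestFirst(List<Segment> segments) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong((Segment s) -> s.generation).reversed());
        return sorted;
    }

    /** Extracts the segment names, preserving list order. */
    static List<String> names(List<Segment> segments) {
        List<String> result = new ArrayList<>();
        for (Segment s : segments) {
            result.add(s.name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("_0", 0)); // large, oldest segment
        segments.add(new Segment("_3", 3));
        segments.add(new Segment("_7", 7)); // most recent segment
        System.out.println(names(newestFirst(segments))); // prints [_7, _3, _0]
    }
}
```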

Question about index segment search order

2023-05-02 Thread Wei
Hello, We have an index that has multiple segments generated with continuous updates. Does Lucene have a specific order when iterating through the segments (assuming a single query thread)? Can the order be customized so that the latest generated segments are searched first? Thanks, Wei