Hi,

In reference to the earlier code references and discussion from other Lucene committers, I have to clarify:

 * If you run the query multithreaded (per segment), which is what
   happens when you pass an Executor to IndexSearcher, the order is not
   predictable, plain and simple.
 * If you use Solr, a single query is not multithreaded. Solr works on
   shards and parallelizes across them, but it does not parallelize the
   search on a single index.
 * If you want control over the order of segments when searching,
   there's an easy way with pure Lucene (Solr would need to be patched):
     o Don't pass an Executor (see above).
     o When constructing the IndexSearcher, don't simply pass the
       IndexReader but "customize" it instead. There are two ways to do
       it: (a) Take the existing IndexReader and get all leaf
       segments from it (IndexReader#leaves() call). Sort the leaves
       in the order you would like them to be searched and then create
       a MultiReader on those sorted segments. (b) Alternatively, use
       DirectoryReader#open() with a Comparator to sort the segments.
       You could order them in reverse by their segment ID.
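
To make the effect of the ordering concrete, here is a minimal sketch that deliberately does not use Lucene at all. Segment, SegmentOrderSketch, newestFirst and collect are hypothetical names made up for illustration, and "higher id means newer segment" is an assumption. In real Lucene the comparator would be handed to DirectoryReader#open(), or the sorted leaves wrapped in a MultiReader, as described above.

```java
import java.util.ArrayList;
import java.util.List;

final class SegmentOrderSketch {

  /** Hypothetical stand-in for a segment; NOT a Lucene class. */
  static final class Segment {
    final int id;             // assumption: a higher id means a newer segment
    final List<String> docs;  // documents in this segment, in index order

    Segment(int id, List<String> docs) {
      this.id = id;
      this.docs = docs;
    }
  }

  /** Returns a sorted copy of the segments, highest (newest) id first. */
  static List<Segment> newestFirst(List<Segment> segments) {
    List<Segment> sorted = new ArrayList<>(segments);
    sorted.sort((a, b) -> Integer.compare(b.id, a.id));
    return sorted;
  }

  /** Visits segments in the given order and stops once maxHits docs are collected. */
  static List<String> collect(List<Segment> segments, int maxHits) {
    List<String> hits = new ArrayList<>();
    for (Segment segment : segments) {
      for (String doc : segment.docs) {
        if (hits.size() >= maxHits) {
          return hits; // early termination: remaining segments are never visited
        }
        hits.add(doc);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<Segment> segments = List.of(
        new Segment(0, List.of("old-1", "old-2", "old-3", "old-4")), // large, old segment
        new Segment(1, List.of("new-1")),
        new Segment(2, List.of("new-2")));

    // Given order: the large old segment uses up the whole hit limit.
    System.out.println(collect(segments, 3));              // [old-1, old-2, old-3]
    // Newest first: the recent documents are collected before the limit is hit.
    System.out.println(collect(newestFirst(segments), 3)); // [new-2, new-1, old-1]
  }
}
```

The sketch is single-threaded on purpose: as noted in the first bullet, once an Executor is involved the visiting order is no longer predictable.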

Either way, Solr needs to be patched; there are no API hooks for this. You may be able to subclass SolrIndexSearcher, but you would still need to hook it into the Solr control flow.

Uwe

On 08.05.2023 at 16:47, Wei wrote:
Hi Michael,

I am applying early termination with Solr's EarlyTerminatingCollector
<https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java>,
which triggers EarlyTerminatingCollectorException in SolrIndexSearcher
<https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L281>

Thanks,
Wei


On Thu, May 4, 2023 at 11:47 AM Michael Sokolov<msoko...@gmail.com>  wrote:

Yes, sorry I didn't mean to imply you couldn't control this if you
want to. I guess in the typical setup it is not predictable. How are
you applying early termination? Are you using a standard Lucene
Collector or do you have your own?

On Thu, May 4, 2023 at 2:03 PM Patrick Zhai<zhai7...@gmail.com>  wrote:
Hi Mike,
Just want to mention that if the user indexes with a single thread and
uses a LogXXMergePolicy, then the document order will be preserved as
the index order.



On Thu, May 4, 2023 at 10:04 AM Wei<weiwan...@gmail.com>  wrote:

Hi Michael,

We are interested in the segment sequence for early termination. In our
case there is always a large dominant segment after an index rebuild,
then many small segments are generated by continuous updates as time
goes by. When early termination is applied, the limit can be reached
just by traversing the dominant segment alone, and the newer, smaller
segments don't get a chance. If we can control the segment sequence so
that the newer segments are visited first, the documents with recent
updates can be retrieved with early termination. Do you think this
makes sense? Any suggestion is appreciated.

Thanks,
Wei

On Thu, May 4, 2023 at 3:33 AM Michael Sokolov<msoko...@gmail.com> wrote:
There is no meaning to the sequence. The segments are created
concurrently by many threads, and the merge process will merge them
without regard to any ordering.



On Wed, May 3, 2023, 1:09 PM Patrick Zhai<zhai7...@gmail.com> wrote:
For that part I'm not entirely sure; if other folks know it, please
chime in :)

On Wed, May 3, 2023 at 8:48 AM Wei<weiwan...@gmail.com>  wrote:

Thanks Patrick! In the default case, when no LeafSorter is provided,
are the segments traversed in the order of creation time, i.e. the
oldest segment is always visited first?

Wei

On Tue, May 2, 2023 at 7:22 PM Patrick Zhai<zhai7...@gmail.com> wrote:
Hi Wei,
Lucene in general iterates through the index in the order that is
recorded in the SegmentInfos
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L140>.
At search time, you can specify the order using a LeafSorter
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java#L75>
when you're opening the IndexReader.

Patrick

On Tue, May 2, 2023 at 5:28 PM Wei<weiwan...@gmail.com> wrote:
Hello,

We have an index that has multiple segments generated by continuous
updates. Does Lucene have a specific order when iterating through the
segments (assuming a single query thread)? Can the order be customized
so that the most recently generated segments are searched first?

Thanks,
Wei



--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de
