Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-04-05 Thread via GitHub


HoustonPutman commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2773533573

   Ok I've renamed to `maxHitsAllowed`, which follows a lot of the other 
shard-specific limits. (like `timeAllowed` or `cpuAllowed`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-04-03 Thread via GitHub


HoustonPutman commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2776005934

   I need to add a changelog entry, but I think this is good to go, and will 
probably merge sometime tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-04-02 Thread via GitHub


HoustonPutman commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r2025471315


##
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##
@@ -295,7 +295,7 @@ private Collector buildAndRunCollectorChain(
 
 final boolean terminateEarly = cmd.getTerminateEarly();
 if (terminateEarly) {
-  collector = new EarlyTerminatingCollector(collector, cmd.getLen());
+  collector = new EarlyTerminatingCollector(collector, 
cmd.getMaxHitsTerminateEarly());

Review Comment:
   So I've changed `SpellCheckCollator::collate` to use this feature now. We 
can remove terminate_early and all of that stuff in a different PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-04-02 Thread via GitHub


HoustonPutman commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r2025456485


##
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##
@@ -295,7 +295,7 @@ private Collector buildAndRunCollectorChain(
 
 final boolean terminateEarly = cmd.getTerminateEarly();
 if (terminateEarly) {
-  collector = new EarlyTerminatingCollector(collector, cmd.getLen());
+  collector = new EarlyTerminatingCollector(collector, 
cmd.getMaxHitsTerminateEarly());

Review Comment:
   Yeah, Gus is right. `SpellCheckCollator::collate` uses it: 
`checkResponse.setFieldFlags(f |= SolrIndexSearcher.TERMINATE_EARLY)`. So we 
need to refactor this to use it correctly. I'm going to refactor this to use 
the new functionality under the hood. Then we can truly remove the 
TerminateEarly stuff



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-03-31 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r2021385080


##
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##
@@ -1976,7 +1975,10 @@ public ScoreMode scoreMode() {
 MultiThreadedSearcher.TopDocsResult topDocsResult = 
searchResult.getTopDocsResult();
 totalHits = topDocsResult.totalHits;
 topDocs = topDocsResult.topDocs;
-
+if (searchResult.isTerminatedEarly) {

Review Comment:
   Yes you are correct. It should mirror the flags here 
https://github.com/sijuv/solr/blob/a4f7301e4778c62b7f6d8d5b212c07b0e57bd8be/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L327
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-03-31 Thread via GitHub


HoustonPutman commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r2021345672


##
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##
@@ -1976,7 +1975,10 @@ public ScoreMode scoreMode() {
 MultiThreadedSearcher.TopDocsResult topDocsResult = 
searchResult.getTopDocsResult();
 totalHits = topDocsResult.totalHits;
 topDocs = topDocsResult.topDocs;
-
+if (searchResult.isTerminatedEarly) {

Review Comment:
   So `searchResult.isTerminatedEarly` could have been set to `true` because 
maxShardHits had been reached right? If so, shouldn't this be setting 
`setMaxHitsTerminatedEarly` to true?
   
   Honestly, shouldn't `searchResult` be using `isMaxHitsTerminatedEarly` not 
`isTerminatedEarly`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-03-15 Thread via GitHub


sijuv commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2704533403

   In the current implementation, the _EarlyTerminatingCollector_ is not used 
in any code path.
   The collector is instantiated at only one place 
https://github.com/apache/solr/blob/ac3d349dac530cf1001d5113fc21b0fd641cc9d5/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L293
 . Now this is done only if _getTerminateEarly()_ returns true. It will never 
return true since the setter is never invoked in any code path. I think 
@gus-asf mentioned that as well, It is an unused method.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-03-07 Thread via GitHub


sijuv commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2704572310

   Updated the pr to ensure to not mess with the _terminatedEarly_ flag even 
though it is unused in the current implementation. I think this flag should be 
removed at some point in time. With this pr, there are only 2 ways a search can 
be terminated early ( apart from time/cpu/mem limits ) - because the _maxHits_ 
parameter is set or if it is a sortedIndex and the sort is based on the index 
sort order.
   
   @gus-asf @dsmiley pls help review this and merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-03-06 Thread via GitHub


HoustonPutman commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1983655608


##
solr/core/src/java/org/apache/solr/search/QueryResult.java:
##
@@ -22,6 +22,7 @@ public class QueryResult {
   // Object for back compatibility so that we render true not "true" in json
   private Object partialResults;
   private Boolean segmentTerminatedEarly;

Review Comment:
   I agree with "maxShardHitsReached" does make more sense than 
"terminatedEarly" for me... Especially given that "segmentTerminatedEarly" is 
completely unrelated, as gus mentioned.



##
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##
@@ -29,11 +30,15 @@
  */
 public class EarlyTerminatingCollector extends FilterCollector {
 
+  private final int chunkSize = 100; // Check across threads only at a chunk 
size
+
   private final int maxDocsToCollect;
 
   private int numCollected = 0;
   private int prevReaderCumulativeSize = 0;
   private int currentReaderSize = 0;
+  private final LongAdder pendingDocsToCollect;

Review Comment:
   Yeah as @sijuv mentioned in the other thread, I believe he is only actually 
checking this every 100 docs, not after every add.



##
solr/core/src/java/org/apache/solr/search/QueryCommand.java:
##
@@ -194,7 +195,8 @@ public QueryCommand setNeedDocSet(boolean needDocSet) {
   }
 
   public boolean getTerminateEarly() {
-return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0;
+return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0

Review Comment:
   This is really confusing to me, a different name would really help here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-02-20 Thread via GitHub


sijuv commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2673329627

   @dsmiley @gus-asf pls relook 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-02-17 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1958586189


##
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##
@@ -61,11 +69,25 @@ public LeafCollector getLeafCollector(LeafReaderContext 
context) throws IOExcept
   public void collect(int doc) throws IOException {
 super.collect(doc);
 numCollected++;
-if (maxDocsToCollect <= numCollected) {
+terminatedEarly = maxDocsToCollect <= numCollected;
+if (numCollected % chunkSize == 0) {
+  pendingDocsToCollect.add(chunkSize);
+  final long overallCollectedDocCount = 
pendingDocsToCollect.intValue();
+  terminatedEarly = overallCollectedDocCount >= maxDocsToCollect;

Review Comment:
   the boolean is updated only every 100th time to reduce any overhead updating 
the thread shared adder brings in.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-02-17 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1958613844


##
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##
@@ -29,11 +30,15 @@
  */
 public class EarlyTerminatingCollector extends FilterCollector {
 
+  private final int chunkSize = 100; // Check across threads only at a chunk 
size

Review Comment:
   100 was chosen arbitrarily. I don't think this needs a env prop. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-02-17 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1958612194


##
solr/solr-ref-guide/modules/query-guide/pages/common-query-parameters.adoc:
##
@@ -399,6 +399,23 @@ For example, setting `cpuAllowed=500` gives a limit of at 
most 500 ms of CPU tim
 
 All other considerations regarding partial results listed for the 
`timeAllowed` parameter apply here, too.
 
+
+== maxHits Parameter
+
+[%autowidth,frame=none]
+|===
+|Optional |Default: `false`
+|===
+
+This parameter specifies the max number of hits a searcher will iterate 
through before early terminating the search.
+The count is per shard and across all threads involved in case of 
multi-threaded search. This parameter works
+in conjunction with other parameters that could early terminate a search, ex: 
_timeAllowed_ etc. In case the search
+was early terminated due to it exceeding maxHits a _terminatedEarly_ header in 
the response will be set along with
+_partialResults_ to indicate the same. Note that the _partialResults_ flag 
could be set in the absence of the _maxHits_
+parameter due to other limits like _timeAllowed_ or _cpuAllowed_.
+Note : the hits counted will need not be exactly equal to the maxHits 
provided, however it will be within range of this value.
+
+

Review Comment:
   updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-02-17 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1958612329


##
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##
@@ -43,11 +48,14 @@ public class EarlyTerminatingCollector extends 
FilterCollector {
* @param maxDocsToCollect - the maximum number of documents to Collect
*/
   public EarlyTerminatingCollector(Collector delegate, int maxDocsToCollect) {
-super(delegate);
-assert 0 < maxDocsToCollect;

Review Comment:
   added back



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-02-17 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1958587050


##
solr/core/src/java/org/apache/solr/search/QueryCommand.java:
##
@@ -194,7 +195,8 @@ public QueryCommand setNeedDocSet(boolean needDocSet) {
   }
 
   public boolean getTerminateEarly() {
-return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0;
+return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0

Review Comment:
   _ setTerminateEarly_ is unused _ setSegmentTerminateEarly_ is used in the 
case of sorted segments.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-02-17 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1958582130


##
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##
@@ -329,12 +329,12 @@ private Collector buildAndRunCollectorChain(
   if (collector instanceof DelegatingCollector) {
 ((DelegatingCollector) collector).complete();
   }
-  throw etce;
+  qr.setPartialResults(true);

Review Comment:
   I dont think _EarlyTerminatingCollector_ has been used yet.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-02-17 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1958581315


##
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##
@@ -295,7 +295,7 @@ private Collector buildAndRunCollectorChain(
 
 final boolean terminateEarly = cmd.getTerminateEarly();
 if (terminateEarly) {
-  collector = new EarlyTerminatingCollector(collector, cmd.getLen());
+  collector = new EarlyTerminatingCollector(collector, 
cmd.getMaxHitsTerminateEarly());

Review Comment:
   I dont think the _EarlyTerminatingCollector_ has been used in solr yet. The 
code prior to my change was instantiating this collector when _TERMINATE_EARLY_ 
flag was set, but I dont see any references to this flag being set in the code 
because there are no usage references to _setTerminateEarly_.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-02-04 Thread via GitHub


dsmiley commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2635055839

   "caller" by the first definition you listed -- basically the client of Solr. 
 I'll say "client".
   
   Not sure if we have disagreement or not.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-30 Thread via GitHub


gus-asf commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2625369713

   > (not thinking of backwards compatibility here, just thinking holistically)
   > 
   > I appreciate that there are very different causes of a terminated request, 
and we want to differentiate them in case the client cares and for diagnostics. 
But semantically to the caller, the cause is irrelevant (right?), so I think a 
simple flag/enum should be returned _separate_ for the cause. I'm not arguing 
the cause should be hidden / indiscernible. I understand, once upon a time, 
there was a single cause of "terminate early" but as Solr has matured to do so 
under growing conditions, we shouldn't be held back by segment terminated early 
as the one and only terminate-early. Let's step back and change names / 
semantics if needed. Solr 10 is coming.
   
   If by "caller" you mean the person/programmer/code that is leveraging solr 
to make a fabulous profit or save the world then there are two questions to 
answer: 1) can we trust these results as 100% definitive 2) what can be done to 
obtain definitive results if that's important. For number 1 we don't need to 
know why we just need to know if it's trustworthy, that's entirely binary as 
you suggest. However if we need to adjust things to supply definitive results 
(for that one case the customer really cares about, etc) then we need to know 
which nob to turn, and thus it matters which option is the limiting factor (one 
might set multiple limits or be using limits with this feature). 
   
   If by caller you mean the next (few) method(s) up the stack, then there is 
at least one important distinction between this and limits. In the limits case, 
we typically want to stop all work ASAP once the limit is hit. In this case the 
intent is that all shards should return up to the limiting number of hits. We 
wouldn't want to stop or terminate any other subrequests for this feature. 
Notably this feature would not protect the users from a shard living on a 
critically limping and slow server, whereas timeAllowed should ensure some sort 
of response even if one shard has nearly (but not quite) ground to a halt. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-30 Thread via GitHub


dsmiley commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2625311557

   (not thinking of backwards compatibility here, just thinking holistically)
   
   I appreciate that there are very different causes of a terminated request, 
and we want to differentiate them in case the client cares and for diagnostics. 
 But semantically to the caller, the cause is irrelevant (right?), so I think a 
simple flag/enum should be returned _separate_ for the cause.  I'm not arguing 
the cause should be hidden / indiscernible.  I understand, once upon a time, 
there was a single cause of "terminate early" but as Solr has matured to do so 
under growing conditions, we shouldn't be held back by segment terminated early 
as the one and only terminate-early.  Let's step back and change names / 
semantics if needed.  Solr 10 is coming.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-30 Thread via GitHub


gus-asf commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1935871739


##
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##
@@ -29,11 +30,15 @@
  */
 public class EarlyTerminatingCollector extends FilterCollector {
 
+  private final int chunkSize = 100; // Check across threads only at a chunk 
size

Review Comment:
   How was this chosen? Should it be tuneable via env/sysprop?



##
solr/solr-ref-guide/modules/query-guide/pages/common-query-parameters.adoc:
##
@@ -399,6 +399,23 @@ For example, setting `cpuAllowed=500` gives a limit of at 
most 500 ms of CPU tim
 
 All other considerations regarding partial results listed for the 
`timeAllowed` parameter apply here, too.
 
+
+== maxHits Parameter
+
+[%autowidth,frame=none]
+|===
+|Optional |Default: `false`
+|===
+
+This parameter specifies the max number of hits a searcher will iterate 
through before early terminating the search.
+The count is per shard and across all threads involved in case of 
multi-threaded search. This parameter works
+in conjunction with other parameters that could early terminate a search, ex: 
_timeAllowed_ etc. In case the search
+was early terminated due to it exceeding maxHits a _terminatedEarly_ header in 
the response will be set along with
+_partialResults_ to indicate the same. Note that the _partialResults_ flag 
could be set in the absence of the _maxHits_
+parameter due to other limits like _timeAllowed_ or _cpuAllowed_.
+Note : the hits counted will need not be exactly equal to the maxHits 
provided, however it will be within range of this value.
+
+

Review Comment:
   I have a similar concern as Houston regarding helping the user understand 
when to use this, here's my attempt to rewrite your text based on what I've 
gathered... of course if I misunderstood something or I'm too wordy we should 
fix that ;) As a side note I think we're supposed to write these as "ventilated 
prose" which is one sentence per line, no line breaks in a sentence.
   ```
   == maxHits Parameter
   
   [%autowidth,frame=none]
   |===
   |Optional |Default: `false`
   |===
   
   This parameter specifies the max number of hits a searcher will iterate 
through.
   Searchers will arbitrarily ignore any number of additional hits beyond this 
value.
   Practically speaking, the count is per shard and across all threads involved 
in case of multi-threaded search.
   The intention of this feature is to favor speed over perfect relevancy & 
recall.
   The trade-off is that if one shard contains many relevant hits and another 
contains a few less relevant hits the less relevant hits from the second shard 
may get returned instead of the more relevant hits that were clustered in the 
first shard.
   
   This parameter works in conjunction with other parameters that could early 
terminate a search, ex: _timeAllowed_ etc.
   In case the search was early terminated due to it exceeding maxHits a 
_terminatedEarly_ header in the response will be set along with 
_partialResults_ to indicate the same.
   The _partialResults_ flag could be set in the absence of the _maxHits_ 
parameter due to other limits like _timeAllowed_ or _cpuAllowed_.
   ```
   I have to admit I don't understand this bit:
   
   ```
   NOTE: the hits counted will need not be exactly equal to the maxHits 
provided, however it will be within range of this value.
   ```



##
solr/core/src/java/org/apache/solr/search/QueryCommand.java:
##
@@ -194,7 +195,8 @@ public QueryCommand setNeedDocSet(boolean needDocSet) {
   }
 
   public boolean getTerminateEarly() {
-return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0;
+return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0

Review Comment:
   As I noted in another comment I think this and the Lucene feature are 
significantly different. I don't like conflating them. 



##
solr/core/src/java/org/apache/solr/search/MultiThreadedSearcher.java:
##
@@ -190,10 +194,16 @@ public ScoreMode scoreMode() {
   static class SearchResult {
 final ScoreMode scoreMode;
 private final Object[] result;
+final boolean isTerminatedEarly;
 
 public SearchResult(ScoreMode scoreMode, Object[] result) {
+  this(scoreMode, result, false);

Review Comment:
   My IDE claims this unused. Since this is a package private inner class we 
probably don't need to maintain this constructor for the API.



##
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##
@@ -61,11 +69,25 @@ public LeafCollector getLeafCollector(LeafReaderContext 
context) throws IOExcept
   public void collect(int doc) throws IOException {
 super.collect(doc);
 numCollected++;
-if (maxDocsToCollect <= numCollected) {
+terminatedEarly = maxDocsToCollect <= numCollected;
+if (numCollected % chunkSize == 0) {
+  pendingDocsToCollect.add(chunkSize);
+  

Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-30 Thread via GitHub


gus-asf commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1935817655


##
solr/core/src/java/org/apache/solr/search/QueryResult.java:
##
@@ -22,6 +22,7 @@ public class QueryResult {
   // Object for back compatibility so that we render true not "true" in json
   private Object partialResults;
   private Boolean segmentTerminatedEarly;

Review Comment:
   I agree it's good to know why so that the corrective action is clear, but 
terminateEarly is unlike maxHits (or shardMaxHits) so the relationship takes 
special knowledge to appreciate. If something is going to report a reason I'd 
want it to include the same phrase... i.e. "maxHitsReached" and definately 
would not want to have to "just know" that terminateEarly has nothing to do 
with segmentTerminateEarly despite strong similarity in naming.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-30 Thread via GitHub


gus-asf commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2624803443

   > > Overall this makes sense to me. Thanks for the nice contribution! I'd 
prefer if @gus-asf could take a look at some aspects since he worked on 
cpuAllowed recently. BTW it's clear you need to run `./gradlew tidy`
   > > Addressed.
   > > I suggest renaming the param "maxHitsPerShard" to simply "maxHits" or 
"maxHitsTerminateEarly" and document that it's per-shard and best-effort; that 
more hits may ultimately be detected in aggregate. But maybe others disagree.
   > 
   > Renamed to maxHits
   > 
   
   maybe shardMaxHits? Brevity is nice, but I think it's helpful to have a name 
that is at least slightly self documenting. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-28 Thread via GitHub


dsmiley commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1933299568


##
solr/core/src/java/org/apache/solr/search/QueryResult.java:
##
@@ -22,6 +22,7 @@ public class QueryResult {
   // Object for back compatibility so that we render true not "true" in json
   private Object partialResults;
   private Boolean segmentTerminatedEarly;

Review Comment:
   Okay; so the cause is interesting for diagnostic / observability purposes 
but semantically, the search has "terminated early" (for whatever reason).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-28 Thread via GitHub


sijuv commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2619919386

   @dsmiley @HoustonPutman updated, pls take a look when you get a chance.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-28 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1932767430


##
solr/solr-ref-guide/modules/query-guide/pages/common-query-parameters.adoc:
##
@@ -399,6 +399,22 @@ For example, setting `cpuAllowed=500` gives a limit of at 
most 500 ms of CPU tim
 
 All other considerations regarding partial results listed for the 
`timeAllowed` parameter apply here, too.
 
+
+== maxHits Parameter
+
+[%autowidth,frame=none]
+|===
+|Optional |Default: `false`
+|===
+
+This parameter specifies the max number of hits a searcher will iterate 
through before early terminating the search.
+The count is per shard and across all threads involved in case of 
multi-threaded search. This parameter works
+in conjunction with other parameters that could early terminate a search, ex: 
_timeAllowed_ etc. In case the search
+was early terminated due to it exceeding maxHits a _terminatedEarly_ header in 
the response will be set along with
+_partialResults_ to indicate the same
+Note : the hits counted will need not be exactly equal to the maxHits 
provided, however it will be within range of this value.

Review Comment:
   updated documentation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-28 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1932766821


##
solr/core/src/java/org/apache/solr/search/QueryResult.java:
##
@@ -22,6 +22,7 @@ public class QueryResult {
   // Object for back compatibility so that we render true not "true" in json
   private Object partialResults;
   private Boolean segmentTerminatedEarly;

Review Comment:
   I think we need to distinguish the 2 because the cause for them is different 
. segmentTerminateEarly could occur because of the searcher not proceeding on a 
sorted segment because the remaning docs are of lower _score_. The 
terminateEarly is purely due to the searcher running past the provided number 
of maxHits. As a user if I see terminateEarly then I know I might need to 
increase the maxHits parameter. 



##
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##
@@ -338,6 +339,11 @@ private Collector buildAndRunCollectorChain(
   if (cmd.isQueryCancellable()) {
 
core.getCancellableQueryTracker().removeCancellableQuery(cmd.getQueryID());
   }
+  if (collector instanceof final EarlyTerminatingCollector 
earlyTerminatingCollector) {

Review Comment:
   done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-28 Thread via GitHub


sijuv commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2619918214

   > A test is needed.
   
   added tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-28 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1932762106


##
solr/core/src/java/org/apache/solr/search/QueryCommand.java:
##
@@ -194,7 +195,7 @@ public QueryCommand setNeedDocSet(boolean needDocSet) {
   }
 
   public boolean getTerminateEarly() {
-return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0;
+return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0 || maxHits < 
Integer.MAX_VALUE;

Review Comment:
   done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-28 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1932760465


##
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##
@@ -29,11 +30,15 @@
  */
 public class EarlyTerminatingCollector extends FilterCollector {
 
+  private final int chunkSize = 100; // Check across threads only at a chunk 
size
+
   private final int maxDocsToCollect;
 
   private int numCollected = 0;
   private int prevReaderCumulativeSize = 0;
   private int currentReaderSize = 0;
+  private final AtomicInteger pendingDocsToCollect;

Review Comment:
   Chunking we will still need. LongAdder add does not return the current count 
and doing increment and then get on every document I don't think will be 
optimal.



##
solr/core/src/java/org/apache/solr/search/MultiThreadedSearcher.java:
##
@@ -255,7 +265,11 @@ public Object reduce(Collection collectors) throws 
IOException {
   for (Iterator var4 = collectors.iterator();
   var4.hasNext();
   maxScore = Math.max(maxScore, collector.getMaxScore())) {
-collector = (MaxScoreCollector) var4.next();
+Collector next = (Collector) var4.next();

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-28 Thread via GitHub


sijuv commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1932759503


##
solr/core/src/java/org/apache/solr/response/SolrQueryResponse.java:
##
@@ -68,6 +68,7 @@ public class SolrQueryResponse {
   public static final String RESPONSE_HEADER_PARTIAL_RESULTS_DETAILS_KEY = 
"partialResultsDetails";
   public static final String RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY =
   "segmentTerminatedEarly";
+  public static final String RESPONSE_HEADER_TERMINATED_EARLY_KEY = 
"terminatedEarly";

Review Comment:
   updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-28 Thread via GitHub


HoustonPutman commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1932441426


##
solr/core/src/java/org/apache/solr/search/QueryCommand.java:
##
@@ -194,7 +195,7 @@ public QueryCommand setNeedDocSet(boolean needDocSet) {
   }
 
   public boolean getTerminateEarly() {
-return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0;
+return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0 || maxHits < 
Integer.MAX_VALUE;

Review Comment:
   I agree, the name of the param does not really indicate that it could lead 
to partial results.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-28 Thread via GitHub


HoustonPutman commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1932439155


##
solr/solr-ref-guide/modules/query-guide/pages/common-query-parameters.adoc:
##
@@ -399,6 +399,22 @@ For example, setting `cpuAllowed=500` gives a limit of at 
most 500 ms of CPU tim
 
 All other considerations regarding partial results listed for the 
`timeAllowed` parameter apply here, too.
 
+
+== maxHits Parameter
+
+[%autowidth,frame=none]
+|===
+|Optional |Default: `false`
+|===
+
+This parameter specifies the max number of hits a searcher will iterate 
through before early terminating the search.
+The count is per shard and across all threads involved in case of 
multi-threaded search. This parameter works
+in conjunction with other parameters that could early terminate a search, ex: 
_timeAllowed_ etc. In case the search
+was early terminated due to it exceeding maxHits a _terminatedEarly_ header in 
the response will be set along with
+_partialResults_ to indicate the same
+Note : the hits counted will need not be exactly equal to the maxHits 
provided, however it will be within range of this value.

Review Comment:
   Can you explain here when the user would see partial results? The user will 
probably not be very familiar with how Lucene or the searcher works, so it 
should be from their point of view.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-24 Thread via GitHub


HoustonPutman commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1929056967


##
solr/core/src/java/org/apache/solr/search/QueryResult.java:
##
@@ -22,6 +22,7 @@ public class QueryResult {
   // Object for back compatibility so that we render true not "true" in json
   private Object partialResults;
   private Boolean segmentTerminatedEarly;

Review Comment:
   Agreed, I'm not sure how people are going to respond differently between the 
two



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-24 Thread via GitHub


HoustonPutman commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1929053426


##
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##
@@ -29,11 +30,15 @@
  */
 public class EarlyTerminatingCollector extends FilterCollector {
 
+  private final int chunkSize = 100; // Check across threads only at a chunk 
size
+
   private final int maxDocsToCollect;
 
   private int numCollected = 0;
   private int prevReaderCumulativeSize = 0;
   private int currentReaderSize = 0;
+  private final AtomicInteger pendingDocsToCollect;

Review Comment:
   Agreed, it looks like the chunking can go away if we switch to a LongAdder. 
Much like the chunking implementation, we won't get absolute accuracy across 
threads, but we will get higher throughput. (And we could sacrifice throughput 
to increase accuracy if we wanted to, but I doubt we do)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-23 Thread via GitHub


dsmiley commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2609890945

   A test is needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-22 Thread via GitHub


dsmiley commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1926442325


##
solr/core/src/java/org/apache/solr/response/SolrQueryResponse.java:
##
@@ -68,6 +68,7 @@ public class SolrQueryResponse {
   public static final String RESPONSE_HEADER_PARTIAL_RESULTS_DETAILS_KEY = 
"partialResultsDetails";
   public static final String RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY =
   "segmentTerminatedEarly";
+  public static final String RESPONSE_HEADER_TERMINATED_EARLY_KEY = 
"terminatedEarly";

Review Comment:
   Okay you did in the ref guide... still a simple bit of javadoc here would be 
helpful



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-22 Thread via GitHub


dsmiley commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1926430030


##
solr/core/src/java/org/apache/solr/search/QueryResult.java:
##
@@ -22,6 +22,7 @@ public class QueryResult {
   // Object for back compatibility so that we render true not "true" in json
   private Object partialResults;
   private Boolean segmentTerminatedEarly;

Review Comment:
   thinking out loud:  Perhaps we should do away with segmentTerminatedEarly as 
overly specific



##
solr/core/src/java/org/apache/solr/search/QueryCommand.java:
##
@@ -194,7 +195,7 @@ public QueryCommand setNeedDocSet(boolean needDocSet) {
   }
 
   public boolean getTerminateEarly() {
-return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0;
+return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0 || maxHits < 
Integer.MAX_VALUE;

Review Comment:
   When I see these other constants about TERMINATE_EARLY, this is what leads 
me to think that maxHitsTerminateEarly is a good name.



##
solr/core/src/java/org/apache/solr/response/SolrQueryResponse.java:
##
@@ -68,6 +68,7 @@ public class SolrQueryResponse {
   public static final String RESPONSE_HEADER_PARTIAL_RESULTS_DETAILS_KEY = 
"partialResultsDetails";
   public static final String RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY =
   "segmentTerminatedEarly";
+  public static final String RESPONSE_HEADER_TERMINATED_EARLY_KEY = 
"terminatedEarly";

Review Comment:
   Somewhere we need documentation to differentiate the meaning of this vs 
partialResults.  To me, this _implies_ partialResults at the least and it may 
be redundant.



##
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##
@@ -29,11 +30,15 @@
  */
 public class EarlyTerminatingCollector extends FilterCollector {
 
+  private final int chunkSize = 100; // Check across threads only at a chunk 
size
+
   private final int maxDocsToCollect;
 
   private int numCollected = 0;
   private int prevReaderCumulativeSize = 0;
   private int currentReaderSize = 0;
+  private final AtomicInteger pendingDocsToCollect;

Review Comment:
   LongAdder is likely a better choice for this use-case



##
solr/core/src/java/org/apache/solr/search/MultiThreadedSearcher.java:
##
@@ -255,7 +265,11 @@ public Object reduce(Collection collectors) throws 
IOException {
   for (Iterator var4 = collectors.iterator();
   var4.hasNext();
   maxScore = Math.max(maxScore, collector.getMaxScore())) {
-collector = (MaxScoreCollector) var4.next();
+Collector next = (Collector) var4.next();

Review Comment:
   could use a comment like "assume the next collector, except 
EarlyTerminating, is the MaxScoreCollector".
   
   While you are changing this code, please rename `var4` which is an 
inexcusable variable name



##
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##
@@ -338,6 +339,11 @@ private Collector buildAndRunCollectorChain(
   if (cmd.isQueryCancellable()) {
 
core.getCancellableQueryTracker().removeCancellableQuery(cmd.getQueryID());
   }
+  if (collector instanceof final EarlyTerminatingCollector 
earlyTerminatingCollector) {

Review Comment:
   this check is brittle since it can only possibly true if the *last* 
collector is earlyTerminatingCollector.  We may add others or re-order at some 
point.  Also, can we just depend on the exception above and set 
`qr.setTerminatedEarly(true);` there?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-22 Thread via GitHub


dsmiley commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2608897224

   note: you force-pushed to this PR.  Please don't do that; it resets the 
review state making it hard for me to see changes since my last review and if I 
try it's still possible an earlier commit was changed and I won't know it.  So 
never force-push to a PR that has been reviewed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-22 Thread via GitHub


sijuv commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2608004442

   @dsmiley can you pls relook when you get a chance ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17447 : Support to early terminate a search based on maxHits per collector. [solr]

2025-01-09 Thread via GitHub


sijuv commented on PR #2960:
URL: https://github.com/apache/solr/pull/2960#issuecomment-2581070892

   > Overall this makes sense to me. Thanks for the nice contribution! I'd 
prefer if @gus-asf could take a look at some aspects since he worked on 
cpuAllowed recently. BTW it's clear you need to run `./gradlew tidy`
   Addressed.
   > 
   > I suggest renaming the param "maxHitsPerShard" to simply "maxHits" or 
"maxHitsTerminateEarly" and document that it's per-shard and best-effort; that 
more hits may ultimately be detected in aggregate. But maybe others disagree.
   > 
   Renamed to maxHits
   > It'd be good to consider interference with other features. Maybe it works 
with cursorMark? Does it count _after_ PostFilter (e.g. CollapseQParser) or 
before?
   Sorry, I am not aware of cursorMark. This will count while the collector 
runs over the posting list so it is not a post filter. Not sure if I understood 
your question correctly,
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org