[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter -- fl function query

2025-06-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986923#comment-17986923
 ] 

ASF subversion and git services commented on SOLR-17775:


Commit e160aeb65d8fb4a9a50d82a089dfd5e25673a939 in solr's branch 
refs/heads/fix-native-access-warning from David Smiley
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=e160aeb65d8 ]

SOLR-17775: avoid over-calling ReaderUtil.subIndex (#3386)
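
A minimal sketch of what "avoid over-calling ReaderUtil.subIndex" could look like; it is not the code from the commit. The idea is to remember the segment (leaf) that held the previous document and run a new search only when the next doc ID falls outside that segment's range. `LeafLocator`, `docStarts`, and the `searches` counter are hypothetical names, and the plain binary search stands in for Lucene's `ReaderUtil.subIndex`. The sketch assumes non-negative doc IDs and no empty segments.

```java
import java.util.Arrays;

/** Hypothetical helper: maps a global doc ID to the index of its segment (leaf). */
final class LeafLocator {
  private final int[] docStarts; // first global doc ID of each leaf, ascending
  private final int maxDoc;      // total number of docs across all leaves
  private int current = -1;      // index of the most recently resolved leaf
  int searches = 0;              // how many binary searches were actually run

  LeafLocator(int[] docStarts, int maxDoc) {
    this.docStarts = docStarts;
    this.maxDoc = maxDoc;
  }

  /** Returns the leaf index containing docId, reusing the last answer when possible. */
  int leafFor(int docId) {
    if (current >= 0 && docId >= docStarts[current] && docId < endOf(current)) {
      return current; // same segment as the previous doc: no search needed
    }
    searches++;
    int pos = Arrays.binarySearch(docStarts, docId);
    current = pos >= 0 ? pos : -pos - 2; // insertion point minus one
    return current;
  }

  /** Exclusive upper bound of the doc IDs that belong to the given leaf. */
  private int endOf(int leaf) {
    return leaf + 1 < docStarts.length ? docStarts[leaf + 1] : maxDoc;
  }

  public static void main(String[] args) {
    LeafLocator locator = new LeafLocator(new int[] {0, 100, 250}, 400);
    for (int doc : new int[] {3, 7, 42, 120, 130, 399}) {
      System.out.println("doc " + doc + " -> leaf " + locator.leafFor(doc));
    }
    System.out.println("binary searches: " + locator.searches);
  }
}
```

When documents arrive in doc-ID order, the search runs at most once per segment touched rather than once per document.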



> Optimize ValueSourceAugmenter -- fl function query
> --
>
> Key: SOLR-17775
> URL: https://issues.apache.org/jira/browse/SOLR-17775
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Yura
> Priority: Minor
> Labels: pull-request-available
> Fix For: 9.9
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> h3. Problem
> ValueSourceAugmenter currently calculates function values on-demand during 
> transform(), performing expensive binary searches and reader lookups for each 
> document individually.
> h3. Solution
> Pre-calculate function values for all result set documents during 
> setContext() by:
>  * Collecting and sorting document IDs from DocList
>  * Sequential iteration through sorted documents to calculate values once per 
> reader segment
>  * Storing results in hash map for O(1) lookup during transform()
>  * Fallback to on-demand calculation for documents outside the pre-calculated 
> set (RTG cases)
> h3. Performance Benefit
> Replaces repeated "find document at position N" operations (binary search per 
> document) with efficient "get next document" iteration (sequential processing 
> within reader segments), significantly reducing lookup overhead.
> h3. Compatibility
> Maintains full backward compatibility through fallback mechanism for edge 
> cases.
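
The steps in the description above can be sketched roughly as follows. This is an illustration in plain Java with no Solr or Lucene types, not the actual patch: `PrecomputeSketch`, `openSegment`, and `segmentOpens` are hypothetical names, `precompute` plays the role of `setContext()`, and `valueFor` plays the role of `transform()`. A `java.util.HashMap` stands in for whatever compact int-keyed map the real code uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch: pre-compute per-document values in doc-ID order. */
final class PrecomputeSketch {
  private final int[] docStarts; // first global doc ID of each segment, ascending
  private final IntFunction<IntToDoubleFunction> openSegment; // segment index -> value reader
  private final Map<Integer, Double> cache = new HashMap<>();
  int segmentOpens = 0; // counts how often a per-segment reader was obtained

  PrecomputeSketch(int[] docStarts, IntFunction<IntToDoubleFunction> openSegment) {
    this.docStarts = docStarts;
    this.openSegment = openSegment;
  }

  /** Analogous to setContext(): compute values for the whole result page once. */
  void precompute(int[] resultDocIds) {
    int[] sorted = resultDocIds.clone();
    Arrays.sort(sorted);
    int leaf = 0;
    int openedLeaf = -1;
    IntToDoubleFunction values = null;
    for (int docId : sorted) {
      // Sorted input means the segment index only ever moves forward.
      while (leaf + 1 < docStarts.length && docId >= docStarts[leaf + 1]) leaf++;
      if (leaf != openedLeaf) { // open each segment's reader once
        values = open(leaf);
        openedLeaf = leaf;
      }
      cache.put(docId, values.applyAsDouble(docId - docStarts[leaf]));
    }
  }

  /** Analogous to transform(): O(1) lookup, with an on-demand fallback for misses. */
  double valueFor(int docId) {
    Double cached = cache.get(docId);
    if (cached != null) return cached;
    int leaf = 0;
    while (leaf + 1 < docStarts.length && docId >= docStarts[leaf + 1]) leaf++;
    return open(leaf).applyAsDouble(docId - docStarts[leaf]);
  }

  private IntToDoubleFunction open(int leaf) {
    segmentOpens++;
    return openSegment.apply(leaf);
  }

  public static void main(String[] args) {
    PrecomputeSketch s =
        new PrecomputeSketch(new int[] {0, 100, 250}, leaf -> local -> leaf * 1000.0 + local);
    s.precompute(new int[] {260, 5, 130, 7, 251});
    System.out.println("cached doc 130: " + s.valueFor(130));
    System.out.println("fallback doc 101: " + s.valueFor(101));
    System.out.println("segment opens: " + s.segmentOpens);
  }
}
```

The fallback path in `valueFor` corresponds to the case the description calls out: a document that was never part of the pre-calculated set.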



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter -- fl function query

2025-06-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17983795#comment-17983795
 ] 

ASF subversion and git services commented on SOLR-17775:


Commit f64acc0f2686d71ac31532154a037f204206430a in solr's branch 
refs/heads/branch_9x from David Smiley
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=f64acc0f268 ]

SOLR-17775: avoid over-calling ReaderUtil.subIndex (#3386)

(cherry picked from commit e160aeb65d8fb4a9a50d82a089dfd5e25673a939)





[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter -- fl function query

2025-06-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17983753#comment-17983753
 ] 

ASF subversion and git services commented on SOLR-17775:


Commit e160aeb65d8fb4a9a50d82a089dfd5e25673a939 in solr's branch 
refs/heads/main from David Smiley
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=e160aeb65d8 ]

SOLR-17775: avoid over-calling ReaderUtil.subIndex (#3386)






[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

2025-06-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956481#comment-17956481
 ] 

ASF subversion and git services commented on SOLR-17775:


Commit 08f7eda826c74a1d00679649f102202a6037d2e5 in solr's branch 
refs/heads/branch_9x from yura
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=08f7eda826c ]

SOLR-17775: Speed up function queries in 'fl' param. (#3380)

By fetching the data in doc order (mechanical sympathy) up to a cached threshold

(cherry picked from commit 51b315ae7c570782b0d46cfb356d19f3d34d4fa5)





[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

2025-06-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956474#comment-17956474
 ] 

ASF subversion and git services commented on SOLR-17775:


Commit 51b315ae7c570782b0d46cfb356d19f3d34d4fa5 in solr's branch 
refs/heads/main from yura
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=51b315ae7c5 ]

SOLR-17775: Speed up function queries in 'fl' param. (#3380)

By fetching the data in doc order (mechanical sympathy) up to a cached threshold




[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

2025-06-03 Thread Yura (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956006#comment-17956006
 ] 

Yura commented on SOLR-17775:
-

I agree it could yield greater benefit, but I don’t want to expand this change 
into a DocsStreamer refactoring. Sounds good. I’ll add a 1,000-doc cap.




[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

2025-06-03 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956005#comment-17956005
 ] 

David Smiley commented on SOLR-17775:
-

While I maintain my opinion that DocsStreamer casts a wider net with greater 
value, I'm on board with continuing with your existing work here: "progress 
not perfection". I think doing only the first X (1000) is an easy way to cap 
the memory risk, leaving the move to a batch model for a later time.

 




[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

2025-06-03 Thread Yura (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955999#comment-17955999
 ] 

Yura commented on SOLR-17775:
-

I could add a cap on the maximum cached docs if needed—prefetch only the first 
1,000 into the IntObjectMap; beyond that, fall back to per-document retrieval.
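
A small sketch of the proposed cap, under the assumption that "the first 1,000" means the first 1,000 entries in result order. `PrefetchCap`, `DEFAULT_CAP`, and `idsToPrefetch` are hypothetical names, not identifiers from the patch. Only the capped head of the result list is sorted and prefetched; anything past it would be handled by the existing per-document path.

```java
import java.util.Arrays;

/** Hypothetical sketch: choose which result docs to prefetch under a fixed cap. */
final class PrefetchCap {
  static final int DEFAULT_CAP = 1000; // figure from the discussion; the name is made up

  /** Returns the first `cap` result doc IDs, sorted into doc-ID order for sequential reads. */
  static int[] idsToPrefetch(int[] resultDocIds, int cap) {
    int n = Math.min(resultDocIds.length, Math.max(cap, 0));
    int[] head = Arrays.copyOf(resultDocIds, n); // docs beyond the cap are left out
    Arrays.sort(head);
    return head;
  }

  public static void main(String[] args) {
    int[] ranked = {9, 3, 7, 1, 5};
    System.out.println(Arrays.toString(idsToPrefetch(ranked, 3)));
    System.out.println(Arrays.toString(idsToPrefetch(ranked, DEFAULT_CAP)));
  }
}
```

The cap bounds the size of the cache regardless of how many rows a request asks for, at the cost of losing the in-order benefit for the tail.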




[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

2025-06-03 Thread Yura (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955998#comment-17955998
 ] 

Yura commented on SOLR-17775:
-

I think changing DocsStreamer would be much more involved and use far more 
memory, since each document is a LinkedHashMap. This patch is a surgical 
optimization for ValueSource-based fields and uses a compact 
IntObjectHashMap, likely no larger than the DocSlice itself. The coordinator 
likely already materializes all documents (plus whatever the JSON/XML 
serializers hold), so the extra footprint is minimal.

I’m not sure if classes like Lucene90CompressingStoredFieldsReader are 
optimized for strictly in-order reads. Some code still seeks from 0, so 
in-order access may not save work without further changes.




[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

2025-06-03 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955911#comment-17955911
 ] 

David Smiley commented on SOLR-17775:
-

By recommending modifying DocsStreamer, I don't imply leaving 
ValueSourceAugmenter as-is: it could still cache the last getValues result and 
related metadata for reuse when the next doc to retrieve is in the same segment.

Fetching stored fields in Lucene isn't an array lookup!  It would benefit from 
the doc-order approach too!

Batching by 1000 (configurable) would seem to be a nice balance of capping 
memory needs and benefiting from this nice optimization.
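
One way to picture the batching idea, as a sketch only: walk the ranked results in fixed-size batches, visit each batch in doc-ID order, and write every value back to its original rank so the response order is unchanged. `BatchedDocOrder`, `compute`, and `valueOf` are hypothetical names; `valueOf` stands in for whatever per-document work (function values, stored fields) would benefit from in-order access.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch: process ranked results batch by batch, in doc-ID order per batch. */
final class BatchedDocOrder {
  /** Returns one value per result, in the original ranking order. */
  static double[] compute(int[] rankedDocIds, int batchSize, IntToDoubleFunction valueOf) {
    if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
    double[] out = new double[rankedDocIds.length];
    for (int start = 0; start < rankedDocIds.length; start += batchSize) {
      int end = Math.min(start + batchSize, rankedDocIds.length);
      // Sort the rank positions of this batch by the doc ID stored at each position.
      Integer[] ranks = new Integer[end - start];
      for (int i = 0; i < ranks.length; i++) ranks[i] = start + i;
      Arrays.sort(ranks, Comparator.comparingInt(r -> rankedDocIds[r]));
      // Visit docs in ascending doc-ID order, storing each result at its rank.
      for (int rank : ranks) out[rank] = valueOf.applyAsDouble(rankedDocIds[rank]);
    }
    return out;
  }

  public static void main(String[] args) {
    int[] ranked = {50, 10, 30, 40, 20};
    double[] values = compute(ranked, 2, docId -> {
      System.out.println("visiting doc " + docId);
      return docId * 2.0;
    });
    System.out.println(Arrays.toString(values));
  }
}
```

Only one batch of rank positions is held at a time, which is the memory cap the comment is after; the trade-off is that ordering is only local to each batch.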




[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

2025-06-03 Thread Yura (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955909#comment-17955909
 ] 

Yura commented on SOLR-17775:
-

I’m not sure that modifying DocsStreamer afterward would yield significant 
gains; we’d need to profile first. Also, I believe regular (non-query) field 
retrieval uses simple array access rather than skip lists, so sorting and 
chunking probably won’t improve performance further. I also think implementing 
that change would be more complex.




[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

2025-06-02 Thread Yura (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955820#comment-17955820
 ] 

Yura commented on SOLR-17775:
-

[~dsmiley], the improvement was quite significant. It is especially 
noticeable when you request more than 100 rows. I don’t think it adds much 
memory usage: it only caches values for the retrieved document IDs, and that 
set is usually quite small (<1000).

The main gain is not in saving calls to {{getValues}}, but in retrieving 
values in order. Lucene data structures are optimized for iteration, not random 
seek. Internally, this approach uses {{DocIterator}} jumps from doc[N-1] to 
doc[N], instead of jumping from 0 to doc[N]. This is much cheaper.
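
A toy model of the cost difference described above, assuming a forward-only cursor that must start over whenever it is asked for a document behind its current position. `ForwardOnlyCursor`, `advance`, and `costOf` are hypothetical names, and counting one step per entry is a simplification (real Lucene iterators can skip ahead), so the numbers only illustrate the direction of the effect.

```java
/** Hypothetical sketch: a forward-only cursor over ascending doc IDs. */
final class ForwardOnlyCursor {
  private final int[] docs;    // ascending doc IDs that have a value
  private int pos = 0;         // index of the current entry
  private int lastTarget = -1; // last doc ID that was requested
  long steps = 0;              // entries stepped over so far
  int restarts = 0;            // times the cursor had to start over from 0

  ForwardOnlyCursor(int[] docs) {
    this.docs = docs;
  }

  /** Moves to the first doc >= target; a backwards request forces a restart. */
  int advance(int target) {
    if (target < lastTarget) { // cannot move backwards: start over
      pos = 0;
      restarts++;
    }
    lastTarget = target;
    while (pos < docs.length && docs[pos] < target) {
      pos++;
      steps++;
    }
    return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
  }

  /** Total steps needed to visit the targets in the given order. */
  static long costOf(int[] docs, int[] targets) {
    ForwardOnlyCursor cursor = new ForwardOnlyCursor(docs);
    for (int target : targets) cursor.advance(target);
    return cursor.steps;
  }

  public static void main(String[] args) {
    int[] docs = new int[1000];
    for (int i = 0; i < docs.length; i++) docs[i] = i;
    System.out.println("doc-ID order: " + costOf(docs, new int[] {10, 500, 900}));
    System.out.println("rank order:   " + costOf(docs, new int[] {900, 10, 500}));
  }
}
```

In doc-ID order the cursor never restarts, so the total work is bounded by one pass over the data; in ranking order every backwards jump repeats work already done.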




[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

2025-06-01 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955566#comment-17955566
 ] 

David Smiley commented on SOLR-17775:
-

In practice, how much better are you finding this optimization?  I would not 
have guessed it to be considerable.  I suppose this saves the 
ValueSource.getValues call so that it's per segment instead of per doc.  Note 
that caching this will use up some memory... albeit maybe not too much if a 
ValueSource is just loading a number, say.

Perhaps it would make more sense for Solr to process the documents (and thus 
all DocTransformers and also field value retrieval) in doc ID order before then 
sorting them in the desired order?  See DocsStreamer.

I could see doing this with a chunking strategy to cap risks of using too much 
memory.
