Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-15 Thread sanjay dutt
I opened an issue for this one (https://github.com/apache/lucene/issues/13373). Please feel free to edit or add more info to it. Regards, Sanjay On Wed, May 15, 2024 at 8:07 PM Michael McCandless <luc...@mikemccandless.com> wrote: > Thanks Jerven, more response inlined below: > > On Tue, May 14

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-15 Thread Michael McCandless
Thanks Jerven, more response inlined below: On Tue, May 14, 2024 at 12:58 PM Jerven Tjalling Bolleman wrote: > The index that had an issue when merging into one segment definitely had more than 1 billion times the word "positional" in it. I hope to be able to give a closer number once re-indexi

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-14 Thread Jerven Tjalling Bolleman
Hi Michael, The index that had an issue when merging into one segment definitely had more than 1 billion times the word "positional" in it. I hope to be able to give a closer number once re-indexing has finished with a "work-around". Of course the "work-around" is to just fix this correctly by not

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-14 Thread Michael McCandless
I think we should at least open an issue to try to improve the exception message? We might catch the exception higher up (where we know the field name) and rethrow with the field name, maybe. We can discuss options on the issue ... If you are not using custom term frequencies it's not clear to m
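
The "catch higher up and rethrow with the field name" idea could look roughly like the sketch below. This is only an illustration of the pattern, not Lucene's code: `FieldContextSketch` and `runForField` are hypothetical names, and the overflowing task is simulated with `Math.addExact`.

```java
// Hypothetical sketch: add the field name to a low-level ArithmeticException.
// None of these names come from Lucene; they only illustrate the rethrow pattern.
final class FieldContextSketch {

    // Runs a per-field task and, on overflow, rethrows with the field name attached.
    static void runForField(String fieldName, Runnable task) {
        try {
            task.run();
        } catch (ArithmeticException e) {
            ArithmeticException withContext =
                new ArithmeticException("field \"" + fieldName + "\": " + e.getMessage());
            withContext.initCause(e); // keep the original stack trace as the cause
            throw withContext;
        }
    }

    public static void main(String[] args) {
        try {
            // Simulated low-level failure: exact int addition that overflows.
            runForField("text", () -> Math.addExact(Integer.MAX_VALUE, 1));
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // prints: field "text": integer overflow
        }
    }
}
```

The original exception is kept as the cause, so the low-level stack trace is still available to whoever reads the log.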

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Michael Sokolov
This is definitely a confusing error condition. If we can add more information without creating an undue burden for the indexer it would be nice, but I think this will be very challenging here since the exception is thrown at a low level in the code where there might not be a lot of useful info (ie

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman
Dear Michael, Looking deeper into this, I think we overflowed a term frequency field. Looking at some statistics, in a previous release we had 1,288,526,281 instances of a certain field; this would be larger now. Each of these would have had a limited set of values. But crucially nearly all of them woul
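
To make the arithmetic concrete, here is a small sketch of how a count of this size can overflow a 32-bit int once more occurrences are added. It is not Lucene's merge code: the class and method names are hypothetical, and the second count (1,000,000,000) is an assumed value chosen only to push the sum past `Integer.MAX_VALUE` (2,147,483,647).

```java
// Hypothetical illustration of int overflow when summing large counts.
// 1_288_526_281 is the figure quoted in the message; 1_000_000_000 is an assumed extra count.
final class TermFreqOverflowSketch {

    // Sums counts into an int with overflow checking, as Math.addExact does.
    static int sumExact(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total = Math.addExact(total, c); // throws ArithmeticException on overflow
        }
        return total;
    }

    // Same sum with a long accumulator, which has room for the result.
    static long sumAsLong(int[] counts) {
        long total = 0L;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] counts = {1_288_526_281, 1_000_000_000};
        System.out.println(sumAsLong(counts)); // prints: 2288526281
        try {
            sumExact(counts);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // prints: integer overflow
        }
    }
}
```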

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman
Dear Michael, Thank you for your help. We don't use custom term frequencies (I just double-checked with a code search). We also always merge down to one segment (historical, but also we index once, then there are no changes for a week to a month, and then we reindex every document from scrat

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Michael Sokolov
It seems as if the term frequency for some term exceeded the maximum. This can happen if you supplied custom term frequencies, e.g. with https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttribute.html?is-external=true . The behavior didn't change since