Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Michael Sokolov
This is definitely a confusing error condition. If we can add more information without creating an undue burden for the indexer, it would be nice, but I think this will be very challenging here, since the exception is thrown at a low level in the code where there might not be a lot of useful info (i.e.

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman
Dear Michael, Looking deeper into this, I think we overflowed a term frequency field. Looking at some statistics, in a previous release we had 1,288,526,281 of a certain field; this would be larger now. Each of these would have had a limited set of values. But crucially nearly all of them woul
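
For illustration only, here is a minimal sketch of the arithmetic behind that suspicion. It assumes the counter in question is a 32-bit signed int that is summed with exact arithmetic such as Math.addExact; that assumption, the class name TermFreqOverflowSketch, and the second count of 900,000,000 are hypothetical and not taken from Lucene's code or from our index. Only the figure 1,288,526,281 comes from the statistics quoted above.

```java
public class TermFreqOverflowSketch {

    /** Remaining room before a 32-bit signed counter reaches Integer.MAX_VALUE. */
    static long headroom(long current) {
        return (long) Integer.MAX_VALUE - current;
    }

    /**
     * Sums counts with exact int arithmetic.
     * Math.addExact throws ArithmeticException instead of silently wrapping around.
     */
    static int sumExact(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total = Math.addExact(total, c);
        }
        return total;
    }

    public static void main(String[] args) {
        // Count reported for the previous release (from the message above).
        long previous = 1_288_526_281L;
        System.out.println("headroom before overflow: " + headroom(previous));

        try {
            // Hypothetical second count, chosen only to push the sum past the int range.
            sumExact(new int[] {1_288_526_281, 900_000_000});
        } catch (ArithmeticException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

If this assumption holds, a count that was already past 1.28 billion would have had roughly 859 million of headroom left, so growth in the data could plausibly push an exact int sum over the limit.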

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman
Dear Michael, Thank you for your help. We don't use custom term frequencies (I just double-checked with a code search). We also always merge down to one segment (historical, but also we index once and then there are no changes for a week to a month and then we reindex every document from scrat

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Michael Sokolov
It seems as if the term frequency for some term exceeded the maximum. This can happen if you supplied custom term frequencies, e.g. with https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttribute.html?is-external=true . The behavior didn't change since

ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman
Dear Lucene community, This morning I found this exception in our logs. This was the first time we indexed this data with Lucene 9.10. Before, we were still on the Lucene 8.x branch. Between the last indexing with 8 and this one with 9.10 we have a bit more data, so it could be something else th