The update to Lucene 9.4 has gone in.
When no stop words are configured, StandardAnalyzer will now follow Lucene's new behaviour (no stop words) rather than the 8.x and earlier behaviour, which defaulted to a list of English stop words.
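
To illustrate what this means for indexed text, here is a minimal sketch in plain Java. It is not Lucene or Jena code: the class StopWordSketch, the method analyze and the short word list are made up for illustration, and the tokenisation (lower-case, split on whitespace) is a simplifying assumption rather than what StandardAnalyzer really does. It only shows the effect of having a stop word list versus having none.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Hypothetical, illustrative subset of English stop words.
    // This is NOT the list Lucene ships.
    static final Set<String> SOME_ENGLISH_STOP_WORDS =
            Set.of("a", "an", "and", "of", "the", "to");

    // Lower-case the text, split on whitespace, and drop every token
    // that appears in stopWords. An empty set keeps all tokens.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The History of the Web";
        // With an English stop word list (the older style of default):
        System.out.println(analyze(text, SOME_ENGLISH_STOP_WORDS)); // [history, web]
        // With no stop words (the new style of default):
        System.out.println(analyze(text, Set.of())); // [the, history, of, the, web]
    }
}
```

In practice this means words such as "the" and "of" are now indexed and searchable, so existing indexes may need rebuilding to get consistent results.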

Please do try this out - the development builds from 363 onwards incorporate this.

   Andy

https://github.com/apache/jena/pull/1582


On 29/10/2022 21:24, Andy Seaborne wrote:
An upgrade from Lucene 8.11.1 to 9.4.0 has been suggested.

https://github.com/apache/jena/issues/1581
https://github.com/apache/jena/pull/1582/files

This is not a completely transparent upgrade.

"""
English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444)
"""
https://issues.apache.org/jira/browse/LUCENE-7444
https://github.com/apache/lucene/issues/8496

and Jena creates a default StandardAnalyzer if there are no stop words in the assembler.

What does the community want to do - switch to no stop words by default as per standard Lucene or provide the English list?

       Andy

Full migration notes:
https://lucene.apache.org/core/9_4_0/MIGRATE.html
