The update to Lucene 9.4 has gone in.
When no stop words are configured, StandardAnalyzer now follows
Lucene's new behaviour (no stop words) rather than the 8.x and earlier
behaviour, which defaulted to a list of English stop words.
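
To make the practical effect concrete, here is a rough plain-Java sketch of what stop-word removal does to the terms that end up in an index. It does not use Lucene or Jena at all: the StopWordDemo class, the analyze method and the short ENGLISH_SAMPLE list are made up for illustration, and the real English list is longer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this is not Lucene's or Jena's code.
class StopWordDemo {
    // Hypothetical sample list; an assumption, not the actual English list.
    static final Set<String> ENGLISH_SAMPLE = Set.of("a", "an", "and", "the", "of");

    // Lower-case the text, split on non-letter/non-digit characters,
    // and drop any token found in the given stop word set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String label = "The cat and the hat";
        // Old-style default (English stop words removed): [cat, hat]
        System.out.println(analyze(label, ENGLISH_SAMPLE));
        // New default (no stop words): [the, cat, and, the, hat]
        System.out.println(analyze(label, Set.of()));
    }
}
```

So with the new default, words like "the" and "and" are indexed and can be matched by a text query, where previously they would have been dropped.
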
Please do try this out - the development builds from 363 onwards
incorporate this.
Andy
https://github.com/apache/jena/pull/1582
On 29/10/2022 21:24, Andy Seaborne wrote:
An upgrade from Lucene 8.11.1 to 9.4.0 has been suggested.
https://github.com/apache/jena/issues/1581
https://github.com/apache/jena/pull/1582/files
This is not a completely transparent upgrade.
"""
English stopwords are no longer removed by default in StandardAnalyzer
(LUCENE-7444)
"""
https://issues.apache.org/jira/browse/LUCENE-7444
https://github.com/apache/lucene/issues/8496
and Jena creates a default StandardAnalyzer if there are no stop words
in the assembler.
What does the community want to do - switch to no stop words by default
as per standard Lucene or provide the English list?
Andy
Full migration notes:
https://lucene.apache.org/core/9_4_0/MIGRATE.html