I am using Lucene 8.2, but have also verified this on 8.9.

My query string is either "by~1 word~1" or "ky~1 word~1".
I am looking for a phrase of these 2 words, allowing a 1-character misspelling (fuzziness) in each. I realize that 'by' is usually a stop word, which is why I also tested with 'ky'.

My simplified test content is one of "AC-2.b word", "AC-2.k word", or "AC-2.y word". The first part of the test content is pulled from actual data my customers are trying to search.

For the query with 'by~1' the exception occurs if the content has '.b' or '.y', but not '.k'.
For the query with 'ky~1' the exception occurs if the content has '.k' or '.y', but not '.b'.

Here is the test code:

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.*;
import org.apache.lucene.analysis.standard.*;
import org.apache.lucene.analysis.tokenattributes.*;
import org.apache.lucene.analysis.util.*;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class phraseTest {

    public static Analyzer analyzer = new StandardAnalyzer();
    public static IndexWriterConfig config = new IndexWriterConfig(analyzer);
    public static RAMDirectory ramDirectory = new RAMDirectory();
    public static IndexWriter indexWriter;
    public static Query queryToSearch = null;
    public static IndexReader idxReader;
    public static IndexSearcher idxSearcher;
    public static TopDocs hits;
    public static String query_field = "Content";

    // Pick only one content string
    // public static String content = "AC-2.b word";
    public static String content = "AC-2.k word";
    // public static String content = "AC-2.y word";

    // Pick only one query string
    // public static String queryString = "\"by~1 word~1\"";
    public static String queryString = "\"ky~1 word~1\"";

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws IOException {
        System.out.println("Content is\n " + content);
        System.out.println("Query field is " + query_field);
        System.out.println("Query String is '" + queryString + "'");

        Document doc = new Document(); // create a new document

        /**
         * Create a field with term vector enabled
         */
        FieldType type = new FieldType();
        type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        type.setStored(true);
        type.setStoreTermVectors(true);
        type.setTokenized(true);
        type.setStoreTermVectorOffsets(true);

        // term vector enabled
        Field cField = new Field(query_field, content, type);
        doc.add(cField);

        try {
            indexWriter = new IndexWriter(ramDirectory, config);
            indexWriter.addDocument(doc);
            indexWriter.close();

            idxReader = DirectoryReader.open(ramDirectory);
            idxSearcher = new IndexSearcher(idxReader);
            ComplexPhraseQueryParser qp = new ComplexPhraseQueryParser(query_field, analyzer);
            queryToSearch = qp.parse(queryString);

            // Here is where the searching, etc starts
            hits = idxSearcher.search(queryToSearch, idxReader.maxDoc());
            System.out.println("scoreDoc size: " + hits.scoreDocs.length);

            // highlight the hits ...
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (ParseException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

Here is the exception (using Lucene 8.2):

Exception in thread "main" java.lang.IllegalArgumentException: Unknown query type "org.apache.lucene.search.ConstantScoreQuery" found in phrase query string "ky~1 word~1"
    at org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:325)
    at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:666)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:439)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:564)
    at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:416)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:427)
    at phraseTest.main(phraseTest.java:79)

Am I using ComplexPhraseQueryParser wrong? Is this a bug in Lucene?

I have also tested this with a query string like "dog~2 word~1". This causes the same exception if the content has '.d', '.o', or '.g'.

It looks like a fuzzy term that can be reduced to 1 character within its allowed edit distance runs into trouble when the content contains a matching single-character term.

Thanks in advance for any suggestions or guidance,
David Shifflett
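
P.S. To make the pattern concrete, here is a small sketch that does not use Lucene at all. It assumes (I have not verified this) that the analyzer splits "AC-2.b" so that the trailing letter becomes its own single-character token, and it uses plain Levenshtein distance as a stand-in for whatever Lucene's fuzzy matching does internally. The class name EditDistanceCheck is just something I made up for this illustration.

```java
public class EditDistanceCheck {

    // Plain Levenshtein distance (insert, delete, substitute) using two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Query term, then the assumed single-character token from the content.
        String[][] cases = {
            {"by", "b"}, {"by", "y"}, {"by", "k"},
            {"ky", "k"}, {"ky", "y"}, {"ky", "b"},
            {"dog", "d"}, {"dog", "o"}, {"dog", "g"},
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " vs " + c[1] + ": " + distance(c[0], c[1]));
        }
    }
}
```

Under those assumptions, the combinations that throw are exactly the ones where the single-character token is within the allowed edit distance of the query term ('b' and 'y' are 1 edit from 'by', 'k' is 2 edits away; 'd', 'o', and 'g' are each 2 edits from 'dog'). That fits what I am seeing, though it does not explain why a ConstantScoreQuery ends up inside the phrase.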