[ https://issues.apache.org/jira/browse/LUCENE-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chongchen Chen updated LUCENE-8985:
-----------------------------------
    Comment: was deleted

(was: [~janhoy] I found two test cases in TestSynonymGraphFilter that I cannot understand.
{code:java}
public void testBasicKeepOrigTwoOutputs() throws Exception {
  SynonymMap.Builder b = new SynonymMap.Builder();
  add(b, "a b", "x y", true);
  add(b, "a b", "m n o", true);

  Analyzer a = getAnalyzer(b, true);
  assertAnalyzesTo(a,
      "c a b d",
      new String[] {"c", "x", "m", "a", "y", "n", "o", "b", "d"},  // output tokens
      new int[]    { 0,   2,   2,   2,   2,   2,   2,   4,   6},   // start offsets
      new int[]    { 1,   5,   5,   3,   5,   5,   5,   5,   7},   // end offsets
      new String[] {"word", "SYNONYM", "SYNONYM", "word", "SYNONYM", "SYNONYM", "SYNONYM", "word", "word"},
      new int[]    { 1,   1,   0,   0,   1,   1,   1,   1,   1},   // position increments
      new int[]    { 1,   1,   2,   4,   4,   1,   2,   1,   1});  // position lengths
  // I think posLengths should be {1, 1, 1, 1, 2, 1, 1, 2, 1},
  // because the longest synonym's length is 3.
  a.close();
}

public void testBasicNoKeepOrigTwoOutputs() throws Exception {
  SynonymMap.Builder b = new SynonymMap.Builder();
  add(b, "a b", "x y", false);
  add(b, "a b", "m n o", false);

  Analyzer a = getAnalyzer(b, true);
  assertAnalyzesTo(a,
      "c a b d",
      new String[] {"c", "x", "m", "y", "n", "o", "d"},  // output tokens
      new int[]    { 0,   2,   2,   2,   2,   2,   6},   // start offsets
      new int[]    { 1,   5,   5,   5,   5,   5,   7},   // end offsets
      new String[] {"word", "SYNONYM", "SYNONYM", "SYNONYM", "SYNONYM", "SYNONYM", "word"},
      new int[]    { 1,   1,   0,   1,   1,   1,   1},   // position increments
      new int[]    { 1,   1,   2,   3,   1,   1,   1});  // position lengths
  // I think posLengths should be {1, 1, 1, 2, 1, 1, 1},
  // because the longest synonym's length is 3.
  a.close();
}
{code}
Why are the posLengths those numbers? Am I misunderstanding something?
)

> SynonymGraphFilter cannot handle input stream with tokens filtered.
> -------------------------------------------------------------------
>
>                 Key: LUCENE-8985
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8985
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Chongchen Chen
>            Assignee: Jan Høydahl
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SGF_SF_interaction.patch.txt
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> [~janhoy] found the bug.
> In an analyzer with e.g. a StopFilter, where tokens are removed from the stream
> and replaced with a “hole”, SynonymGraphFilter does not preserve these holes
> but removes them, which causes certain phrase queries to fail.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
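
Illustration of the "hole" described in the issue above. This is a minimal sketch in plain Java that does not use Lucene; the class PositionHoles and its methods are hypothetical names introduced only for this example. It assumes the usual convention that a removed token shows up as a position increment of 2 on the next token, and it models an exact phrase match simply as "the relative positions agree", which is a simplification of real phrase matching.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Lucene.
final class PositionHoles {

    // Turn per-token position increments into absolute positions.
    // The counter starts at -1, so a first token with increment 1 lands on position 0.
    static int[] toPositions(int[] posIncrements) {
        int[] positions = new int[posIncrements.length];
        int pos = -1;
        for (int i = 0; i < posIncrements.length; i++) {
            pos += posIncrements[i];
            positions[i] = pos;
        }
        return positions;
    }

    // Simplified exact-phrase check: the same terms in the same order match
    // only if the gaps between consecutive positions are identical.
    static boolean phraseMatches(int[] indexed, int[] query) {
        if (indexed.length != query.length) {
            return false;
        }
        for (int i = 1; i < indexed.length; i++) {
            if (indexed[i] - indexed[0] != query[i] - query[0]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two surviving tokens with one removed token between them.
        int[] holePreserved = toPositions(new int[] {1, 2}); // gap kept
        int[] holeDropped = toPositions(new int[] {1, 1});   // gap lost

        System.out.println(Arrays.toString(holePreserved));
        System.out.println(Arrays.toString(holeDropped));
        // One side keeps the hole and the other drops it: positions disagree.
        System.out.println(phraseMatches(holeDropped, holePreserved));
    }
}
```

Under these assumptions, a token stream that keeps the hole and one that drops it place the second token at different positions, so a phrase built from one stream no longer lines up with the other.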