[ 
https://issues.apache.org/jira/browse/LUCENE-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chongchen Chen updated LUCENE-8985:
-----------------------------------
    Comment: was deleted

(was: [~janhoy] I found two test cases in TestSynonymGraphFilter that I cannot 
understand.

{code:java}
public void testBasicKeepOrigTwoOutputs() throws Exception {
    SynonymMap.Builder b = new SynonymMap.Builder();
    add(b, "a b", "x y", true);
    add(b, "a b", "m n o", true);

    Analyzer a = getAnalyzer(b, true);
    assertAnalyzesTo(a,
                     "c a b d",
                     new String[] {"c", "x", "m", "a", "y", "n", "o", "b", "d"},
                     new int[]    { 0,   2,   2,   2,   2,   2,   2,   4,   6},
                     new int[]    { 1,   5,   5,   3,   5,   5,   5,   5,   7},
                     new String[] {"word", "SYNONYM", "SYNONYM", "word", "SYNONYM", "SYNONYM", "SYNONYM", "word", "word"},
                     new int[]    { 1,   1,   0,   0,   1,   1,   1,   1,   1},
                     new int[]    { 1,   1,   2,   4,   4,   1,   2,   1,   1});
    // I think posLengths should be {1, 1, 1, 1, 2, 1, 1, 2, 1}, because the
    // longest synonym's length is 3.
    a.close();
  }

public void testBasicNoKeepOrigTwoOutputs() throws Exception {
    SynonymMap.Builder b = new SynonymMap.Builder();
    add(b, "a b", "x y", false);
    add(b, "a b", "m n o", false);

    Analyzer a = getAnalyzer(b, true);
    assertAnalyzesTo(a,
                     "c a b d",
                     new String[] {"c", "x", "m", "y", "n", "o", "d"},
                     new int[]    { 0,   2,   2,   2,   2,   2,   6},
                     new int[]    { 1,   5,   5,   5,   5,   5,   7},
                     new String[] {"word", "SYNONYM", "SYNONYM", "SYNONYM", "SYNONYM", "SYNONYM", "word"},
                     new int[]    { 1,   1,   0,   1,   1,   1,   1},
                     new int[]    { 1,   1,   2,   3,   1,   1,   1});
    // I think posLengths should be {1, 1, 1, 2, 1, 1, 1}, because the
    // longest synonym's length is 3.
    a.close();
  }
{code}
Why are the posLengths those numbers? Am I misunderstanding something?


)

> SynonymGraphFilter cannot handle input stream with tokens filtered.
> -------------------------------------------------------------------
>
>                 Key: LUCENE-8985
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8985
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Chongchen Chen
>            Assignee: Jan Høydahl
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SGF_SF_interaction.patch.txt
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> [~janhoy] found the bug.
> In an analyzer with e.g. a StopFilter, where tokens are removed from the 
> stream and replaced with a “hole”, SynonymGraphFilter does not preserve these 
> holes but removes them, which causes certain phrase queries to fail.
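
As a rough illustration of the hole problem described above, the following plain-Java sketch models a token stream as terms with position increments. It does not use the Lucene API: the class HoleDemo, its Token type, and both methods are hypothetical names chosen for this example. It assumes a removed stop word is recorded by enlarging the position increment of the next surviving token, and shows how positions shift when a later stage ignores that increment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical model of a token stream; not Lucene code.
public class HoleDemo {

    // A term plus the distance (in positions) from the previous emitted token.
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Drops stop words and records each removal as a "hole" by adding
    // the skipped positions to the next surviving token's increment.
    static List<Token> removeStopWords(List<String> words, Set<String> stopWords) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (String w : words) {
            if (stopWords.contains(w)) {
                skipped++;
                continue;
            }
            out.add(new Token(w, 1 + skipped));
            skipped = 0;
        }
        return out;
    }

    // Computes absolute positions. With preserveHoles == false every token
    // advances by exactly one position, which models a stage that loses holes.
    static int[] positions(List<Token> tokens, boolean preserveHoles) {
        int[] pos = new int[tokens.size()];
        int current = -1;
        for (int i = 0; i < tokens.size(); i++) {
            current += preserveHoles ? tokens.get(i).posInc : 1;
            pos[i] = current;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<Token> tokens =
            removeStopWords(List.of("fox", "and", "dog"), Set.of("and"));
        // Holes kept: "dog" stays two positions after "fox".
        System.out.println(Arrays.toString(positions(tokens, true)));  // [0, 2]
        // Holes lost: "dog" appears directly adjacent to "fox".
        System.out.println(Arrays.toString(positions(tokens, false))); // [0, 1]
    }
}
```

In this model, a phrase match that expects a gap between "fox" and "dog" can no longer line up once one side of the comparison has lost the hole, which is the kind of phrase query failure the issue reports.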



--
This message was sent by Atlassian Jira
(v8.3.4#803005)