[jira] [Commented] (LUCENE-7824) Multi-word synonyms rule with common terms at the same position are buggy
[ https://issues.apache.org/jira/browse/LUCENE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009823#comment-16009823 ]

ASF subversion and git services commented on LUCENE-7824:
---------------------------------------------------------

Commit 55bad6fec3c984d4ef56f94f0f50b9f1b2e6dba3 in lucene-solr's branch refs/heads/branch_6_6 from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=55bad6f ]

LUCENE-7824: Fix graph query analysis for multi-word synonym rules with common terms (eg. new york, new york city).

> Multi-word synonyms rule with common terms at the same position are buggy
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-7824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7824
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: master (7.0), 6.5.1
>            Reporter: Jim Ferenczi
>             Fix For: master (7.0), 6.6
>
>         Attachments: LUCENE-7824.patch
>
>
> The automaton built from the graph token stream tries to pack common terms in
> multi-word synonyms that appear at the same position. This means that some
> states inside a multi-word synonym can have multiple transitions.
> As a result the intersection points of the graph are not computed correctly.
> For example the synonym rule: "ny, new york city, new york" is not applied
> correctly to the query "ny police".
> In this case "police" is detected as part of the multi-word synonym path and we
> create the disjunction between:
> "ny police", "new york police", ...
> I pushed a patch that removes this optimization (and creates a single transition
> from each state) in order to ensure that the intersection points of the graph
> always show up at the end of the multi-word synonym paths.
> [~mattweber] can you take a look?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
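The issue description above turns on where the "intersection points" of the token graph fall. As a rough illustration only (this is not the code from LUCENE-7824.patch, and `TokenGraphSketch`, `Edge` and `nyPoliceGraph` are hypothetical names), the sketch below models the query "ny police" under the rule "ny, new york city, new york" with one transition per token, then finds the interior states that every path from start to end must cross. It assumes states are numbered consecutively from the start state to the end state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not Lucene's graph token stream code. */
final class TokenGraphSketch {

    /** One token of the graph: a single transition between two states. */
    static final class Edge {
        final int from;
        final int to;
        final String term;

        Edge(int from, int to, String term) {
            this.from = from;
            this.to = to;
            this.term = term;
        }
    }

    /** "ny police" with the rule "ny, new york city, new york"; each token has its own edge. */
    static List<Edge> nyPoliceGraph() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge(0, 4, "ny"));      // single-word synonym
        edges.add(new Edge(0, 1, "new"));     // new york city
        edges.add(new Edge(1, 2, "york"));
        edges.add(new Edge(2, 4, "city"));
        edges.add(new Edge(0, 3, "new"));     // new york (its own "new" edge, not shared)
        edges.add(new Edge(3, 4, "york"));
        edges.add(new Edge(4, 5, "police"));  // follows every synonym path
        return edges;
    }

    /** True if 'end' can be reached from 'start' without passing through 'removed'. */
    static boolean reachable(List<Edge> edges, int start, int end, int removed) {
        Deque<Integer> stack = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int state = stack.pop();
            if (state == end) {
                return true;
            }
            if (state == removed || !seen.add(state)) {
                continue;
            }
            for (Edge e : edges) {
                if (e.from == state) {
                    stack.push(e.to);
                }
            }
        }
        return false;
    }

    /** Interior states that every start-to-end path crosses (states assumed numbered start..end). */
    static List<Integer> intersectionPoints(List<Edge> edges, int start, int end) {
        List<Integer> points = new ArrayList<>();
        for (int state = start + 1; state < end; state++) {
            if (!reachable(edges, start, end, state)) {
                points.add(state);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // prints "intersection points: [4]"
        System.out.println("intersection points: " + intersectionPoints(nyPoliceGraph(), 0, 5));
    }
}
```

With one edge per token, the only intersection point is state 4, the state where all three synonym paths end, so "police" can be handled as a separate clause after the synonym alternatives.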
[jira] [Commented] (LUCENE-7824) Multi-word synonyms rule with common terms at the same position are buggy
[ https://issues.apache.org/jira/browse/LUCENE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009822#comment-16009822 ]

ASF subversion and git services commented on LUCENE-7824:
---------------------------------------------------------

Commit 84b8b5a1d895ba2fa2d7fbad8cd4ea50321e0dd3 in lucene-solr's branch refs/heads/branch_6x from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=84b8b5a ]

LUCENE-7824: Fix graph query analysis for multi-word synonym rules with common terms (eg. new york, new york city).
[jira] [Commented] (LUCENE-7824) Multi-word synonyms rule with common terms at the same position are buggy
[ https://issues.apache.org/jira/browse/LUCENE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009821#comment-16009821 ]

ASF subversion and git services commented on LUCENE-7824:
---------------------------------------------------------

Commit 21362a3ba4c1e936416635667f257b36235b00ab in lucene-solr's branch refs/heads/master from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=21362a3 ]

LUCENE-7824: Fix graph query analysis for multi-word synonym rules with common terms (eg. new york, new york city).
[jira] [Commented] (LUCENE-7824) Multi-word synonyms rule with common terms at the same position are buggy
[ https://issues.apache.org/jira/browse/LUCENE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008311#comment-16008311 ]

Matt Weber commented on LUCENE-7824:
------------------------------------

Sure, looks good then!
[jira] [Commented] (LUCENE-7824) Multi-word synonyms rule with common terms at the same position are buggy
[ https://issues.apache.org/jira/browse/LUCENE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008305#comment-16008305 ]

Jim Ferenczi commented on LUCENE-7824:
--------------------------------------

I don't think we should try to optimize here. The number of terms should be small in a query so I would prefer to keep it simple and just create a new entry for each token like the cached token stream does.
[jira] [Commented] (LUCENE-7824) Multi-word synonyms rule with common terms at the same position are buggy
[ https://issues.apache.org/jira/browse/LUCENE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008209#comment-16008209 ]

Matt Weber commented on LUCENE-7824:
------------------------------------

[~jim.ferenczi] Maybe use a {{BytesRefHash}} and maintain an id-to-hash map, so we still keep only a single copy of each common term in memory and still have a unique id per token?
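The id-to-hash idea in the comment above can be sketched roughly as follows. This is only an illustration under assumptions: it uses plain `java.util` collections as a stand-in for Lucene's {{BytesRefHash}}, and `TokenTermTable` with its methods is a hypothetical name, not anything from the patch. Each distinct term is stored once, while every token still receives its own id. Note that the thread itself settled on the simpler approach of creating a new entry for each token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: unique id per token, one stored copy per distinct term. */
final class TokenTermTable {
    // Distinct term -> ordinal; plays the role a BytesRefHash would play in Lucene.
    private final Map<String, Integer> termToOrd = new HashMap<>();
    // Ordinal -> term, so each distinct term is kept exactly once.
    private final List<String> ordToTerm = new ArrayList<>();
    // Token id -> term ordinal (the "id-to-hash map").
    private final List<Integer> tokenToOrd = new ArrayList<>();

    /** Registers one token and returns a fresh id, even if the same term was seen before. */
    int addToken(String term) {
        Integer ord = termToOrd.get(term);
        if (ord == null) {
            ord = ordToTerm.size();
            termToOrd.put(term, ord);
            ordToTerm.add(term);
        }
        tokenToOrd.add(ord);
        return tokenToOrd.size() - 1;
    }

    /** Resolves a token id back to its term text. */
    String term(int tokenId) {
        return ordToTerm.get(tokenToOrd.get(tokenId));
    }

    int distinctTerms() {
        return ordToTerm.size();
    }

    int tokenCount() {
        return tokenToOrd.size();
    }
}
```

For the rule "ny, new york city, new york", the two "new" tokens would get different ids (so each keeps its own transition) while sharing one stored term.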