[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-09-03 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602117#comment-16602117
 ] 

Alan Woodward commented on LUCENE-8196:
---

[~Martin Hermann] seeing as this ticket is closed, I opened LUCENE-8477 to 
continue discussion.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-08-30 Thread Martin Hermann (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597355#comment-16597355
 ] 

Martin Hermann commented on LUCENE-8196:


[~romseygeek]

1) I agree that this might be a solution, but as it differs from the setting of 
the paper should be done very carefully.
 
2) Internal slop seems like a great idea! You're right, my example wasn't very 
good and {{Intervals.phrase()}} already does that. But still, if you think of a 
bigger query and e.g. one slop (say, {{"a ("big bad" OR evil) wolf", one 
additional token allowed somewhere}}), the problem remains. I don't really see 
how 'internal slop' would differ from 'normal slop' (doesn't it measure the 
exact same thing?), but it seems rather easy to implement and like something 
that would be desirable and solve this issue.
 
3) I'm not quite sure if I understand that correctly. Do you mean using a gap 
in the query and rewrite it to something like
{noformat}
"bad wolf" (slop 1) contained by "big GAP wolf" (slop 2)
{noformat}
or adding the gap automatically somewhere down the way? I think in the first 
case it'd still be possible to construct some (maybe a little bit more 
complicated) examples that can't be solved like that and where the minimal 
intervals behaviour does not match intuition.

Again, while a lot of these queries may seem quite exotic, I think that 
intervals will get used a lot various programmatically generated queries (as 
spans do now), and there pretty much anything can happen.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-08-29 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596348#comment-16596348
 ] 

Alan Woodward commented on LUCENE-8196:
---

Hi [~Martin Hermann]

Thanks for the detailed feedback - this is very helpful!

1) As with Spans, one way to fix the issue with OR intervals is to change the 
precedence rules so that longer intervals sort before their prefixes.  I need 
to go re-read the paper's proof concerning the OR operator, it would be 
interesting to see if this ends up causing problems elsewhere .  Another option 
would be to add a separate IntervalsSource with this behaviour, maybe triggered 
as a parameter on {{Intervals.or()}}

2) Intervals don't really have the notion of 'slop' that Spans do, but we could 
add the idea of an 'internal slop' to ordered and unordered spans.  This would 
be measured as the space within an interval not taken up by the component 
intervals.  Your {{("big bad" OR evil) wolf}} query I think can already be done 
using {{Intervals.phrase()}}?

3) Spans have the notion of a 'gap' Span, which could be usefully added here.  
This could help with avoiding minimization in your CONTAINS query

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-08-28 Thread Martin Hermann (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595002#comment-16595002
 ] 

Martin Hermann commented on LUCENE-8196:


First of all, I really like this implementation and the ideas that went into 
it. But as I have spent quite some time with the old span queries and their 
problems, I'd like to comment on some things and maybe offer some fresh view 
points for old problems:
 
 
Obviously, maxwidth is not completely identical to specifying slop: Let's say 
we want to do some sort of synonym expansion and query for "("big bad" OR evil) 
wolf" (this is of course related to the prefix-problem we already know about 
("genome editing"), but I think still slightly different).
With span queries, this would have been possible, as we just have to set slop 
to 0 in all queries, but now we have to do something like
{code:java}
Intervals.maxwidth(3,
    Intervals.ordered(
    Intervals.or(
    Intervals.maxwidth(2,
    
Intervals.ordered(Intervals.term("big"),Intervals.term("bad"))),
    Intervals.term("evil")),
    Intervals.term("wolf")));{code}
which also matches "evil eyes wolf", which should not be a match. It would be 
possible to rewrite the query so that the disjunction is at the top level, 
something like
{code:java}
Intervals.or(
    Intervals.maxwidth(2,
    Intervals.ordered(Intervals.term("evil"),Intervals.term("wolf"))),
    Intervals.maxwidth(3,
    
Intervals.ordreed(Intervals.term("big"),Intervals.term("bad"),Intervals.term("wolf";{code}

which would work as expected, but I think we can agree that this is not really 
a nice solution (but I will come back to it later).

 

Now, we already know that "(big OR "big bad") wolf" would not match "big bad 
wolf" (this is exactly the genome editing thing), but I think it is worth to 
point out exactly why: It actually should not match, according to the 
definition of "minimum interval": Any match for "big bad" is also a match for 
big, so the first IntervalsSource only passes matches for "big", and then we 
get no match for "big wolf". This is a feature of the query semantic of the 
paper (and maybe the reason for the efficency and simplicity of the 
algorithms): The problems that spanQueries had are gone, because we define the 
unexpected behaviour to be correct*. As much as I like the IntervalQueries, I 
do not really think this is satisfactory.

 
There are actually other, similar cases with containing/containedBy: Let's say 
our document is "big bad big wolf" and we want "bad wolf" (slop 1) to be 
contained by "big wolf" (slop 2). We would get no match in this document, as 
the minimal match for the big interval is just "big wolf" (as the other match, 
"big bad big wolf" contains this one). At least to me this is counter intuitive 
and I would expect the document to match.
It really gets strange if we mix in some "OR":
{noformat}
"big wolf" (slop 1) contained in ("big wolf" (slop 1) OR "bad wolf") {noformat}
does not match "big bad wolf", in contrast to
{noformat}
"big wolf" (slop 1) contained in ("big wolf" (slop 1)){noformat}
, which does. So we actually lose a match by adding a OR-clause, and I think we 
can agree that this is not really good. Of course these are not queries a human 
would write, but I think one major use case of span queries is some sort of 
automatic query generation, and that's where I think it is really important to 
meet at least some basic expectations (such as not losing matches by adding 
disjunctions).
 
I don't see a way to fix this that still follows minimal interval semantics, as 
all this is actually how it SHOULD work there, but this would mean we'd lose 
the correctness proofs. The only thing I can think of is some sort of query 
rewriting, pushing the disjunction as far top as neccessary, but this may be 
rather performance heavy and also does not solve the "bad wolf" (slop 1) 
contained by "big wolf" (slop 2) problem.
 
Any thoughts?

*A short theoretical aside: I think that most of the span query problems came 
from the fact that we want to have a "next match" function, i.e. some sort of 
ordering of matches, together with the nature of span query Matches, which are 
essentially a pair of numbers (start and end of match). This means we have to 
specify an order on pairs of numbers (which is possible, of course; the 
solution with span queries was a lexical order, i.e. the start always 
increases, and if it stays the same, the end increases). But I think it is not 
really possible to implement completly lazy behaviour with this ordering: Think 
of some ordered "((a OR b) followed by (c OR d)) with enough slop" and the 
document "a b c d" which should find "a b c d" before "b c" (as the start 
increases), but has to cache the match for "c", which (in the sub-query "(c OR 
d)) occurs before the one for "b". So the combination of 

[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-05-09 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468537#comment-16468537
 ] 

Alan Woodward commented on LUCENE-8196:
---

I opened LUCENE-8300 do deal with unordered overlaps.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453817#comment-16453817
 ] 

ASF subversion and git services commented on LUCENE-8196:
-

Commit 1a18acd783745f8fa11042f854f96a3e5ed5aa72 in lucene-solr's branch 
refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a18acd ]

LUCENE-8196: Fix unordered case with non-matching subinterval


> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453816#comment-16453816
 ] 

ASF subversion and git services commented on LUCENE-8196:
-

Commit 345fdff47cc6f09d774afefd46b2a653d0da7fa3 in lucene-solr's branch 
refs/heads/branch_7x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=345fdff ]

LUCENE-8196: Fix unordered case with non-matching subinterval


> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-26 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453767#comment-16453767
 ] 

Alan Woodward commented on LUCENE-8196:
---

I think minwidth() would run into problems with documents that have two 
instances of 'b', because unordered will always find the minimal intervals, so 
it would always end up with intervals of width 0, which would then be rejected 
by the filter, and you'd end up with missing matches.

What we really need here I think is a new source, something like 
'unordered-non-overlapping', which checks that all of the internal intervals 
are separated.  With a better name, of course :) . And we should rename 
'unordered' to 'and' to make the semantics a bit clearer.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-24 Thread Matt Weber (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450408#comment-16450408
 ] 

Matt Weber commented on LUCENE-8196:


[~jim.ferenczi] [~romseygeek]

So given a single document with the value {{a b}}. The following queries would 
both match this document:

{code:java}
Intervals.unordered(Intervals.term("b"), Intervals.term("a")) 
{code}

{code:java}
Intervals.unordered(Intervals.term("b"), Intervals.term("b")) 
{code}


The first I think would have an interval width of {{1}} and the 2nd should have 
a width of {{0}}.  So if we have a {{minwidth}} operator we could use that to 
set the minimum width to {{1}} preventing the 2nd from matching?  If both of 
these queries result in an interval with the same width then that feels wrong 
to me.  


> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-24 Thread Matt Weber (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450158#comment-16450158
 ] 

Matt Weber commented on LUCENE-8196:


I use these queries to build query parsers and I am specifically thinking of an 
unordered near and how I can prevent it from matching the same term.  I can't 
think of any situation where a user would think {{NEAR(a, a)}} would match 
documents with a single {{a}} and if we can't get that by default I would like 
a way to explicitly prevent it myself.  Spans have the same issue as well, see 
LUCENE-3120.  

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-24 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450039#comment-16450039
 ] 

Jim Ferenczi commented on LUCENE-8196:
--

I don't think an operator can prevent anything here, a query for 
*Intervals.ordered(Intervals.term("w3"), Intervals.term("w3"))* should always 
return all intervals of the term "w3" (it will not interleave successive 
intervals of "w3"). [~mattweber] why do you think that this "scenario" should 
be prevented ? When I do "foo AND foo" I don't expect it to match only document 
that have foo twice ?

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-24 Thread Matt Weber (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450014#comment-16450014
 ] 

Matt Weber commented on LUCENE-8196:


[~jim.ferenczi] [~romseygeek]  I think rename to {{and}} makes sense, however, 
I would still live a way to explicitly prevent the scenario I described . Maybe 
a {{minwith}} operator?  The width at the same position/interval should be 
{{0}} right? 

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-24 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449461#comment-16449461
 ] 

Alan Woodward commented on LUCENE-8196:
---

Good catch [~jim.ferenczi], I'll commit that change.  I like the idea of 
changing *unordered* to *and* as well - I think that makes sense [~mattweber]?

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-24 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449409#comment-16449409
 ] 

Jim Ferenczi commented on LUCENE-8196:
--

I don't think we should prevent anything ;). *unordered* is a conjunction 
operator so it should match if all terms match (which is the case in your 
example) so these results are expected IMO. Maybe we should rename *unordered* 
to *and* in order to avoid confusion ?
If you want to match the same term within a max width the ordered query works 
fine:
{code:java}
Query q = new IntervalQuery(field, Intervals.maxwidth(2, 
Intervals.ordered(Intervals.term("w3"), Intervals.term("w3";
{code}

[~romseygeek] while I was playing with *unordered* I realized that we don't 
protect against sources that match but don't have intervals.
For instance:
{code:java}
Query q = new IntervalQuery(query, Intervals.unordered(Intervals.term("w2"), 
Intervals.ordered(Intervals.term("w3"),Intervals.term("w3";
{code}
does not work because the *unordered* query doesn't check if the sub source has 
intervals when it adds it in the queue. 
I attached a patch that fixes this issue and added some tests that fail without 
the fix. Can you take a look ?


> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448387#comment-16448387
 ] 

Alan Woodward commented on LUCENE-8196:
---

bq. How would we prevent matching at the same interval?

The original paper doesn't look like it addresses this.  I'll try and work out 
the best way of dealing with things, I guess we'll need to keep track of the 
positions of internal intervals in the priority queue, and when we advance make 
sure that they don't collide.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-21 Thread Matt Weber (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446939#comment-16446939
 ] 

Matt Weber commented on LUCENE-8196:


[~romseygeek]  This is great!  How would we prevent matching at the same 
interval?  In {{TestIntervalQuery}}, I would expect this to pass but it matches 
every doc with {{w3}}.


{code:java}
public void testUnorderedQueryNoSelfMatch() throws IOException {
  Query q = new IntervalQuery(field, Intervals.maxwidth(2, 
Intervals.unordered(Intervals.term("w3"), Intervals.term("w3";
  checkHits(q, new int[]{1});
}
{code}

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425211#comment-16425211
 ] 

ASF subversion and git services commented on LUCENE-8196:
-

Commit 7117b68db6835acfeda17f04ab2c20a8c1ec2c17 in lucene-solr's branch 
refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7117b68 ]

LUCENE-8196: Check for a null input in LowpassIntervalsSource


> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425210#comment-16425210
 ] 

ASF subversion and git services commented on LUCENE-8196:
-

Commit 7fcaac8550e340512c09a8d8f4bd4773096f63f3 in lucene-solr's branch 
refs/heads/branch_7x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7fcaac8 ]

LUCENE-8196: Check for a null input in LowpassIntervalsSource


> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424063#comment-16424063
 ] 

ASF subversion and git services commented on LUCENE-8196:
-

Commit 1d6502cecb94330cd5a793ea82bbfe910c844d7f in lucene-solr's branch 
refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1d6502c ]

LUCENE-8196: Check that the TermIntervalsSource is positioned on the correct 
term


> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424062#comment-16424062
 ] 

ASF subversion and git services commented on LUCENE-8196:
-

Commit b772b585a095b32593e1b99ea7ad110921f3c721 in lucene-solr's branch 
refs/heads/branch_7x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b772b58 ]

LUCENE-8196: Check that the TermIntervalsSource is positioned on the correct 
term


> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423660#comment-16423660
 ] 

ASF subversion and git services commented on LUCENE-8196:
-

Commit 00eab54f9d6232c68a93f10ff20e3a724ffeca14 in lucene-solr's branch 
refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=00eab54 ]

LUCENE-8196: Add IntervalQuery and IntervalsSource to the sandbox


> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423659#comment-16423659
 ] 

ASF subversion and git services commented on LUCENE-8196:
-

Commit 974c03a6ca8eed3941e1414dd2ecb75132228d4f in lucene-solr's branch 
refs/heads/branch_7x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=974c03a ]

LUCENE-8196: Add IntervalQuery and IntervalsSource to the sandbox


> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-29 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418695#comment-16418695
 ] 

Jim Ferenczi commented on LUCENE-8196:
--

+1

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-28 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417092#comment-16417092
 ] 

Alan Woodward commented on LUCENE-8196:
---

I talked over disjunction ordering with [~jim.ferenczi], and we agreed to 
revert back to the ordering specified in the original paper.  In future it 
might be worth contacting the authors and seeing if they've covered the case of 
prefix disjunctions elsewhere.

I think this is ready?

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-19 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404574#comment-16404574
 ] 

Alan Woodward commented on LUCENE-8196:
---

I moved the approximation and reset() from IntervalIterator, and it turned out 
that ConjunctionIntervalIterator was a convenient place to put them, so I kept 
that in.  I also cleaned up the utility classes (no need to check for 
TwoPhaseIterator or BitSetIterator when we know that the leaves are always 
PostingsEnum) and moved to org.apache.lucene.search.intervals.

Payload matches can be added trivially at a later point.  Payload scoring will 
be a more interesting one, but I'm not entirely happy with the way scoring 
works at the moment anyway, I need to read around a bit more on good ways of 
scoring proximity queries in general.  Offsets may not be trivial to add, as we 
have the same problem we originally had with Spans here, in that some of the 
algorithms advance leaf intervals before returning.  Something to consider 
later on, definitely.

The PQ specialization is just lifted directly from DisjunctionDISIApproximation 
in the existing core search package, so it wasn't my conclusion :)

re extractTerms() - I don't like this method being here in the first place 
really, see my comment about scoring above.  I think let's keep it simple for 
now, you can always filter afterwards.

[~jim.ferenczi] - where can I remove null checks?  I think I'm constrained by 
having to return [-1..-1] when iterator has moved to a new doc but hasn't been 
advanced over intervals yet.

The disjunction order is to address LUCENE-7398.  If a disjunction is appearing 
within a block, then sorting for minimum intervals can miss valid matches.  The 
Vigna paper doesn't seem to discuss this case anywhere.  One possible solution 
that would work in all cases would be to add a boolean to 
IntervalsSource.getIntervals() that indicates whether or not the source is in a 
final position - if it is, then it should return minimal intervals, otherwise 
it should return wider ones.  This would only be applicable to disjunctions.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-19 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404532#comment-16404532
 ] 

Jim Ferenczi commented on LUCENE-8196:
--

+1 too, there are some places where you could initialize the current interval 
with[−∞ . . −∞] in order to avoid the nullity check.

Most of the operators algorithm seem good, though I don't understand why you 
change the order of the disjunction ? If you don't start with the smallest 
right interval from the queue you could miss a lot of minimum intervals that 
could be needed if the disjunction is used inside another operator ?

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402538#comment-16402538
 ] 

David Smiley commented on LUCENE-8196:
--

* Nice package javadocs!
 * Maybe you will some day add a means of extracting offsets (e.g. for 
highlighting) or payloads?
 * Just curious, how did you arrive at the conclusion that you needed to 
specialize the PriorityQueue?
 * what if extractTerms took a Consumer instead of a Set? It's easy 
to invoke with a myset::add for the common case when you have a Set, and I've 
seen cases where you might want to provide a filter before storing it wherever.

 

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-16 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402069#comment-16402069
 ] 

Adrien Grand commented on LUCENE-8196:
--

Some comments:
 - ConjunctionIntervalIterator does so little, I suspect things would be easier 
to read if we removed it.
 - I'd like it better if we kept the API definition to a minimum on 
IntervalIterator and removed the constructor that takes an approximation and 
the reset() method. To me these should be implementation details?
 - Let's make the utility classes that you copied pkg-private?
 - Maybe let's put this in a sub package of search, ie. 
org.apache.lucene.search.intervals?

Otherwise +1

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-15 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400317#comment-16400317
 ] 

Alan Woodward commented on LUCENE-8196:
---

This patch moves everything into the sandbox for now, and adds a package-info 
explaining how to use things.  I had to duplicate DisiWrapper, 
DisiPriorityQueue and DisjunctionDISIApproximation, but I don't think that's 
too much of a problem.  Having things in sandbox should reduce confusion with 
Span queries, and give us time to try and switch things over.

I think this is ready to be committed.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-12 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394980#comment-16394980
 ] 

Alan Woodward commented on LUCENE-8196:
---

This patch makes IntervalIterator extend DocIdSetIterator, and makes the 
per-document reset() function protected and called automatically on nextDoc() 
and advance(target).

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-09 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393188#comment-16393188
 ] 

Jim Ferenczi commented on LUCENE-8196:
--

{quote}
I'd rather keep the API as it is, with the field being passed to IntervalQuery 
and then recursing down the IntervalSource tree.  Otherwise you end up having 
to declare the field on all the created sources, which seems redundant.  I've 
removed the cross-field hack entirely for the moment.
{quote}

+1 to remove the cross-field hack, thanks. Regarding the API it's ok since 
IntervalQuery limits all sources to one field so I am fine with that (I 
misunderstood how the IntervalQuery can be used).

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-09 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393182#comment-16393182
 ] 

Alan Woodward commented on LUCENE-8196:
---

I discussed scoring with [~jim.ferenczi] and [~jpountz] offline, and we decided 
to just use the inverse length of intervals as a sloppy frequency for now, as 
described in the Vigna paper linked above.  This means that we can't compare 
scores directly with existing phrase queries, but the query mechanism is quite 
different (particularly for SloppyPhraseScorer) so it makes sense that scores 
won't be the same either.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-09 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392834#comment-16392834
 ] 

Alan Woodward commented on LUCENE-8196:
---

I opened a pull request at [https://github.com/apache/lucene-solr/pull/334] to 
make this easier to review.  [~jpountz] I think I've addressed most of your 
feedback?

[~jim.ferenczi] I'd rather keep the API as it is, with the field being passed 
to IntervalQuery and then recursing down the IntervalSource tree.  Otherwise 
you end up having to declare the field on all the created sources, which seems 
redundant.  I've removed the cross-field hack entirely for the moment.

I'll see if I can improve the scoring next.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-08 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391723#comment-16391723
 ] 

Jim Ferenczi commented on LUCENE-8196:
--

{quote}
I was a bit annoyed to see the field masking hack but actually those intervals 
source do not need term statistics which makes the hack less horrible. Could 
you still document it to make sure users are aware it is a hack and explain it 
which circumstances it might be ok?
{quote}
 
 I think that the proposed API should be more restrictive regarding the 
targeted field. Could we restrict the IntervalsSource to work on a single field 
? Something like:

{code:java}
public abstract class IntervalsSource {
 protected final String field;

 public IntervalsSource(String field) {
   this.field = field;
 }

 public abstract IntervalIterator intervals(LeafReaderContext ctx) throws 
IOException;
...
{code}
... and then we can check in each implementation that the sources are all 
targeting the same field.
I understand that it might be powerful to mix multiple fields in an interval 
query but with the current API that seems to be the norm rather than an 
exception. We can add the field masking hack afterward but for the first 
iteration I think it's better to focus on the main use case for this new query 
which is to provide a way to find the minimum intervals in a single field. 

Regarding the score of the intervals, it seems that the patch uses the inverse 
length of the interval rather than the slop within the interval like the sloppy 
phrase scorer. Could we compute the total slop of the current interval (as the 
sum of the slop of each interval source that composed this interval) and use 
its inverse to score each ? This would make different interval query more 
comparable in terms of score since an interval with few terms and a slop>0 
would score less that one with more terms but no slop.

I'll look deeper at the implementation of the different queries but I like the 
simplicity of the patch and the fact that there is a paper with a proof for 
each of them.



> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch
>
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-08 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391580#comment-16391580
 ] 

Alan Woodward commented on LUCENE-8196:
---

{quote}Do we need {{IntervalIterator.score()}}?{quote}

Yes, terms and phrases return 1 instead than taking the overall width into 
account.  This is so that they score the same as TermQuery and PhraseQuery
{quote}Do we need {{advanceTo}}?{quote}
Unfortunately yes, or at least we need a way of resetting the iterator on each 
new document.  It might be possible to avoid passing the doc down and having a 
return value though, I'll see what I can do.
{quote}I would like some form of AssertingIntervalsSource{quote}
This is a bit trickier, as it's not obvious where the wrapping would happen.

+1 to everything else, I'll work on a follow-up.
 

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch
>
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-03-08 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391478#comment-16391478
 ] 

Adrien Grand commented on LUCENE-8196:
--

Thanks Alan. I agree that growing a separate hierarchy of objects might help 
land this feature. We might even want to put first iterations of this work in 
sandbox to give time for the API to stabilize before we move it to core or misc.

I have some questions/comments:
 - Do we need {{IntervalIterator.score()}}? It seems to be the same value on 
all implementations.
 - Do we need {{advanceTo}}? It seems to me that things would be simpler and as 
efficient if you documented that nextPosition() may only be called when the 
approximation is positioned and then {{advanceTo}} would be equivalent to 
checking the return value of {{nextInterval}}?
 - Let's make the {{IntervalFunction}} API an implementation detail?
 - The documentation of {{cost()}} says it is the cost of finding the next 
interval but given how you use it in the query it looks like it is actually 
more about the average cost of iterating over _all_ intervals.
 - In terms of testing I would like some form of AssertingIntervalsSource to 
make sure that intervals are always consumed in legal ways and behave correctly.
 - More docs would help read the code. For instance IntervalsSource.intervals 
has no docs. By the way we might want to mention there that the same instance 
might be reused across calls.
 - TermIntervalsSource should check whether positions were indexed.
 - I was a bit annoyed to see the field masking hack but actually those 
intervals source do not need term statistics which makes the hack less 
horrible. Could you still document it to make sure users are aware it is a hack 
and explain it which circumstances it might be ok?

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8196.patch
>
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org