Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-17 Thread Dawid Weiss

Hi Chris, > Because if you can adjust your parser syntax, this literally just becomes: ' field:"foo bar"~N ' ... where N is the positionIncrementGap on your analyzer ... OR ... ' field:"foo bar" ' ... if you call setPhraseSlop on your QueryParser. Yes - correct. This would be equivale
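
The position arithmetic behind `field:"foo bar"~N` can be checked with a small simulation. This is a hypothetical Python model, not Lucene code. It assumes whitespace tokenization, one position per token, and that the gap is added between values so the next value's first token lands `gap + 1` positions after the previous value's last token (our reading of how Lucene indexes multivalued fields). The names `index_positions` and `sloppy_pair_matches` are illustrative only.

```python
# Hypothetical simulation of positions in a multivalued field; NOT Lucene code.


def index_positions(values, gap):
    """Map each term to the positions it occupies across all values of one field.

    Assumptions: whitespace tokenization, a position increment of 1 per token,
    and `gap` extra positions added before every value after the first, so the
    next value's first token sits gap + 1 positions after the previous last token.
    """
    postings = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # the position increment gap between values
        for token in value.split():
            pos += 1
            postings.setdefault(token, []).append(pos)
    return postings


def sloppy_pair_matches(postings, first, second, slop):
    """Model of the two-term phrase "first second"~slop.

    The match distance is |(pos_second - 1) - pos_first|: 0 for an exact
    phrase, 2 when the two terms are swapped.
    """
    return any(
        abs((p2 - 1) - p1) <= slop
        for p1 in postings.get(first, [])
        for p2 in postings.get(second, [])
    )


# doc: field=["foo", "bar"] with a gap of 100 -> foo at 0, bar at 101
cross = index_positions(["foo", "bar"], gap=100)
print(sloppy_pair_matches(cross, "foo", "bar", slop=99))   # False
print(sloppy_pair_matches(cross, "foo", "bar", slop=100))  # True: the slop bridges the gap
```

Under these assumptions a slop equal to the gap is already enough to join the last token of one value with the first token of the next, so in this model the slop has to stay strictly below the positionIncrementGap.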

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-17 Thread Chris Hostetter

(caveat: i don't ever really understand what Intervals are at the lucene feature set stage) : Yup - similar to what Alan suggested. I'd have to rewrite the (general text-to-query) query parser to only use intervals though. Still thinking about possible approaches to this. ... : > You co

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-14 Thread Dawid Weiss

Thanks Michael. The outcome of this discussion seems to be clear that everyone is trying to reinvent the wheel somehow. ;) I think it really should become part of core Lucene functionality. Seems like a corner case people are not aware of until they hit it (and then it's not clear what to do about

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-14 Thread Michael Gibney

This might be a little outside the spirit of this discussion (in that it's not really "off-the-shelf") -- but I implemented a proof-of-concept for a different use case that I think could be adapted here: For a given doc, for each term in your multivalued field, you could record a bitset representa
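
Michael's message is cut off at this point, so the following is only one plausible reading of the idea, not his proof-of-concept. It assumes the recorded bitset marks, for each term in a document, which value ordinals contain that term; a conjunctive query then matches only if the AND of the terms' bitsets is non-zero. `term_value_masks` and `all_terms_in_one_value` are hypothetical names, and how such bitsets would be stored in an index is left open.

```python
# Hypothetical sketch of a per-document "which values contain this term" bitset.
# Bit i of a term's mask is set when value i of the field contains the term.


def term_value_masks(values):
    masks = {}
    for ordinal, value in enumerate(values):
        for token in set(value.split()):
            masks[token] = masks.get(token, 0) | (1 << ordinal)
    return masks


def all_terms_in_one_value(masks, terms):
    """True if at least one single value contains every term."""
    if not terms:
        return False
    combined = masks.get(terms[0], 0)
    for term in terms[1:]:
        combined &= masks.get(term, 0)  # keep only values holding all terms so far
    return combined != 0


print(all_terms_in_one_value(term_value_masks(["foo", "bar"]), ["foo", "bar"]))      # False
print(all_terms_in_one_value(term_value_masks(["foo", "foo bar"]), ["foo", "bar"]))  # True
```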

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-14 Thread Dawid Weiss

bq. Expanding a query over numerous fields grows combinatorically in the number of fields (if I want my query to match when all terms match in *some* field), doesn't it? I don't think it does? It grows linearly with the number of fields? In my experience the number of fields searchable "by default
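
To make the "linear, not combinatorial" point concrete: if the requirement is that all terms match within one field, the expansion is one conjunction per field, so the number of term clauses is (number of fields) × (number of terms), which grows linearly with the field count. The nested-tuple representation below is a hypothetical stand-in for a BooleanQuery tree, not a Lucene API.

```python
# Hypothetical query representation (not a Lucene API): a leaf is a
# (field, term) pair, an inner node is (operator, [children]).


def expand_all_terms_in_some_field(terms, fields):
    """OR over fields of (AND over terms): one conjunction per field."""
    return ("OR", [("AND", [(field, term) for term in terms]) for field in fields])


def count_term_clauses(node):
    _, rest = node
    if isinstance(rest, list):
        return sum(count_term_clauses(child) for child in rest)
    return 1  # a (field, term) leaf


query = expand_all_terms_in_some_field(["foo", "bar"], ["title", "body", "tags"])
print(count_term_clauses(query))  # 6 == 3 fields * 2 terms
```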

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-11 Thread Gus Heck

You're thinking of SurroundQuery parser for span queries I think... https://lucene.apache.org/solr/guide/8_6/other-parsers.html#surround-query-parser and the Advanced Query Parser will have a similar syntax On Thu, Sep 10, 2020 at 4:40 PM Michael Sokolov wrote: > A slightly different but related

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Michael Sokolov

A slightly different but related topic is how to manage lots of fields. I agree that sub-fields are a pain and that mashing everything together in an all-field is a mess, but for best performance with a large number of fields/sub-fields, it is the only workable option I can see? Expanding a query o

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Dawid Weiss

> Ok so the more general question is whether we need an interval query parser Oh, to this I'd say: yes, yes, yes. I didn't have much prior experience writing frontend apps on top of Solr/Lucene but once I did have to go that route it quickly turns out that several things that are readily availabl

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread jim ferenczi

Ok so the more general question is whether we need an interval query parser. On Thu, Sep 10, 2020 at 17:28, Dawid Weiss wrote: > I am fine with the boundary token suggestion, actually. What I don't see at the moment is how I can marry it with an output of a general query parser (which retu

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Dawid Weiss

I am fine with the boundary token suggestion, actually. What I don't see at the moment is how I can marry it with an output of a general query parser (which returns any Query). I could give an attempt to process the query node tree from standard query parser (which we're using at the moment anyway)
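
One rough way to picture the translation step: walk a parsed tree of term/AND/OR nodes, map terms to single-position intervals, OR to a union, and AND to an unordered span, then drop any interval that straddles a boundary token. This is a hypothetical Python model of interval semantics, not the standard query parser's node classes or Lucene's `Intervals` API; the `$$` boundary token and all function names are assumptions, and negation and other operators are left out of the sketch.

```python
from itertools import product

BOUNDARY = "$$"  # hypothetical synthetic token indexed between values


def index_with_boundaries(values):
    """Flatten the values into one token stream, separated by BOUNDARY tokens."""
    tokens = []
    for i, value in enumerate(values):
        if i > 0:
            tokens.append(BOUNDARY)
        tokens.extend(value.split())
    return tokens


def to_intervals(node, tokens):
    """Evaluate a small query tree into a set of (start, end) position intervals.

    ("term", t)     -> every position holding t
    ("or", [...])   -> union of the children's intervals
    ("and", [...])  -> one interval per child, spanned from first start to last end
                       (an analogue of an unordered interval query)
    """
    kind, arg = node
    if kind == "term":
        return {(i, i) for i, tok in enumerate(tokens) if tok == arg}
    children = [to_intervals(child, tokens) for child in arg]
    if kind == "or":
        return set().union(*children)
    if kind == "and":
        return {
            (min(s for s, _ in combo), max(e for _, e in combo))
            for combo in product(*children)
        }
    raise ValueError(f"unsupported node type: {kind}")


def matches_within_one_value(query, values):
    """Keep only intervals with no BOUNDARY inside (a not-containing filter)."""
    tokens = index_with_boundaries(values)
    boundaries = [i for i, tok in enumerate(tokens) if tok == BOUNDARY]
    return any(
        not any(start < b < end for b in boundaries)
        for start, end in to_intervals(query, tokens)
    )


# field:(foo AND (bar OR baz))
query = ("and", [("term", "foo"), ("or", [("term", "bar"), ("term", "baz")])])
print(matches_within_one_value(query, ["foo", "bar baz"]))  # False
print(matches_within_one_value(query, ["foo baz", "bar"]))  # True
```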

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread jim ferenczi

Right, I misunderstood Alan's answer. The boundary option is not "impure" in my opinion. It solves this issue nicely but maybe it needs something more packaged to add the boundaries and build queries easily. On Thu, Sep 10, 2020 at 16:16, Dawid Weiss wrote: > Yup - similar to what Alan sugges

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Dawid Weiss

Yup - similar to what Alan suggested. I'd have to rewrite the (general text-to-query) query parser to only use intervals though. Still thinking about possible approaches to this. D. On Thu, Sep 10, 2020 at 3:58 PM jim ferenczi wrote: > > You could set a very high position increment gap for multi

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread jim ferenczi

You could set a very high position increment gap for multi-valued fields (Analyzer#getPositionIncrementGap) and perform something like Intervals.maxWidth(Intervals.unordered(...), pos_gap-1)? On Thu, Sep 10, 2020 at 12:32, Dawid Weiss wrote: > Yeah... I was thinking about adding synthetic b
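
The gap-plus-width idea can be modelled as follows: with a large gap between values, any unordered match whose width stays below the gap has to come from a single value. This is a hypothetical Python simulation, not the `Intervals` API. It assumes whitespace tokenization, that `gap` extra positions separate consecutive values, and that width means `end - start + 1`.

```python
from itertools import product


def index_positions(values, gap):
    """Term -> positions, with `gap` extra positions inserted between values (assumed model)."""
    postings, pos = {}, -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.split():
            pos += 1
            postings.setdefault(token, []).append(pos)
    return postings


def min_unordered_width(postings, terms):
    """Smallest width (end - start + 1) of a window holding every term, or None."""
    position_lists = [postings.get(t, []) for t in terms]
    if not position_lists or not all(position_lists):
        return None
    return min(max(combo) - min(combo) + 1 for combo in product(*position_lists))


def matches_with_max_width(postings, terms, max_width):
    width = min_unordered_width(postings, terms)
    return width is not None and width <= max_width


GAP = 100
print(matches_with_max_width(index_positions(["foo", "bar"], GAP), ["foo", "bar"], GAP - 1))  # False
print(matches_with_max_width(index_positions(["bar x foo"], GAP), ["foo", "bar"], GAP - 1))   # True
```

The trade-off is that two terms in the same value that lie further apart than the width limit are rejected as well.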

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Dawid Weiss

Yeah... I was thinking about adding synthetic boundaries but this seems... impure. :) Another quick reflection is that I'd have to somehow translate the original query (which can be arbitrarily complex) into an interval query. Tough. D. On Thu, Sep 10, 2020 at 12:22 PM Alan Woodward wrote: > > I

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Alan Woodward

I’ve solved this sort of thing in the past by indexing boundary tokens, and wrapping the queries with the equivalent of Intervals.notContaining(query, boundary-query); you could also put a very large position increment gap and use a width filter, but that’s a bit more error prone if you could co
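
A hypothetical Python model of the boundary-token approach (not Lucene code): values are separated by a synthetic token, assumed here to be `$$` and never to occur in real text, and a conjunctive match is accepted only if some window covering all query terms contains no boundary. This mirrors the role `Intervals.notContaining` plays in Alan's description. Unlike the width-filter variant, the outcome does not depend on how long an individual value is.

```python
BOUNDARY = "$$"  # hypothetical boundary token indexed between values


def tokens_with_boundaries(values):
    tokens = []
    for i, value in enumerate(values):
        if i > 0:
            tokens.append(BOUNDARY)
        tokens.extend(value.split())
    return tokens


def covering_windows(tokens, terms):
    """For each start position holding a query term, the shortest window
    [start, end] from there that contains every query term."""
    needed = set(terms)
    windows = []
    for start, tok in enumerate(tokens):
        if tok not in needed:
            continue
        seen = set()
        for end in range(start, len(tokens)):
            if tokens[end] in needed:
                seen.add(tokens[end])
                if seen == needed:
                    windows.append((start, end))
                    break
    return windows


def matches_not_containing_boundary(values, terms):
    """Analogue of not-containing(unordered(terms), boundary): some window has no boundary."""
    tokens = tokens_with_boundaries(values)
    return any(BOUNDARY not in tokens[s:e + 1] for s, e in covering_windows(tokens, terms))


long_value = "foo " + "filler " * 500 + "bar"
print(matches_not_containing_boundary(["foo", "bar"], ["foo", "bar"]))  # False
print(matches_not_containing_boundary([long_value], ["foo", "bar"]))    # True
```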

Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Dawid Weiss

Hi Alan, You're the expert here so I thought I'd ask before I jump in deep. Do you think it's feasible to solve the following multivalued-field problem: doc: field=["foo", "bar"] query: field:(foo AND bar) I'd like the above to return zero hits (no single value contains both foo and bar), but si
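
A minimal Python illustration of the problem, assuming whitespace tokenization (both function names are hypothetical): evaluating the conjunction against the document as a whole, as a plain term-level query effectively does, finds both terms, while the intended semantics require both terms to appear in the same value.

```python
def naive_match(values, terms):
    """field:(foo AND bar) at document level: all values pooled into one bag of terms."""
    pooled = {token for value in values for token in value.split()}
    return set(terms) <= pooled


def per_value_match(values, terms):
    """The desired behaviour: some single value must contain every term."""
    return any(set(terms) <= set(value.split()) for value in values)


doc = ["foo", "bar"]
print(naive_match(doc, ["foo", "bar"]))      # True  (the false positive)
print(per_value_match(doc, ["foo", "bar"]))  # False (what the query should return)
```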
