Re: Range query on a substring.

Roman Chyla Tue, 16 Jul 2013 11:08:58 -0700

Well, I think this is slightly too categorical - a range query on a
substring can be thought of as a simple range query. So, for example the
following query:


"lucene 1*"

becomes behind the scenes: "lucene (10|11|12|13|14|1abcd)"

The catch is that this is a string range rather than a numeric one, but it
is still a range query - the data just has to be indexed in a clever way.
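
To make the string-versus-numeric point concrete, here is a minimal Python
sketch (plain Python, not Lucene or Solr code). The names fuse_token and
in_range, the "_" separator and the pad width of 6 are all assumptions made
up for illustration. It fuses "text (number)" into one token with a
zero-padded number, so that an ordinary lexicographic range over tokens
behaves like a numeric range:

```python
import re

PAD_WIDTH = 6  # assumed maximum number of digits (hypothetical choice)


def fuse_token(value: str) -> str:
    """Turn 'some text (12)' into the single token 'some text_000012'."""
    m = re.fullmatch(r"\s*(.*?)\s*\((\d+)\)\s*", value)
    if m is None:
        raise ValueError(f"unexpected value: {value!r}")
    text, number = m.group(1), int(m.group(2))
    # Zero-padding makes string order agree with numeric order.
    return f"{text}_{number:0{PAD_WIDTH}d}"


def in_range(token: str, text: str, low: int, high: int) -> bool:
    """Lexicographic range check over fused tokens (both bounds inclusive)."""
    lower = f"{text}_{low:0{PAD_WIDTH}d}"
    upper = f"{text}_{high:0{PAD_WIDTH}d}"
    return lower <= token <= upper
```

Without the padding, "12" sorts before "9" as a string, which is exactly why
a naive string range gives wrong answers. The sketch breaks down for numbers
with more digits than PAD_WIDTH.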

So, Marcin, you still have quite a few options besides the strict boolean
query model

1. have a special tokenizer chain which creates one token out of these
groups (eg. "some text prefix_1") and search for "some text prefix_*" [and
do some post-filtering if necessary]
2. another version, using regex /some text (1|2|3...)/ - you get the idea
3. construct the lucene multi-term range query automatically, in your
qparser - to produce a phrase query "lucene (10|11|12|13|14)"
4. use payloads to index your integer at the position of "some text" and
then retrieve only "some text" where the payload is in range x-y - an
example is here, look at getPayloadQuery():
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java
(this is a more complex setup, though, and if you google, you will find a
better description)
5. use a qparser that is able to handle nested search and analysis at the
same time - e.g. your query is: field:"some text" NEAR1 field:[0 TO 10] - I
know about a parser that can handle this and I invite others to check it
out (yeah, JIRA tickets need reviewers ;-))
https://issues.apache.org/jira/browse/LUCENE-5014
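
As a rough illustration of options 2 and 3, the following Python sketch
expands a numeric range into an explicit alternation. It is not Lucene code:
the function names are hypothetical, phrase_notation only reproduces the
informal notation used in this message (it is not claimed to be valid query
parser syntax), and value_pattern uses Python's re module, whose syntax
differs in details from Lucene's regexp syntax.

```python
import re


def alternation(low: int, high: int) -> str:
    """Enumerate every integer in [low, high] as 'a|b|c'."""
    if low > high:
        raise ValueError("low must not exceed high")
    return "|".join(str(n) for n in range(low, high + 1))


def phrase_notation(text: str, low: int, high: int) -> str:
    """Render the expansion in the informal notation of this message."""
    return f'"{text} ({alternation(low, high)})"'


def value_pattern(text: str, low: int, high: int) -> "re.Pattern[str]":
    """Python regex matching 'text (N)' where N lies in [low, high]."""
    return re.compile(
        re.escape(text) + r" \((?:" + alternation(low, high) + r")\)"
    )
```

For example, phrase_notation("lucene", 10, 14) gives
"lucene (10|11|12|13|14)". The number of alternatives grows with the width of
the range, so this kind of expansion is presumably only practical for small
ranges.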

There might be others I forgot, but it is certainly doable. Still, as Jack
points out, you may want to stop for a moment and reflect on whether it is
necessary.

HTH,

  roman


On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> Sorry, but you are basically misusing Solr (and multivalued fields),
> trying to take a "shortcut" to avoid a proper data model.
>
> To properly use Solr, you need to put each of these multivalued field
> values in a separate Solr document, with a "text" field and a "value"
> field. Then, you can query:
>
>    text:"some text" AND value:[min-value TO max-value]
>
> Exactly how you should restructure your data model is dependent on all of
> your other requirements.
>
> You may be able to simply flatten your data.
>
> You may be able to use a simple join operation.
>
> Or, maybe you need to do a multi-step query operation if your data is
> sufficiently complex.
>
> If you want to keep your multivalued field in its current form for display
> purposes or keyword search, or exact match search, fine, but your stated
> goal is inconsistent with the semantics of Solr and Lucene.
>
> To be crystal clear, there is no such thing as "a range query on a
> substring" in Solr or Lucene.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Marcin Rzewucki
> Sent: Tuesday, July 16, 2013 5:13 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Range query on a substring.
>
>
> By multivalued I meant an array of values. For example:
> <arr name="myfield">
>  <str>text1 (X)</str>
>  <str>text2 (Y)</str>
> </arr>
>
> I'd like to avoid splitting it as you propose. I have a collection of 2.3mn
> records, and they are pretty large (a few hundred fields or more per record).
> Duplicating them would impact performance.
>
> Regards.
>
>
>
> On 16 July 2013 10:26, Oleg Burlaca <oburl...@gmail.com> wrote:
>
>  Ah, you mean something like this:
>> record:
>> Id=10, text =  "this is a text N1 (X), another text N2 (Y), text N3 (Z)"
>> Id=11, text =  "this is a text N1 (W), another text N2 (Q), third text
>> (M)"
>>
>> and you need to search for: "text N1" and X < B ?
>> How big is the core? the first thing that comes to my mind, again, at
>> indexing level,
>> split the text into pieces and index it in solr like this:
>>
>> record_id | text    | value
>> 10        | text N1 | X
>> 10        | text N2 | Y
>> 10        | text N3 | Z
>>
>> does it help?
>>
>>
>>
>> On Tue, Jul 16, 2013 at 10:51 AM, Marcin Rzewucki <mrzewu...@gmail.com> wrote:
>>
>> > Hi Oleg,
>> > It's a multivalued field and it won't be easier to query when I split this
>> > field into text and numbers. I may get wrong results.
>> >
>> > Regards.
>> >
>> >
>> > On 16 July 2013 09:35, Oleg Burlaca <oburl...@gmail.com> wrote:
>> >
>> > > IMHO the number(s) should be extracted and stored in separate columns in
>> > > SOLR at indexing time.
>> > >
>> > > --
>> > > Oleg
>> > >
>> > >
>> > > On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki <mrzewu...@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I have a problem (wonder if it is possible to solve it at all) with the
>> > > > following query. There are documents with a field which contains a text
>> > > > and a number in brackets, e.g.
>> > > >
>> > > > myfield: this is a text (number)
>> > > >
>> > > > There might be some other documents with the same text but a different
>> > > > number in brackets.
>> > > > I'd like to find documents with the given text, say "this is a text", and
>> > > > "number" between A and B. Is it possible in Solr? Any ideas?
>> > > >
>> > > > Kind regards.
>> > > >
>> > >
>> >
>>
>>
>
