Re: Excluding characters from a wildcard query

Uwe Klosa Wed, 01 Jul 2009 04:16:31 -0700

To get the desired effect I described, you have to do the split before you
send the document to Solr. I'm not aware of an analyzer that can split one
field value into several field values. Analyzers and tokenizers create
tokens from a field value in many different ways, but the result is still
a single field value.


As I see it you have to do some preprocessing yourself.
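
A minimal sketch of that preprocessing step, assuming Python on the
indexing side and a multivalued field called "vector" (the field names and
the helper names are made up for illustration, not taken from your setup).
The fnmatch check at the end only illustrates locally what "apply the
wildcard to each value in turn" means; it is not how Solr evaluates the
query.

```python
import fnmatch
import json


def split_vectors(raw):
    """Split a comma-separated string of vectors into a list of values."""
    # Drop empty pieces so a trailing comma does not create an empty value.
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_document(doc_id, raw):
    """Build the document to index, one field value per vector."""
    # "id" and "vector" are assumed field names; "vector" would have to be
    # declared as multivalued in the schema.
    return {"id": doc_id, "vector": split_vectors(raw)}


def matches_any(values, pattern):
    """Local illustration only: test a wildcard pattern against each value."""
    return any(fnmatch.fnmatchcase(value, pattern) for value in values)


doc = build_document("doc1", "A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3")
print(json.dumps(doc))
print(matches_any(doc["vector"], "A2_*"))
```

Each vector then ends up as its own value, so a pattern like A2_* is
checked against A1_B1_C1_D1, A2_B2_C2_D2 and A3_B3_C3_D3 separately rather
than against the whole comma-separated string.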

Uwe

2009/7/1 Ben <b...@autonomic.net>

> Is there a way in the Schema to specify that the comma should be used to
> split the values up? e.g. Can I specify my "vector" field as multivalue and
> also specify some sort of tokeniser to automatically split on commas?
>
> Ben
>
>
>
> Uwe Klosa wrote:
>
>> You should split the strings at the comma yourself and store the values
>> in a multivalued field. Then wildcard searches like A1_* are not a
>> problem. I don't know much about facets, but if they work on
>> multivalued fields, that should be no problem at all.
>>
>> Uwe
>>
>> 2009/7/1 Ben <b...@autonomic.net>
>>
>>
>>
>>> Yes, I had done that... however, I'm beginning to see now that what I
>>> am doing is called a "wildcard query", which goes via Lucene's query
>>> parser. Lucene's query parser does not support the regexp idea of
>>> character exclusion, i.e. I'm not trying to match "[", I'm trying to
>>> express "match as many characters as possible which are not
>>> underscores" with [^_]*
>>>
>>> Perhaps I'm going about my whole problem in an ineffective way, but I'm
>>> not sure how I can sensibly describe what I'm doing without it becoming
>>> a long document.
>>>
>>> The only other approach I can think of is to change what I'm indexing but
>>> I'm not sure how to achieve that.
>>> I've tried explaining it once, and obviously failed, so I'll try again.
>>>
>>> I'm given a string containing many vectors (where each dimension is
>>> separated by an underscore, and each vector is separated by a comma) e.g.
>>>
>>> A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3
>>>
>>> I want my facet query to tell me if, within one of the vectors within
>>> that string, there is a match for the dimensions I'm interested in. Of
>>> the four dimensions in this example, I may choose to fix an arbitrary
>>> number of them with values, and the rest with wildcards, e.g. I might
>>> look for a facet containing Ox_*_*_* so one of the vectors in the
>>> string must have its first dimension matching "Ox" and I don't care
>>> about the rest.
>>>
>>> ***Is there a way to break down this string on the commas so that I can
>>> apply a normal wildcard query and Solr applies it to each vector
>>> individually?***
>>> That would solve all my problems:
>>> e.g.
>>> The string is internally represented in lucene/solr as
>>> A1_B1_C1_D1
>>> A2_B2_C2_D2
>>> A3_B3_C3_D3
>>>
>>> where it tries to match the wildcard query on each in turn?
>>>
>>> Thanks for your help, I'm deeply confused about this at the moment...
>>>
>>> Ben
>>>
>>>
>>>
>>
>>
>>
>
>
