Is there a way in the schema to specify that the values should be split on commas? E.g. can I declare my "vector" field as multivalued and also specify some sort of tokeniser that automatically splits on commas?

Ben


Uwe Klosa wrote:
Could you split the strings at the commas yourself and store the values in a
multivalued field? Then wildcard searches like A1_* are not a problem. I don't
know much about facets, but if they work on multivalued fields, that
should be no problem at all.
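
A minimal sketch of that client-side split, in Python. The field names "id" and "vector" are only assumptions ("vector" is taken from Ben's question), and the field is assumed to be declared multiValued="true" in the schema. How the document is then posted to Solr is left out on purpose.

```python
def to_multivalued_doc(doc_id, raw, field="vector"):
    """Split a comma-separated string into one value per vector.

    The returned dict maps the (assumed) multivalued field name to a list,
    one list entry per vector.
    """
    values = [v.strip() for v in raw.split(",") if v.strip()]
    return {"id": doc_id, field: values}


doc = to_multivalued_doc("doc-1", "A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3")
print(doc["vector"])
```

Each list entry would then be indexed as a separate value of the field, so a wildcard such as A1_* is tested against one vector at a time.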

Uwe

2009/7/1 Ben <b...@autonomic.net>

Yes, I had done that... however, I'm beginning to see now that what I am
doing is called a "wildcard query", which goes via Lucene's query parser.
Lucene's query parser does not support the regexp idea of character
exclusion, i.e. I'm not trying to match "["; I'm trying to express "match
as many characters as possible which are not underscores" with [^_]*

Perhaps I'm going about my whole problem in an ineffective way, but I'm not
sure how I can sensibly describe what I'm doing without it becoming a long
document.

The only other approach I can think of is to change what I'm indexing but
I'm not sure how to achieve that.
I've tried explaining it once, and obviously failed, so I'll try again.

I'm given a string containing many vectors (where each dimension is
separated by an underscore, and each vector is separated by a comma) e.g.

A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3

I want my facet query to tell me whether any one of the vectors in that
string matches the dimensions I'm interested in. Of the four dimensions in
this example, I may choose to fix an arbitrary number of them with values
and leave the rest as wildcards, e.g. I might look for a facet matching
Ox_*_*_* so one of the vectors in the string must have its first
dimension matching "Ox" and I don't care about the rest.
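
To make the intent concrete outside of Solr, here is a rough Python sketch of what the [^_]* idea means for a pattern like Ox_*_*_*. The helper name wildcard_to_regex is made up for illustration, and this is plain Python regex matching, not anything the Lucene query parser does.

```python
import re


def wildcard_to_regex(pattern):
    """Compile an underscore-delimited wildcard pattern such as 'Ox_*_*_*'.

    Every '*' is replaced by '[^_]*', so a wildcard can never run across
    a dimension boundary. Everything else is matched literally.
    """
    literal_parts = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile("[^_]*".join(literal_parts))


rx = wildcard_to_regex("Ox_*_*_*")
print(bool(rx.fullmatch("Ox_B1_C1_D1")))     # first dimension is "Ox"
print(bool(rx.fullmatch("A1_B1_C1_D1")))     # first dimension differs
print(bool(rx.fullmatch("Ox_B1_C1_D1_E1")))  # five dimensions, not four
```

The first call matches and the other two do not, because each wildcard is confined to a single dimension.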

***Is there a way to break down this string on the commas so that I can
apply a normal wildcard query and Solr applies it to each vector individually?***
That would solve all my problems,
e.g.
the string is internally represented in Lucene/Solr as
A1_B1_C1_D1
A2_B2_C2_D2
A3_B3_C3_D3

where it tries to match the wildcard query on each in turn?
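
The behaviour I'm after, sketched in Python rather than Solr (the function names are invented for illustration): split on the commas, then test the pattern against each vector in turn, dimension by dimension, where "*" stands for "any value in this dimension".

```python
def vector_matches(vector, pattern):
    """Compare one vector with a pattern, dimension by dimension."""
    dims = vector.split("_")
    wanted = pattern.split("_")
    if len(dims) != len(wanted):
        return False
    return all(w == "*" or w == d for d, w in zip(dims, wanted))


def any_vector_matches(raw, pattern):
    """True if at least one comma-separated vector in raw matches pattern."""
    vectors = [v.strip() for v in raw.split(",") if v.strip()]
    return any(vector_matches(v, pattern) for v in vectors)


raw = "A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3"
print(any_vector_matches(raw, "A2_*_*_*"))   # the second vector matches
print(any_vector_matches(raw, "A1_B2_*_*"))  # A1 and B2 are in different vectors
```

The second query is the case that matters: A1 and B2 both occur somewhere in the string, but never within the same vector, so it should not match.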

Thanks for your help, I'm deeply confused about this at the moment...

Ben


