On 01.02.2010, at 13:27, Lukas Kahwe Smith wrote:

> 
> On 29.01.2010, at 15:40, Lukas Kahwe Smith wrote:
> 
>> I am still a bit unsure how to handle both the lowercased and the case 
>> preserved version:
>> 
>> So here are some examples:
>> UBS => ubs|UBS
>> Kreuzstrasse => kreuzstrasse|Kreuzstrasse
>> 
>> So when I type "Kreu" I would get a suggestion of "Kreuzstrasse" and with 
>> "kreu" I would get "kreuzstrasse".
>> Since I do not expect any words to start with a lowercase letter and still 
>> contain some upper case letter we should be fine with this approach.
>> 
>> That is, I doubt there would be terms like "fooBar", which would lead to 
>> suggesting both "foobar" and "fooBar".
>> 
>> How can I achieve this?
> 
> 
> I just noticed that I need the same thing for the word delimiter splitter: 
> some way to index both the split and the unsplit version so that I 
> can use it in a facet search.
> 
> Hans-Peter => Hans|Peter|Hans-Peter


Sorry for the monolog.
I did see 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg29786.html, which 
suggests a solution just for lowercase indexing with mixed-case suggestions: 
concatenate the lowercased version, a separator, and the original version.

I guess I could just feed in the same data multiple times and build the 
[indexterm]|[original] entries in user land somehow.

For example, "Hans-Peter" would be turned into 3 documents:
hans|Hans-Peter
peter|Hans-Peter
hans-peter|Hans-Peter
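
A minimal user-land sketch of that expansion, in Python. The function name 
expand_term, the "|" separator constant and the split-on-non-word-characters 
rule are my own assumptions here; they only approximate what a word delimiter 
filter would do and are not part of Solr.

```python
import re

SEPARATOR = "|"  # assumed separator; must never occur inside a term


def expand_term(original):
    """Return one '[indexterm]|[original]' entry per sub-word,
    plus one entry for the whole lowercased term."""
    # Split on runs of non-word characters or underscores
    # (a rough stand-in for a word delimiter filter).
    parts = [p for p in re.split(r"[\W_]+", original) if p]
    index_terms = [p.lower() for p in parts]
    whole = original.lower()
    # Avoid a duplicate entry when the term has no delimiter at all.
    if whole not in index_terms:
        index_terms.append(whole)
    return [index_term + SEPARATOR + original for index_term in index_terms]


for entry in expand_term("Hans-Peter"):
    print(entry)
```

Each returned string would then be fed in as its own document; a term without 
a delimiter, such as "UBS", yields a single entry.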

This solution would be quite cool indeed, since I could suggest "Hans-Peter" if 
someone searches for "Peter".
Since I will only use this for a prefix search, I could set the query 
analyzer to lowercase the search term, which should then find the results. 
The frontend display logic would need some magic to split the original term 
off each suggestion.
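
The lookup and display side could be sketched roughly like this, again in 
Python. The list of entries and the suggest function are hypothetical 
in-memory stand-ins for the index and the prefix query, not Solr calls; the 
point is only the lowercasing of the input and the splitting at the 
separator.

```python
SEPARATOR = "|"  # assumed separator between index term and original term


def suggest(user_input, indexed_entries):
    """Return the original terms whose lowercased index term
    starts with the lowercased user input."""
    prefix = user_input.lower()  # what the query analyzer would do
    suggestions = []
    for entry in indexed_entries:
        # Split the stored value into its index term and original term.
        index_term, _, original = entry.partition(SEPARATOR)
        # Several entries can point at the same original; keep it once.
        if index_term.startswith(prefix) and original not in suggestions:
            suggestions.append(original)
    return suggestions


entries = [
    "hans|Hans-Peter",
    "peter|Hans-Peter",
    "hans-peter|Hans-Peter",
    "ubs|UBS",
]
print(suggest("Peter", entries))  # ['Hans-Peter']
```

Because the match is done only against the part before the separator, a 
search for "Peter" finds "Hans-Peter" through the "peter" entry.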

I am not aware of any magic inside the schema.xml that could do this work for 
me though. I am using the DataImportHandler to load the documents. I guess I 
could simply run the query multiple times, but that would screw up the indexing 
of the non-autosuggest index. Then again, maybe I want to separate the two 
completely anyway.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org


