: We are using Solr as a user index, and users have email addresses.
: 
: Our old search behavior used a SQL substring match for any search
: terms entered, and so users are used to being able to search for e.g.
: "chr" and finding my email address ("ch...@christopherschultz.net").
: 
: By default, Solr doesn't perform substring matches, and it might be
: difficult to re-train users to use *chr* to find email addresses by
: substring.

In the past, were you really doing arbitrary substring matching, or just 
prefix matching? i.e., would a search for "sto" match 
"ch...@christopherschultz.net"?

Personally, if you know you have an email field, I would suggest using a 
custom tokenizer that splits on "@" and "." (and maybe other punctuation 
characters like "-"), and then feeding your raw user input to the 
prefix parser (instead of requiring your users to add the "*")...

 q={!prefix f=email v=$user_input}&user_input=chr

...which would match ch...@gmail.com, f...@chris.com, f...@bar.chr etc. 
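To make the idea concrete, here is a minimal Python sketch (plain Python, not Solr code) of what that tokenizer plus prefix query does: split each address on "@", ".", and "-", lowercase the tokens, and match a document if any token starts with the user's input. The addresses and function names are hypothetical. The sketch lowercases the input client-side, on the assumption that the prefix parser won't run the input through the field's analyzer for you.

```python
import re

def tokenize_email(address):
    """Hypothetical helper: split on "@", "." and "-" and lowercase,
    roughly what the suggested custom tokenizer would index."""
    return [t for t in re.split(r"[@.\-]", address.lower()) if t]

def prefix_match(address, user_input):
    """Emulates {!prefix f=email v=$user_input}: match if any indexed
    token starts with the (client-side lowercased) input."""
    prefix = user_input.lower()
    return any(tok.startswith(prefix) for tok in tokenize_email(address))

# Hypothetical addresses, for illustration only.
emails = [
    "chris@example.com",
    "foo@chris.com",
    "foo@bar.chr",
    "someone@christopherschultz.net",
    "stone@example.com",
]

matches = [e for e in emails if prefix_match(e, "chr")]
print(matches)
```

Note that with this approach "sto" matches stone@example.com but not the christopherschultz.net address, since "sto" is only a substring of "christopherschultz", not a prefix of any token.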

(This wouldn't help you, though, if you *really* want arbitrary substring 
matching; as Erick suggested, n-grams are pretty much your best bet for 
something like that.)
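For comparison, a minimal Python sketch of the n-gram idea: at index time every substring of the lowercased address within a range of lengths is stored (similar in spirit to Solr's NGramFilterFactory with its minGramSize/maxGramSize settings), so a query matches if it is itself one of those grams. The gram sizes 3 to 5 and the function names are hypothetical choices for illustration.

```python
# Hypothetical gram sizes; in Solr these would be analyzer settings.
MIN_GRAM, MAX_GRAM = 3, 5

def ngrams(text, min_n=MIN_GRAM, max_n=MAX_GRAM):
    """Return the set of all substrings of text with length min_n..max_n."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    }

def substring_match(address, user_input):
    """Match if the lowercased input is one of the address's indexed grams.

    Inputs shorter than MIN_GRAM or longer than MAX_GRAM never match in
    this sketch; a real setup would have to handle those cases separately.
    """
    return user_input.lower() in ngrams(address)

print(substring_match("someone@christopherschultz.net", "sto"))  # True
print(substring_match("someone@christopherschultz.net", "xyz"))  # False
```

The trade-off is index size: each address contributes many grams instead of a handful of tokens.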

Bear in mind, you can combine that "forced prefix" query against 
the (tokenized) email field with other queries that 
could parse your input in other ways...

user_input=...
q=({!prefix f=email v=$user_input} 
   OR {!dismax qf="first_name last_name" ..etc.. v=$user_input})

So if your user input is "chris", you'll get term matches on the 
first_name or last_name fields, as well as prefix matches on the 
email field.
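From client code, one way to send that combined query is to pass the raw input as its own request parameter and let both clauses dereference it via $user_input, so the user's text never has to be escaped into the q string itself. A minimal Python sketch using only the standard library; the host, port, core name ("users"), and function name are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and core name.
SOLR_SELECT = "http://localhost:8983/solr/users/select"

def build_user_search_url(user_input):
    """Build a select URL where q references the raw input via $user_input."""
    params = {
        # Both clauses read the same parameter instead of embedding the text.
        "q": '({!prefix f=email v=$user_input} '
             'OR {!dismax qf="first_name last_name" v=$user_input})',
        "user_input": user_input,
    }
    return SOLR_SELECT + "?" + urlencode(params)

print(build_user_search_url("chris"))
```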



-Hoss
http://www.lucidworks.com/
