RE: wildcard and proximity searches

Frederico Azeiteiro Wed, 04 Aug 2010 01:40:13 -0700

Thanks for your idea.

At this point I'm logging each query time. My idea is to divide my
queries into "normal queries" and "heavy queries". I have some heavy
queries that take 1 or 2 minutes to return results, but they look like,
for instance, (*word1* AND *word2* AND word3*). I guess these will
always be slower (they could be a little faster with
"ReversedWildcardFilterFactory"), but they will never be ready in a few
seconds. For now, I just increased the timeout for those :) (using
solrnet).
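
A rough sketch of that normal/heavy split, in Python for illustration only
(the actual client here is solrnet). The helper names and the timeout
values are hypothetical; the only rule applied is "a term that starts with
a wildcard makes the query heavy".

```python
import re

def is_heavy(query: str) -> bool:
    """Return True if any term in the query starts with a wildcard.

    Assumption: a "term" is a run of word characters and wildcards.
    A bare "*" (as in "*:*") is not counted as a leading wildcard.
    """
    terms = re.findall(r"[\w*?]+", query)
    return any(len(t) > 1 and t[0] in "*?" for t in terms)

def timeout_seconds(query: str, normal: int = 10, heavy: int = 180) -> int:
    """Pick a client-side timeout; both defaults are illustrative guesses."""
    return heavy if is_heavy(query) else normal

if __name__ == "__main__":
    for q in ["(*word1* AND *word2* AND word3*)", '"word1* word2* word3"']:
        print(q, "->", timeout_seconds(q))
```

Logged query times can then be grouped by the same flag to check whether
the split matches what is actually slow.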

My priority at the moment is phrase queries like "word1* word2*
word3". Once those are working, I'll try to optimize the "heavy queries".

Frederico

-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: quarta-feira, 4 de Agosto de 2010 01:41
To: solr-user@lucene.apache.org
Subject: Re: wildcard and proximity searches

Frederico Azeiteiro wrote:
>
>>> But it is unusual to use both leading and trailing * operator. Why are
>>> you doing this?
>
> Yes I know, but I have a few queries that need this. I'll try the
> "ReversedWildcardFilterFactory". 
>
>
>   

ReversedWildcardFilterFactory will help with a leading wildcard, but will
not help a query with BOTH leading and trailing wildcards. It'll still be
slow. Solr/Lucene isn't good at that; in fact, I didn't even know Solr
would do it at all.

If you really needed to do that, the way to play to Solr/Lucene's way of
doing things would be to have a field where you actually index each
_character_ as a separate token. Then a leading-and-trailing wildcard
search is basically reduced to a "phrase search", but where the words
are actually characters. But then you're going to get an index where
pretty much every token belongs to every document, which Solr isn't that
great at either, but then you can apply "commongram" stuff on top to
help that out a lot too. Not quite sure what the end result will be;
I've never tried it. I'd only use that weird special "char as token"
field for queries that actually required leading and trailing wildcards.
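
To make the idea concrete, here is a toy in-memory sketch in Python. It is
not how Lucene stores or searches anything; the function names and sample
documents are made up. It only shows that "contains this substring" turns
into "these character tokens occur at consecutive positions".

```python
from collections import defaultdict

def index_chars(docs):
    """Build a positional index: char -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text.lower()):
            index[ch][doc_id].append(pos)
    return index

def char_phrase_search(index, pattern):
    """Return ids of docs where the characters of `pattern` (with the
    surrounding '*' stripped) appear at consecutive positions."""
    chars = pattern.strip("*").lower()
    if not chars:
        return set()
    hits = set()
    # Every position of the first character is a candidate phrase start.
    for doc_id, starts in index.get(chars[0], {}).items():
        for start in starts:
            if all(start + i in index.get(ch, {}).get(doc_id, ())
                   for i, ch in enumerate(chars)):
                hits.add(doc_id)
                break
    return hits

if __name__ == "__main__":
    docs = {1: "solr wildcard", 2: "lucene index", 3: "card game"}
    idx = index_chars(docs)
    print(char_phrase_search(idx, "*card*"))
```

Note how nearly every character appears in every document, which is the
"every token belongs to every document" problem mentioned above.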

Figuring out how to set up your analyzers, and what (if anything) you're
going to have to do client-app-side to transform the user's query into
something that'll end up searching like a "phrase search where each
'word' is a character", is left as an exercise for the reader. :)
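
One possible shape for the client-side part, as an untested sketch in
Python. The field name "text_chars" is hypothetical, and it assumes that
field's query analyzer splits on whitespace so that each character
becomes one token of the phrase.

```python
def to_char_phrase(term: str, field: str = "text_chars") -> str:
    """Rewrite '*word*' as a phrase query over single-character tokens.

    Assumptions: `field` is a hypothetical char-per-token field, and
    wildcards appear only at the ends of the term.
    """
    chars = term.strip("*")
    if not chars or "*" in chars or "?" in chars:
        raise ValueError("expected a term of the form *text*")
    # Escape characters that would otherwise end or break the phrase.
    escaped = ["\\" + c if c in '"\\' else c for c in chars]
    phrase = " ".join(escaped)
    return f'{field}:"{phrase}"'

if __name__ == "__main__":
    print(to_char_phrase("*word1*"))
```

Whether whitespace inside the original term needs its own token depends on
how the index-side analyzer treats it, so that is left open here.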

Jonathan
