Re: Prevention of heavy wildcard queries

Isaac Hebsh Mon, 27 May 2013 14:02:28 -0700

Thanks Roman.
Based on some of your suggestions, would the steps below do the job?

* Create (and register) a new SearchComponent.
* In its prepare method, do the following for q and for each fq (so this
SearchComponent should run AFTER QueryComponent, in order to see all of the
FQs):
* Create org.apache.lucene.queryparser.flexible.standard.StandardQueryParser
with a custom QueryNodeProcessorPipeline that has my NodeProcessor at the
top of its list.
* Set my analyzer on that StandardQueryParser.
* My NodeProcessor will be called for each term in the query, so it can
throw an exception if a (basic) query node has a wildcard at both the start
and the end of the term.
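
To make the last step concrete, here is a minimal sketch of the per-term check such a NodeProcessor might run. It is plain Java and deliberately does not touch the Lucene API: WildcardGuard and both of its methods are hypothetical names, and how the term text is obtained from the query node is not shown. It also assumes that both '*' and '?' count as wildcards and that a backslash escapes the following character, as in the standard syntax.

```java
// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class WildcardGuard {

    private WildcardGuard() {}

    // Assumption: both '*' and '?' are treated as wildcard characters.
    private static boolean isWildcard(char c) {
        return c == '*' || c == '?';
    }

    /**
     * Returns true when the raw term text starts with a wildcard and ends
     * with an unescaped wildcard, e.g. "*foo*".
     */
    static boolean hasLeadingAndTrailingWildcard(String term) {
        if (term == null || term.length() < 2) {
            return false; // a single character cannot be double-sided
        }
        int last = term.length() - 1;
        if (!isWildcard(term.charAt(0)) || !isWildcard(term.charAt(last))) {
            return false;
        }
        // Count the backslashes directly before the last character.
        // An odd count means the trailing wildcard is escaped (a literal).
        int backslashes = 0;
        for (int i = last - 1; i >= 0 && term.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    /** Throws if the term has a wildcard at both ends; otherwise does nothing. */
    static void rejectIfTooHeavy(String term) {
        if (hasLeadingAndTrailingWildcard(term)) {
            throw new IllegalArgumentException(
                "Wildcard at both start and end of term is not allowed: " + term);
        }
    }
}
```

Calling WildcardGuard.rejectIfTooHeavy("*foo*") throws, while "foo*" and "*foo" pass through. A real processor would probably throw the parser's own exception type instead of IllegalArgumentException, so that the error reaches the client as a bad-request response.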

Is there a way to avoid reimplementing the whole StandardQueryParser
class?
Will this work for both LuceneQParser and EdismaxQParser queries?

Any other solutions or work-arounds? How do other production Solr
deployments handle this issue?

On Mon, May 27, 2013 at 10:15 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

> You are right that starting to parse the query before the query component
> can soon get very ugly and complicated. You should take advantage of the
> flex parser; it is already in Lucene contrib. If you are interested in a
> better version, look at
> https://issues.apache.org/jira/browse/LUCENE-5014
>
> The way you can solve this is:
>
> 1. use the standard syntax grammar (which allows *foo*)
> 2. add (or modify) WildcardQueryNodeProcessor to allow or disallow that
> case, raise an error, etc.
>
> this way, you are changing semantics - but don't need to touch the syntax
> definition; of course, you may also change the grammar and allow only one
> instance of wildcard (or some combination) but for that you should probably
> use LUCENE-5014
>
> roman
>
> On Mon, May 27, 2013 at 2:18 PM, Isaac Hebsh <isaac.he...@gmail.com>
> wrote:
>
> > Hi.
> >
> > Searching for terms with a leading wildcard is solved by
> > ReversedWildcardFilterFactory. But what about terms with a wildcard at
> > both the start AND the end?
> >
> > Such queries are heavy, and I want to prevent my users from running them.
> >
> > I'm looking for a way to cause these queries to fail.
> > I guess there is no built-in support for my need, so it is OK to write a
> > new solution.
> >
> > My current plan is to create a search component (which will run before
> > QueryComponent). It should analyze the query string and drop the query
> > if "too heavy" wildcards are found.
> >
> > Another option is to create a query parser, which wraps the current
> > (specified or default) qparser, and does the same work as above.
> >
> > Both options require analyzing the query text, which might be ugly
> > work (just think about nested queries [using _query_], or even many
> > more basic scenarios like quoted terms, etc.)
> >
> > Am I missing a simple and clean way to do this?
> > What would you do?
> >
> > P.S. If no simple solution exists, a timeAllowed limit is the best
> > work-around I could think of. Any other suggestions?
> >
>
