On 6/27/2015 4:27 AM, octopus wrote:
> Hi, I'm looking at Solr's features for wildcard search used for a large
> amount of text. I read on the net that solr.EdgeNGramFilterFactory is used
> to generate tokens for wildcard searching. 
> 
> For Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria",
> "nigerian"
> 
> However, I have a large amount of text out there which requires wildcard
> search and it's not viable to use EdgeNGramFilterFactory as the amount of
> processing will be too huge. Do you have any suggestions/advice please?

Both edge n-grams and wildcard queries are ways to do this.  Each
approach has advantages and disadvantages.

To do a wildcard search, Solr (Lucene really) must look up all the
matching terms in the index and substitute them into the query so that
it becomes a large number of simple string matches.  If you have a large
number of terms in your index, that can be slow.  The expensive work
(expanding the terms) is done for every single query.
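
As a rough illustration of that query-time expansion, here is a toy
Python sketch.  It is not Lucene's actual implementation: the sorted
list stands in for the term dictionary, and expand_prefix is a
hypothetical helper name.  It only shows that the set of matching terms
has to be collected again on every query.

```python
import bisect

def expand_prefix(sorted_terms, prefix):
    """Query time: collect every indexed term that starts with prefix."""
    # Jump to the first term that could match, then scan forward.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so nothing later can match
        matches.append(term)
    return matches

# Toy term dictionary (assumed data, for illustration only).
terms = sorted(["niger", "nigeria", "nigerian", "night", "nile", "norway"])

# A query such as nige* is effectively rewritten into one lookup per term.
expanded = expand_prefix(terms, "nige")
```

The more distinct terms share the prefix, the longer the expanded list
gets, and that cost is paid again for each query.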

The edge n-gram filter does similar work, but it does it at *index*
time rather than query time.  At query time, you are doing a simple
string match with one term, because the very expensive work was already
done at index time.  The tradeoff is that the index contains many more
terms.
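
The index-time side can be sketched the same way.  This is a toy model
in Python, not the real filter: edge_ngrams and build_index are
hypothetical helpers, the whitespace tokenizer is a simplification, and
min_gram/max_gram only loosely mirror the filter's minGramSize and
maxGramSize attributes (the values used here are arbitrary).

```python
from collections import defaultdict

def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the leading substrings of token, shortest first."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

def build_index(docs, min_gram=2, max_gram=15):
    """Index time: map every edge n-gram to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for gram in edge_ngrams(token, min_gram, max_gram):
                index[gram].add(doc_id)
    return index

# Toy documents (assumed data, for illustration only).
docs = {1: "Nigerian exports", 2: "Niger river", 3: "night train"}
index = build_index(docs)

# Query time: the user's prefix is looked up as a single term.
hits = index.get("nige", set())
```

Here the lookup for "nige" is one dictionary access, but the index holds
an entry for every prefix of every token, which is where the extra index
size and indexing time come from.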

It's difficult to know which approach will be more efficient on *your*
index without experimentation, but there is a general rule when it comes
to Solr performance: As much as possible, do the expensive work at index
time.

Thanks,
Shawn
