On 6/27/2015 4:27 AM, octopus wrote:
> Hi, I'm looking at Solr's features for wildcard search used for a large
> amount of text. I read on the net that solr.EdgeNGramFilterFactory is used
> to generate tokens for wildcard searching.
>
> For Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria",
> "nigerian"
>
> However, I have a large amount of text out there which requires wildcard
> search and it's not viable to use EdgeNGramFilterFactory as the amount of
> processing will be too huge. Do you have any suggestions/advice please?

Both edgengrams and wildcards are ways to do this, and each has advantages and disadvantages.

To do a wildcard search, Solr (Lucene really) must look up all the matching terms in the index and substitute them into the query, so that it becomes a large number of simple string matches. If you have a large number of terms in your index, that can be slow. The expensive work (expanding the terms) is done for every single query.

The edgengram filter does similar work, but it does it at *index* time rather than query time. Because the expensive work was already done at index time, at query time you are doing a simple string match with one term, although the index contains many more terms.

It's difficult to know which approach will be more efficient on *your* index without experimentation, but there is a general rule when it comes to Solr performance: as much as possible, do the expensive work at index time.

Thanks,
Shawn
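
The trade-off described above can be illustrated with a toy sketch in Python. This is not how Lucene is implemented; it is a simplified in-memory model, and every name in it (`edge_ngrams`, `build_plain_index`, `build_ngram_index`, `wildcard_prefix_search`, `ngram_prefix_search`, the sample `docs`) is made up for illustration. The `min_gram` and `max_gram` arguments are only meant to mirror the filter's minGramSize and maxGramSize settings; the values used here are arbitrary.

```python
from bisect import bisect_left


def edge_ngrams(term, min_gram=2, max_gram=15):
    """Return the lowercased prefixes of `term`, min_gram..max_gram chars long."""
    term = term.lower()
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]


def build_plain_index(docs):
    """Index whole words only: term -> set of doc ids."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def build_ngram_index(docs):
    """Index every edge n-gram of every word (the expensive work, done once)."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            for gram in edge_ngrams(word):
                index.setdefault(gram, set()).add(doc_id)
    return index


def wildcard_prefix_search(index, sorted_terms, prefix):
    """Query-time expansion: visit every term that starts with `prefix`
    and union the matching doc ids. This work is repeated on each query."""
    prefix = prefix.lower()
    hits = set()
    i = bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        hits |= index[sorted_terms[i]]
        i += 1
    return hits


def ngram_prefix_search(index, prefix):
    """Single exact-term lookup. Only finds prefixes whose length falls
    within the min_gram..max_gram range used at index time."""
    return set(index.get(prefix.lower(), set()))


# Hypothetical sample data.
docs = {1: "Nigerian oil exports", 2: "Niger river delta", 3: "Norwegian oil"}
plain = build_plain_index(docs)
plain_terms = sorted(plain)
ngram = build_ngram_index(docs)

print(edge_ngrams("Nigerian"))
print(wildcard_prefix_search(plain, plain_terms, "nig"))
print(ngram_prefix_search(ngram, "nig"))
print(len(plain), "plain terms vs", len(ngram), "n-gram terms")
```

Both searches return the same documents for `nig`. The difference is where the cost lands: the wildcard version walks the term list on every query, while the n-gram version pays with a larger index and slower indexing in exchange for a single lookup per query.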