Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

Norberto Meijome Tue, 25 Nov 2008 15:09:09 -0800

On Mon, 24 Nov 2008 13:31:39 -0500
"Burton-West, Tom" <[EMAIL PROTECTED]> wrote:


> The approach to this problem used by Nutch looks promising.  Has anyone
> ported the Nutch CommonGrams filter to Solr?
> 
> "Construct n-grams for frequently occurring terms and phrases while
> indexing. Optimize phrase queries to use the n-grams. Single terms are
> still indexed too, with n-grams overlaid."
> http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html

Tom,
I haven't used Nutch's implementation, but we used the current (1.3)
implementation of n-grams and shingles to address exactly the same issue
(a database of music albums and tracks).
We didn't notice any severe performance hit, but:
- the data set isn't huge (ca. 1 million docs);
- it is reindexed nightly via DIH from MS-SQL, so we can use a separate cache
  layer to lower the number of hits to Solr.
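
For anyone following along, the idea in the Nutch description you quoted can be
sketched roughly as below. This is only a toy illustration in Python, not
Nutch's or Solr's actual code: the function name common_grams, the "_" joiner
and the sample common-word list are all made up for the example, and I am
assuming a bigram is formed whenever either word of an adjacent pair is common.

```python
def common_grams(tokens, common_words):
    """Return (position, term) pairs for a token stream.

    Every single term is kept, and a bigram is overlaid at the position of
    its first word whenever either word of an adjacent pair is in
    common_words (an assumed rule, chosen for illustration).
    """
    out = []
    for i, tok in enumerate(tokens):
        out.append((i, tok))  # single terms are still indexed
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                # bigram shares the position of its first word ("overlaid")
                out.append((i, tok + "_" + nxt))
    return out


if __name__ == "__main__":
    common = {"the", "of", "a"}  # hypothetical common-word list
    for pos, term in common_grams("the wall of sound".split(), common):
        print(pos, term)
```

A phrase query such as "wall of sound" could then be matched on the rarer
terms wall_of and of_sound rather than on the very long postings list for "of",
which is where the speed-up is supposed to come from.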

B
_________________________
{Beto|Norberto|Numard} Meijome

"Truth has no special time of its own.  Its hour is now -- always."
   Albert Schweitzer

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
