On Mon, 24 Nov 2008 13:31:39 -0500
"Burton-West, Tom" <[EMAIL PROTECTED]> wrote:

> The approach to this problem used by Nutch looks promising.  Has anyone
> ported the Nutch CommonGrams filter to Solr?
> 
> "Construct n-grams for frequently occurring terms and phrases while
> indexing. Optimize phrase queries to use the n-grams. Single terms are
> still indexed too, with n-grams overlaid."
> http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html

Tom,
I haven't used Nutch's implementation, but I have used the current
implementation (1.3) of n-grams and shingles to address exactly the same issue
(a database of music albums and tracks).
We didn't notice any severe performance hit, but:
- the data set isn't huge (ca. 1 million docs);
- it is reindexed nightly via DIH from MS-SQL, so we can use a separate cache
layer to lower the number of hits to Solr.
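
For anyone who hasn't seen the idea from the quoted Nutch description, here is
a rough sketch in Python of what "single terms still indexed, with n-grams
overlaid" could look like. It is not the Nutch or Solr code. The function name,
the "_" separator, and the rule that a bigram is emitted whenever either
neighbour is a common word are my assumptions, for illustration only.

```python
def common_grams(tokens, common_words, sep="_"):
    """Return (term, position_increment) pairs for a token list.

    Every original token is kept with increment 1. When a token or its
    successor is in common_words (an assumed rule), a bigram is overlaid
    at the same position, i.e. with increment 0.
    """
    out = []
    for i, tok in enumerate(tokens):
        out.append((tok, 1))  # the single term is always indexed
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                # overlay the bigram on the position of its first word
                out.append((tok + sep + nxt, 0))
    return out


if __name__ == "__main__":
    for term, inc in common_grams(["the", "quick", "fox"], {"the"}):
        print(term, inc)
```

A phrase query such as "the quick" could then be rewritten to look up the
single term the_quick, which should match far fewer documents than "the" does
on its own.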

B
_________________________
{Beto|Norberto|Numard} Meijome

"Truth has no special time of its own.  Its hour is now -- always."
   Albert Schweitzer

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
