On Mon, 24 Nov 2008 13:31:39 -0500 "Burton-West, Tom" <[EMAIL PROTECTED]> wrote:
> The approach to this problem used by Nutch looks promising. Has anyone
> ported the Nutch CommonGrams filter to Solr?
>
> "Construct n-grams for frequently occuring terms and phrases while
> indexing. Optimize phrase queries to use the n-grams. Single terms are
> still indexed too, with n-grams overlaid."
> http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html

Tom,
I haven't used Nutch's implementation, but I have used the current implementation (1.3) of n-grams and shingles to address exactly the same issue (a database of music albums and tracks).

We didn't notice any severe performance hit, but:
- the data set isn't huge (ca. 1 MM docs);
- it is reindexed nightly via DIH from MS-SQL, so we can use a separate cache layer to lower the number of hits to Solr.

B
_________________________
{Beto|Norberto|Numard} Meijome

"Truth has no special time of its own. Its hour is now -- always."
   Albert Schweitzer

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
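
For readers unfamiliar with the idea quoted above, here is a rough Python sketch of what "single terms are still indexed too, with n-grams overlaid" can look like at index time. It is not the Nutch or Solr code. The word list, the `common_grams` name and the `_` separator are all made up for illustration, and real analyzers differ in the details.

```python
# Illustrative sketch only: not the Nutch CommonGrams or Solr shingle code.
# COMMON_WORDS is a hypothetical list of very frequent terms.
COMMON_WORDS = {"the", "of", "a", "to", "in"}


def common_grams(tokens, common=COMMON_WORDS):
    """Return (term, position) pairs for a token stream.

    Every single term is kept. In addition, whenever either word of an
    adjacent pair is a common word, a bigram is emitted at the same
    position as the pair's first word, i.e. overlaid on it.
    """
    out = []
    for i, tok in enumerate(tokens):
        out.append((tok, i))  # the single term is always indexed
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common or nxt in common:
                out.append((tok + "_" + nxt, i))  # overlaid bigram
    return out


if __name__ == "__main__":
    for term, pos in common_grams("the dark side of the moon".split()):
        print(pos, term)
```

A phrase query such as "side of the moon" could then be rewritten to match the rarer bigrams (`side_of`, `of_the`, `the_moon`) rather than walking the very long posting lists of "of" and "the" on their own, which is where the saving would come from.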