[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom Burton-West updated SOLR-908: --------------------------------- Attachment: SOLR-908.patch Cleaned up patch. ASF license now included in all files. Umich copyright statements removed Removed author and revision tags Cleaned up code and reformatted using Solr Eclipse codestyle > Port of Nutch CommonGrams filter to Solr > ----------------------------------------- > > Key: SOLR-908 > URL: https://issues.apache.org/jira/browse/SOLR-908 > Project: Solr > Issue Type: Wish > Components: Analysis > Reporter: Tom Burton-West > Priority: Minor > Attachments: CommonGramsPort.zip, SOLR-908.patch, SOLR-908.patch > > > Phrase queries containing common words are extremely slow. We are reluctant > to just use stop words due to various problems with false hits and some > things becoming impossible to search with stop words turned on. (For example > "to be or not to be", "the who", "man in the moon" vs "man on the moon" etc.) > > Several postings regarding slow phrase queries have suggested using the > approach used by Nutch. Perhaps someone with more Java/Solr experience might > take this on. > It should be possible to port the Nutch CommonGrams code to Solr and create > a suitable Solr FilterFactory so that it could be used in Solr by listing it > in the Solr schema.xml. > "Construct n-grams for frequently occuring terms and phrases while indexing. > Optimize phrase queries to use the n-grams. Single terms are still indexed > too, with n-grams overlaid." > http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.