[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757472#action_12757472 ]
Robert Muir commented on SOLR-908:
----------------------------------

Jason, I took a glance.

I think the reset() for CommonGramsQueryFilter should not set prev = null. The initial state is not null: in the ctor, prev = new Token(). With the current logic, this is what reset() must do also.

Also, FYI, CommonGramsFilter does not need a reset, since the StringBuffer isn't used to keep state.

I think the best way to ensure it's correct is to add tests that consume the stream and then reuse/reset() it.

> Port of Nutch CommonGrams filter to Solr
> -----------------------------------------
>
>                 Key: SOLR-908
>                 URL: https://issues.apache.org/jira/browse/SOLR-908
>             Project: Solr
>          Issue Type: Wish
>          Components: Analysis
>            Reporter: Tom Burton-West
>            Priority: Minor
>         Attachments: CommonGramsPort.zip, SOLR-908.patch, SOLR-908.patch,
>                      SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch,
>                      SOLR-908.patch, SOLR-908.patch
>
>
> Phrase queries containing common words are extremely slow. We are reluctant
> to just use stop words due to various problems with false hits and some
> things becoming impossible to search with stop words turned on. (For example
> "to be or not to be", "the who", "man in the moon" vs "man on the moon" etc.)
>
> Several postings regarding slow phrase queries have suggested using the
> approach used by Nutch. Perhaps someone with more Java/Solr experience might
> take this on.
> It should be possible to port the Nutch CommonGrams code to Solr and create
> a suitable Solr FilterFactory so that it could be used in Solr by listing it
> in the Solr schema.xml.
> "Construct n-grams for frequently occurring terms and phrases while indexing.
> Optimize phrase queries to use the n-grams. Single terms are still indexed
> too, with n-grams overlaid."
> http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html
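
To illustrate the reset() point raised in the comment, here is a minimal, self-contained sketch. It does not use the Lucene API and is not the code from the patch: ToyBigramFilter, its START sentinel, and the consume() helper are hypothetical stand-ins. It only assumes what the comment states, namely that the constructor starts prev in a non-null state and that reset() must restore that same state. The main method is the kind of consume-then-reuse test the comment suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Toy stand-in for a stateful token filter. NOT the real CommonGramsQueryFilter. */
class ToyBigramFilter {
    // Hypothetical sentinel meaning "no previous token seen yet" (non-null on purpose).
    private static final String START = "";

    private Iterator<String> input;
    private String prev;

    ToyBigramFilter(List<String> tokens) {
        this.input = tokens.iterator();
        this.prev = START; // initial state is non-null
    }

    /** Returns the next bigram "prev_cur", or null once the input is exhausted. */
    String next() {
        while (input.hasNext()) {
            String cur = input.next();
            String previous = prev;
            prev = cur;
            // Relies on prev never being null; reset() setting prev = null would break this.
            if (!previous.isEmpty()) {
                return previous + "_" + cur;
            }
        }
        return null;
    }

    /** Restores exactly the state the constructor establishes. */
    void reset(List<String> tokens) {
        this.input = tokens.iterator();
        this.prev = START;
    }

    /** Drains the filter into a list. */
    static List<String> consume(ToyBigramFilter filter) {
        List<String> out = new ArrayList<>();
        for (String gram = filter.next(); gram != null; gram = filter.next()) {
            out.add(gram);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "who", "rocks");
        ToyBigramFilter filter = new ToyBigramFilter(tokens);

        List<String> firstPass = consume(filter);
        filter.reset(tokens); // reuse the same instance
        List<String> secondPass = consume(filter);

        // A reused filter should produce the same output as a fresh one.
        if (!firstPass.equals(secondPass)) {
            throw new AssertionError("reset() did not restore the initial state");
        }
        System.out.println(firstPass);
    }
}
```

If reset() in this sketch assigned prev = null instead of the sentinel, the second pass would fail with a NullPointerException on previous.isEmpty(), which is the kind of regression a consume/reset/reuse test is meant to catch.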