Hi,

I'm trying to use AutoPhrasingTokenFilterFactory, which looks like a great solution to our phrase query issues, but it doesn't seem to work as described in the blog:
https://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

The token filter is working as expected at index time, where it preserves the phrases as single tokens based on the text file. Here's my field definition:

<fieldType name="text_autophrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory" phrases="autophrases.txt" includeTokens="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.KStemFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.KStemFilterFactory" />
  </analyzer>
</fieldType>

On analyzing the field, I can see that the phrase "seat cushions" (defined in autophrases.txt) is indexed as "seat", "seat cushions" and "cushion".

The problem is at query time. As per the blog, the request handler needs to use a custom query parser to achieve the result. Here's my entry in solrconfig:

<requestHandler name="/autophrase" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="title">Solritas</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <str name="defType">autophrasingParser</str>
  </lst>
</requestHandler>

<queryParser name="autophrasingParser" class="com.lucidworks.analysis.AutoPhrasingQParserPlugin" >
  <str name="phrases">autophrases.txt</str>
</queryParser>

But if I query "seat cushions" using this request handler, it seems to treat the query as two separate terms and returns all results matching "seat" and "cushion". I'm not sure what I'm missing here. I'm using Solr 4.10.

My other question is whether "com.lucidworks.analysis.AutoPhrasingQParserPlugin" supports the edismax features, since edismax is my default parser.

I'd appreciate any feedback.

Thanks,
Shamik