One other idea I tried, which didn't work, was to see if I could get proper parsing by passing the text through the stream.body argument:
http://localhost:8983/solr/mlt?stream.body=hello+world&mlt.fl=shingle_field&mlt.mintf=0&debugQuery=true

On Tue, Aug 11, 2009 at 9:09 AM, Mark Bennett <mbenn...@ideaeng.com> wrote:

> I've got an index building with the shingle filter and I can see the
> compound terms with Luke, etc.  So far so good.  One detail, I did tell it
> to not emit unigrams - I've got single words covered in a normal field.
>
> And a bit of poking around the other day explained why shingle queries
> weren't working with the dismax handler in 1.4, also fine, I believe I
> understand now.
>
> But switching to the standard query handler, I still don't get proper
> multi-word shingle handling in any query, either via the web interface or
> the various Java calls.  I'm guessing it has to do with the order tokens are
> parsed in, but if so I'm not sure what the workaround is.
>
> Some things I've tried:
>
> Standard Solr query:
> ...&q=shingle_field:hello+world&debugQuery=true
>
> Standard Solr query, with the default field set to the shingle field:
> ...&q=hello+world&debugQuery=true
>
> Standard Solr query, with the default field set to the shingle field and
> the query quoted as a phrase:
> ...&q="hello+world"&debugQuery=true
>
> I switched over to Java.  Regular queries worked pretty easily, I could
> print them out.  But attempts to conjure a shingle query always produce
> nothing.
>
>     // fieldName = shingle field
>     SolrQueryParser qp = new SolrQueryParser( schema, fieldName );
>     Query q = qp.parse( "hello world" );
>     System.out.println( "Query Object = " + q );
>
>     SolrQuery q = new SolrQuery();
>     q.addField( fieldName );  // Just setting a return field I think....
>     q.setQuery( "hello world" );
>     System.out.println( "Query Object = " + q );
>
>     // And I figured this one wouldn't work:
>     SolrQueryParser qp = new SolrPluginUtils.DisjunctionMaxQueryParser(
>         schema, fieldName );
>     Query q = qp.parse( "hello world" );
>     System.out.println( "Query Object = " + q );
>
> Looking at the constructors for
> org.apache.lucene.analysis.shingle.ShingleFilter, they all seem to want a
> token stream rather than a string.  But I think the default query entry
> points into Solr are what's getting me to the single token at a time problem.
>
> I did verify that it's finding my schema, and if I put a non-existent field
> name in there, it certainly notices.  I've tried with and without the
> PositionFilterFactory filter.  If I comment out the shingle stage everything
> works.
>
> <fieldType name="text_shingle" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="false"
>     />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="0"
>             generateNumberParts="0"
>             catenateWords="1"
>             catenateNumbers="1"
>             catenateAll="0"
>             splitOnCaseChange="0"
>             stemEnglishPossessive="0"
>             preserveOriginal="0"
>     />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>             outputUnigrams="false"/>
>     <filter class="solr.PositionFilterFactory" />
>   </analyzer>
> </fieldType>
>
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>
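
For illustration, here is a minimal plain-Java sketch of the "single token at a time" guess from the quoted message. It does not call Lucene or Solr at all: ShingleSketch, shingles, analyze and analyzePerWord are hypothetical names, and the per-word splitting is only an assumption about how the query parser hands text to the field's analyzer. With a maximum shingle size of 2 and unigrams turned off, a lone word can never form a shingle, which would be consistent with the empty query objects described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for a shingle filter; it does not use Lucene or Solr classes.
final class ShingleSketch {

    // Joins adjacent tokens into word n-grams of size 2..maxShingleSize.
    // When outputUnigrams is true the single tokens are emitted as well.
    static List<String> shingles(List<String> tokens, int maxShingleSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start));
            }
            for (int size = 2; size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    // Whitespace tokenization followed by shingling, applied to the whole input at once.
    static List<String> analyze(String text, int maxShingleSize, boolean outputUnigrams) {
        String trimmed = text.trim();
        List<String> tokens = trimmed.isEmpty()
                ? new ArrayList<>()
                : Arrays.asList(trimmed.split("\\s+"));
        return shingles(tokens, maxShingleSize, outputUnigrams);
    }

    // Assumed parser behaviour: split the query on whitespace first,
    // then analyze each word on its own.
    static List<String> analyzePerWord(String query, int maxShingleSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            out.addAll(analyze(word, maxShingleSize, outputUnigrams));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("whole text: " + analyze("hello world", 2, false));
        System.out.println("per word:   " + analyzePerWord("hello world", 2, false));
    }
}
```

In this sketch, analyzing the whole string yields the single shingle "hello world", while the per-word path yields nothing. If the real parser behaves the same way, the text would have to reach the analyzer as one unit for any shingles to appear in the query.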