I thought shingles were either a viral infection or roof material?
(Hey, it's crazy Friday early for me.)

Dennis Gearon
Signature Warning
----------------
EARTH has a Right To Life, otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Thu, 9/2/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:

> From: Jonathan Rochkind <rochk...@jhu.edu>
> Subject: Re: shingles work in analyzer but not real data
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Cc: "Vishal Patel" <vishal_pa...@silvertouch.com>,
>     "Michiel Willekens" <michiel.willek...@globalorange.nl>
> Date: Thursday, September 2, 2010, 2:47 PM
>
> I've run into this before too. Both the dismax and solr-lucene
> _query parsers_ will tokenize a query on whitespace _before_ they
> pass the query to any field analyzers.
>
> There are some reasons for this, lots of things wouldn't work if
> they didn't do this.
>
> But it makes your approach kind of hard. Try doing your search as a
> phrase search with double quotes, "apple pie", I bet it'll work
> then -- because both dismax and solr-lucene will respect the phrase
> quotes and NOT tokenize the stuff inside there before it gets to the
> field analyzers.
>
> So if non-tokenized fields like this are all that are included in
> your search, and if you can get your client application to just
> force phrase quoting of everything before sending to Solr, that
> might work. Otherwise.... I don't know of a good solution. If you
> figure one out, let me know.
>
> Jonathan
>
> Jeff Rose wrote:
> > Hi,
> >
> > We are using SOLR to match query strings with a keyword database,
> > where some of the keywords are actually more than one word. For
> > example a keyword might be "apple pie" and we only want it to
> > match for a query containing that word pair, but not one only
> > containing "apple".
> > Here is the relevant piece of the schema.xml, defining the index
> > and query pipelines:
> >
> >   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >     <analyzer type="index">
> >       <tokenizer class="solr.PatternTokenizerFactory" pattern=";"/>
> >       <filter class="solr.LowerCaseFilterFactory"/>
> >       <filter class="solr.TrimFilterFactory" />
> >     </analyzer>
> >     <analyzer type="query">
> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >       <filter class="solr.LowerCaseFilterFactory"/>
> >       <filter class="solr.TrimFilterFactory" />
> >       <filter class="solr.ShingleFilterFactory" />
> >     </analyzer>
> >   </fieldType>
> >
> > In the analysis tool this schema looks like it works correctly.
> > Our multi-word keywords are indexed as a single entry, and then
> > when a search phrase contains one of these multi-word keywords it
> > is shingled and matched. Unfortunately, when we do the same
> > queries on top of the actual index it responds with zero matches.
> > I can see in the index histogram that the terms are correctly
> > indexed from our mysql datasource containing the keywords, but
> > somehow the shingling doesn't appear to work on this live data.
> > Does anyone have experience with shingling that might have some
> > tips for us, or otherwise advice for debugging the issue?
> >
> > Thanks,
> > Jeff
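
For reference, the client-side workaround Jonathan describes (forcing phrase quotes around the whole query before it reaches Solr) might look roughly like the sketch below. The helper names and the base URL are made up for illustration and are not from the thread. Whether the quoted query then matches the shingled field still has to be checked against the real index.

```python
from urllib.parse import urlencode


def quote_as_phrase(user_query: str) -> str:
    """Wrap the whole query in double quotes so the query parser keeps
    it as one phrase instead of splitting it on whitespace."""
    # Collapse runs of whitespace so "apple   pie" and "apple pie" agree.
    cleaned = " ".join(user_query.split())
    # Escape backslashes first, then embedded double quotes.
    escaped = cleaned.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_select_url(base_url: str, user_query: str) -> str:
    """Build a /select URL with the phrase-quoted query as the q parameter.

    base_url is a hypothetical Solr location; adjust it to your setup.
    """
    params = {"q": quote_as_phrase(user_query)}
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", "apple pie"))
```

Quoting everything is a blunt tool: it also disables any query syntax (field names, boolean operators) the user might have typed, so it only fits the case Jonathan mentions, where fields like this one are all that the search covers.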