I'm really reaching here, but Lucene only indexes the first 10,000 terms of a field by default (you can raise the limit). Is there a chance that you're hitting that limit? That is, could 1cuk be past the 10,000th term in record 2.40's related_ids field?
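For reference, in Solr versions of this era the limit is normally set by the maxFieldLength element in solrconfig.xml. The fragment below is only a sketch: the value 50000 is an assumption, picked to sit above the roughly 31000 terms reported for record 2.40, and your config may also set maxFieldLength in its mainIndex section, which would need the same change.

```xml
<!-- solrconfig.xml (sketch): raise the per-field term limit.
     50000 is an illustrative value, not a recommendation. -->
<indexDefaults>
  <!-- Terms beyond this count in a single field are silently
       dropped at index time; the stock default is 10000. -->
  <maxFieldLength>50000</maxFieldLength>
</indexDefaults>
```

Terms that were truncated at index time are not recovered by changing the setting alone, so the affected records would have to be reindexed after restarting Solr.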
For this to be possible, I have to assume that the FieldAnalysis tool ignores this limit....

FWIW
Erick

On Fri, Oct 23, 2009 at 12:01 PM, Andrew Clegg <andrew.cl...@gmail.com> wrote:

>
> Hi,
>
> I have a field in my index called related_ids, indexed and stored, with the
> following field type:
>
> <!--
>   A text field that tokenizes on whitespace, removing non-word
>   characters at the start and end of each token, but preserving
>   meaningful punctuation *within* tokens
>   (e.g. B/BEIJING/1/87 MG-ATP-K-OXALATE ). Also converts to lowercase.
> -->
> <fieldType name="keywords_ids" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory"
>                pattern="\W*\s+\W*" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Several records in my index contain the token 1cuk in the related_ids field,
> but only *some* of them are returned when I query on this. e.g. if I send a
> query like this:
>
> http://localhost:8080/solr/select/?q=id:2.40.50+AND+related_ids:1cuk&version=2.2&start=0&rows=20&indent=on&fl=id,title,related_ids
>
> I get a single hit for the record with id:2.40.50 . But if I try this, on a
> different record with id:2.40 :
>
> http://localhost:8080/solr/select/?q=id:2.40+AND+related_ids:1cuk&version=2.2&start=0&rows=20&indent=on&fl=id,title,related_ids
>
> I get no hits. However, if I just query for id:2.40 ...
>
> http://localhost:8080/solr/select/?q=id:2.40&version=2.2&start=0&rows=20&indent=on&fl=id,title,related_ids
>
> I can clearly see the token "1cuk" in the related_ids field.
>
> Not only that, but if I copy and paste record 2.40's related_ids field into
> the Field Analysis tool in the admin interface, and search on "1cuk", the
> term 1cuk is visible in the index analyzer's term list, and highlighted! So
> Field Analysis thinks that I *should* be getting a hit for this term.
>
> Can anyone suggest how I'd go about diagnosing this? I'm kind of hitting a
> brick wall here.
>
> If it makes any difference, related_ids for the culprit record 2.40 is
> large-ish but not enormous (31000 terms). Also I've tried stopping and
> restarting Solr in case it was some weird caching thing.
>
> Thanks in advance,
>
> Andrew.
>
> --
> View this message in context:
> http://www.nabble.com/Result-missing-from-query%2C-but-match-shows-in-Field-Analysis-tool-tp26029040p26029040.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>