I'm really reaching here, but Lucene only indexes the first 10,000 terms of a
field by default (you can raise the limit). Is there a chance that you're
hitting that limit, i.e. that 1cuk falls past the 10,000th term in record 2.40?

For this to be possible, I have to assume that the FieldAnalysis
tool ignores this limit....
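
One rough way to check this outside Solr is to tokenize the stored value of
related_ids yourself and see where 1cuk first shows up. The sketch below is
only an approximation of the keywords_ids analyzer, not Solr's own code. The
helper names are made up for illustration, and the 10,000 default is the
limit mentioned above (I believe it is the maxFieldLength setting in
solrconfig.xml, but do check your own config).

```python
import re

# Approximation of the keywords_ids analyzer: split on whitespace plus any
# adjacent non-word characters, then lowercase each token.
# re.ASCII restricts \W and \s to ASCII classes, similar to Java's defaults.
SPLIT = re.compile(r"\W*\s+\W*", re.ASCII)


def tokenize(text):
    """Return the lowercased tokens of text, skipping empty pieces."""
    return [tok.lower() for tok in SPLIT.split(text) if tok]


def first_position(text, term):
    """Return the 1-based position of the first occurrence of term, or None."""
    wanted = term.lower()
    for pos, tok in enumerate(tokenize(text), start=1):
        if tok == wanted:
            return pos
    return None


def beyond_limit(text, term, limit=10000):
    """True if term occurs in text, but only after the first `limit` tokens."""
    pos = first_position(text, term)
    return pos is not None and pos > limit
```

Paste record 2.40's related_ids value in as text and call
beyond_limit(text, "1cuk"). If it returns True, the term would have been cut
off at index time, which would explain the missing hit.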

FWIW
Erick

On Fri, Oct 23, 2009 at 12:01 PM, Andrew Clegg <andrew.cl...@gmail.com> wrote:

>
> Hi,
>
> I have a field in my index called related_ids, indexed and stored, with the
> following field type:
>
>        <!--
>        A text field that tokenizes on whitespace, removing non-word characters at the
>        start and end of each token, but preserving meaningful punctuation *within*
>        tokens (e.g. B/BEIJING/1/87 MG-ATP-K-OXALATE ). Also converts to lowercase.
>        -->
>        <fieldType name="keywords_ids" class="solr.TextField" positionIncrementGap="100">
>            <analyzer>
>                <tokenizer class="solr.PatternTokenizerFactory" pattern="\W*\s+\W*" />
>                <filter class="solr.LowerCaseFilterFactory"/>
>            </analyzer>
>        </fieldType>
>
> Several records in my index contain the token 1cuk in the related_ids
> field,
> but only *some* of them are returned when I query on this. e.g. if I send a
> query like this:
>
>
> http://localhost:8080/solr/select/?q=id:2.40.50+AND+related_ids:1cuk&version=2.2&start=0&rows=20&indent=on&fl=id,title,related_ids
>
> I get a single hit for the record with id:2.40.50 . But if I try this, on a
> different record with id:2.40 :
>
>
> http://localhost:8080/solr/select/?q=id:2.40+AND+related_ids:1cuk&version=2.2&start=0&rows=20&indent=on&fl=id,title,related_ids
>
> I get no hits. However, if I just query for id:2.40 ...
>
>
> http://localhost:8080/solr/select/?q=id:2.40&version=2.2&start=0&rows=20&indent=on&fl=id,title,related_ids
>
> I can clearly see the token "1cuk" in the related_ids field.
>
> Not only that, but if I copy and paste record 2.40's related_ids field into
> the Field Analysis tool in the admin interface, and search on "1cuk", the
> term 1cuk is visible in the index analyzer's term list, and highlighted! So
> Field Analysis thinks that I *should* be getting a hit for this term.
>
> Can anyone suggest how I'd go about diagnosing this? I'm kind of hitting a
> brick wall here.
>
> If it makes any difference, related_ids for the culprit record 2.40 is
> large-ish but not enormous (31000 terms). Also I've tried stopping and
> restarting Solr in case it was some weird caching thing.
>
> Thanks in advance,
>
> Andrew.
>
> --
> View this message in context:
> http://www.nabble.com/Result-missing-from-query%2C-but-match-shows-in-Field-Analysis-tool-tp26029040p26029040.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
