I have a few things I'd like to check with the Luke handler. If you all could check some of the assumptions, that would be great.

* I want to print out the document frequency for a term in a given document. Since that term shows up in the given document, I would think the document frequency must be >= 1. I am using: reader.docFreq( t ) [line 236]. The results seem reasonable, but *sometimes* it returns zero... is that possible?

* I want to return the Lucene field flags for each field. I run through all the field names with: reader.getFieldNames(IndexReader.FieldOption.ALL). Is there a way to get any Fieldable for a given name? IIUC, all fields with the same name will have the same flags. I tried searching for a document with that field; that works, but only for stored fields.

* I just realized that I am only returning stored fields for getDocumentFieldsInfo() (it uses Document.getFields()). How can I find *all* Fieldables for a given document? I have tried following the Luke source, but get a bit lost ;)

* Each field gets a boolean attribute "cacheableFaceting" -- this is true if the number of distinct terms is smaller than the filterCacheSize. I get the filterCacheSize from: solrconfig.xml:"query/filterCache/@size" and get the distinct term count from counting up the termEnum. Is this logic solid? I know the cacheability changes if you are faceting multiple fields at once, but it's still nice to have a ballpark estimate without needing to know the internals.
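
To make the last check concrete, here is a minimal sketch of the comparison in plain Java. It is not the handler code: the class name, the method name, and the Iterator<String> standing in for a walk over the field's TermEnum are all hypothetical, and it assumes each distinct term is yielded exactly once. It stops counting as soon as the answer is known, so large fields do not need a full enumeration.

```java
import java.util.Arrays;
import java.util.Iterator;

public class FacetCacheEstimate {

    // Hypothetical helper: `terms` stands in for iterating a field's TermEnum,
    // and is assumed to yield each distinct term exactly once.
    // Returns true when the distinct term count is smaller than filterCacheSize.
    static boolean isCacheable(Iterator<String> terms, int filterCacheSize) {
        int distinct = 0;
        while (terms.hasNext()) {
            terms.next();
            // Once the count reaches the cache size the answer is already "no".
            if (++distinct >= filterCacheSize) {
                return false;
            }
        }
        // Also covers the empty-field case with a zero-sized cache.
        return distinct < filterCacheSize;
    }

    public static void main(String[] args) {
        Iterator<String> small = Arrays.asList("a", "b", "c").iterator();
        System.out.println(isCacheable(small, 512));

        Iterator<String> tooMany = Arrays.asList("a", "b", "c").iterator();
        System.out.println(isCacheable(tooMany, 3));
    }
}
```

This only restates the single-field estimate; it says nothing about what happens when several fields share the filterCache.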


thanks for any pointers
ryan
