Re: Reading Performance

2006-12-08 Thread Andrew Hudson
I think I've seen this problem when you use Lucene's built-in delete mechanism, IndexReader.deleteDocument I believe. The problem was that it synchronized on a java.util.BitSet, which totally killed performance when more than one thread was using the same IndexReader. A better way to do deletes is to …
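The email is cut off before the fix, but the contention problem it describes is general: a shared BitSet guarded by a lock sits on the hot `isDeleted` path that every reader hits. A minimal sketch of one common remedy (this is illustrative Java, not Lucene's actual code): writers synchronize and publish an immutable snapshot through a volatile field, so readers never take the lock.

```java
import java.util.BitSet;

// Illustrative sketch, not Lucene internals: avoid reader contention on a
// shared deleted-docs BitSet by publishing an immutable snapshot.
public class DeletedDocs {
    private final BitSet deleted = new BitSet();     // mutated only under lock
    private volatile BitSet snapshot = new BitSet(); // read lock-free by readers

    // Deletes are comparatively rare: lock, mutate, publish a fresh copy.
    public synchronized void delete(int docId) {
        deleted.set(docId);
        snapshot = (BitSet) deleted.clone();
    }

    // The hot read path: a single volatile read, no synchronization.
    public boolean isDeleted(int docId) {
        return snapshot.get(docId);
    }
}
```

The copy-on-write trade-off fits this workload: deletes are infrequent relative to the per-hit deletion checks during searching.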

Re: short documents = help me tweak Similarity??

2007-04-05 Thread Andrew Hudson
The problem comes when your float value is encoded into that 8-bit field norm: the length-3 and length-4 values both become the same 8-bit value. Call Similarity.encodeNorm on the values you calculate for the different numbers of terms and make sure they return different byte values. Andrew On 4/5/07, …
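The quantization is easy to see in isolation. Below is a standalone reimplementation of the 8-bit scheme that, if I recall the 1.9/2.x sources correctly, Similarity.encodeNorm delegates to (SmallFloat.floatToByte315): a tiny float with 3 mantissa bits (counting the implicit leading 1) and 5 exponent bits. With that little precision, the default lengthNorm values 1/sqrt(3) and 1/sqrt(4) truncate to the same byte, which is exactly the symptom described above.

```java
// Standalone sketch of the 8-bit norm encoding (mirrors Lucene's
// SmallFloat.floatToByte315, to the best of my recollection).
public class NormEncoding {
    public static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Keep sign, the 8 exponent bits, and the top 2 explicit mantissa bits.
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow: zero or smallest positive
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;                                // overflow: saturate at the largest byte
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    public static float decodeNorm(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;                      // re-apply the exponent bias
        return Float.intBitsToFloat(bits);
    }
}
```

So the practical advice in the email stands: whatever custom lengthNorm you write, run its outputs through encodeNorm and verify the bytes actually differ for the field lengths you care about.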

Re: short documents = help me tweak Similarity??

2007-04-05 Thread Andrew Hudson
> Also, I don't understand why the encode/decode functions have a range of 7x10^9 to 2x10^-9, when it seems to me the most common values (with boosts set to 1.0) are somewhere between 0 and 1.0. When would somebody have a monster huge value like 7x10^9? Even with a huge index-time boost of 20.0 or so …
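The extreme range falls out of the format rather than being chosen for typical norms: with 5 exponent bits, the smallest nonzero byte and the largest byte decode to values many orders of magnitude apart. A quick check using the same decode logic as above (an illustrative reimplementation, not a Lucene import; the exact endpoint values are from my reading of the format, so the test below only bounds them loosely):

```java
// Decode the two extreme nonzero bytes of the 8-bit norm format to see
// its dynamic range. Mirrors SmallFloat's byte315ToFloat (illustrative).
public class NormRange {
    public static float decodeNorm(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        System.out.println(decodeNorm((byte) 1));  // smallest nonzero norm (~1e-9 territory)
        System.out.println(decodeNorm((byte) -1)); // largest norm, byte 0xFF (billions)
    }
}
```

The headroom is there because the stored norm is a product (document boost x field boost x lengthNorm), and compounded boosts can push it far above 1.0 even though unboosted norms live in (0, 1].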

Re: Lucene score algorithm details?

2005-08-08 Thread Andrew Hudson
Is the doc boost being used in scoring currently? I haven't been able to see a clear connection between it and the score that Lucene calculates, either empirically or in the scoring code itself. Andrew On 8/8/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > It's in the Javadoc for the Similarity class …
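Per the Similarity javadoc of that era, the document boost never appears at search time by itself: it is multiplied into each field's index-time norm (doc boost x field boost x lengthNorm) and then quantized to one byte. That quantization plausibly explains the lack of an empirical effect: a small boost can vanish entirely in the rounding. A sketch (the encodeNorm body mirrors SmallFloat.floatToByte315 as above; `normByte` is a hypothetical helper, not a Lucene API):

```java
// Sketch of how a document boost reaches the score: folded into the
// per-field norm at index time, then quantized to one byte.
public class DocBoostNorm {
    public static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) return (bits <= 0) ? (byte) 0 : (byte) 1;
        if (smallfloat >= ((63 - 15) << 3) + 0x100) return -1;
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Hypothetical helper: the norm byte for a field of the given length,
    // using the default lengthNorm of 1/sqrt(numTerms).
    public static byte normByte(float docBoost, int fieldLength) {
        float lengthNorm = (float) (1.0 / Math.sqrt(fieldLength));
        return encodeNorm(docBoost * lengthNorm);
    }
}
```

For a 10-term field, a boost of 1.1 lands in the same byte as no boost at all, while a boost of 2.0 moves the byte; so the boost is "used," just too coarsely to see for small values.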

Inefficiency in MultiReader / MultiTermDocs.skipTo (non optimized indexes)

2006-05-31 Thread Andrew Hudson
In our application we noticed that any time there was more than one segment in the index (i.e., it was not optimized), there was a big drop in performance. After thinking about this for a long time it didn't add up: even if you optimize an index and then add just one job, the big drop occurs. I tracked …
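The email is truncated before the diagnosis, but the performance-sensitive part of a multi-segment skipTo is easy to state: doc ids are the concatenation of per-segment postings (each offset by a base), and skipTo must advance from its current position rather than rescanning earlier segments on every call. A minimal illustrative sketch of the correct stateful behavior (this is not Lucene's MultiTermDocs code, just the invariant it should uphold):

```java
// Illustrative sketch: skipTo over the concatenation of per-segment
// postings lists. State (seg, pos) persists across calls, so each
// skipTo resumes where the last one stopped instead of rescanning.
public class MultiPostings {
    private final int[][] segments; // sorted doc ids within each segment
    private final int[] bases;      // doc id offset of each segment
    private int seg = 0, pos = -1;

    public MultiPostings(int[][] segments, int[] bases) {
        this.segments = segments;
        this.bases = bases;
    }

    // Advance to the first doc id >= target; return it, or -1 if exhausted.
    public int skipTo(int target) {
        while (seg < segments.length) {
            int[] docs = segments[seg];
            int base = bases[seg];
            for (pos = Math.max(pos, 0); pos < docs.length; pos++) {
                if (docs[pos] + base >= target) return docs[pos] + base;
            }
            seg++;      // this segment exhausted, move on
            pos = -1;
        }
        return -1;
    }
}
```

If a wrapper instead restarts its underlying enumerations on each skipTo, every call pays for the segments already passed, which matches the symptom above: a large slowdown appearing as soon as a second segment exists, even a tiny one.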