On Aug 9, 2009, at 10:29 AM, Yonik Seeley wrote:
I did some quick indexing performance tests right before and right after the last lucene jar update - the results are not good... about 30% slower. The test was an 80 MB text field, 100K documents, 6 short text fields per document, with the solrconfig/schema from trunk copied to both environments. I imagine this has to do with the new TokenStream stuff in Lucene, and how back compatibility is implemented (which I haven't followed, but which may now involve reflection). We've never cached TokenStreams, given everything that involves, but we may be forced to do so to recover the performance loss.
Or bite the bullet and upgrade to the incrementToken() method. It likely isn't that bad, maybe a few hours of work.
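For anyone who hasn't looked at the new API yet, here is a rough sketch of the shape of the change. To be clear, these are hypothetical stand-in classes written against the JDK only, not Lucene's actual TokenStream/TokenFilter classes; the names OldStyleTokenizer, NewStyleTokenizer and friends are made up for illustration. The idea it tries to show is that the old style hands back a token object from each next() call, while the incrementToken() style returns a boolean and updates shared, reusable state in place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins, NOT Lucene classes: they only mimic the shape of the two APIs.
class TokenApiSketch {

    // Splits on whitespace; an empty or blank input yields no words.
    static String[] split(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Old style: every call returns a token object, or null when exhausted.
    static class OldStyleTokenizer {
        private final String[] words;
        private int pos = 0;
        OldStyleTokenizer(String text) { words = split(text); }
        String next() { return pos < words.length ? words[pos++] : null; }
    }

    static class OldStyleLowerCaseFilter {
        private final OldStyleTokenizer input;
        OldStyleLowerCaseFilter(OldStyleTokenizer input) { this.input = input; }
        String next() {
            String token = input.next();
            return token == null ? null : token.toLowerCase(Locale.ROOT);
        }
    }

    // New style: incrementToken() returns true/false and overwrites a reused buffer.
    static class NewStyleTokenizer {
        final StringBuilder term = new StringBuilder(); // shared, reused "attribute"
        private final String[] words;
        private int pos = 0;
        NewStyleTokenizer(String text) { words = split(text); }
        boolean incrementToken() {
            if (pos >= words.length) return false;
            term.setLength(0);
            term.append(words[pos++]);
            return true;
        }
    }

    static class NewStyleLowerCaseFilter {
        private final NewStyleTokenizer input;
        NewStyleLowerCaseFilter(NewStyleTokenizer input) { this.input = input; }
        boolean incrementToken() {
            if (!input.incrementToken()) return false;
            StringBuilder term = input.term; // modify the shared buffer in place
            for (int i = 0; i < term.length(); i++) {
                term.setCharAt(i, Character.toLowerCase(term.charAt(i)));
            }
            return true;
        }
    }

    // Consumer loop for the old style: pull tokens until null.
    static List<String> collectOld(String text) {
        OldStyleLowerCaseFilter filter = new OldStyleLowerCaseFilter(new OldStyleTokenizer(text));
        List<String> out = new ArrayList<>();
        for (String t = filter.next(); t != null; t = filter.next()) out.add(t);
        return out;
    }

    // Consumer loop for the new style: advance, then read the shared buffer.
    static List<String> collectNew(String text) {
        NewStyleTokenizer tokenizer = new NewStyleTokenizer(text);
        NewStyleLowerCaseFilter filter = new NewStyleLowerCaseFilter(tokenizer);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) out.add(tokenizer.term.toString());
        return out;
    }
}
```

Both consumer loops should produce the same tokens; the per-filter change is mostly mechanical, which is why I'd guess the upgrade is hours rather than days. The real classes carry more state than a single term buffer, so treat this only as a picture of the pattern.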
Still, we should try to narrow down exactly where it is happening.
