Re: indexing slowdown with latest lucene update

Grant Ingersoll Mon, 10 Aug 2009 07:08:09 -0700

FWIW, seems like these issues should be brought up on java-dev. Even if the changes in Lucene are back compatible, that's not much help if the large majority of users are going to take a similar hit to what Solr is taking.


On Aug 9, 2009, at 11:47 PM, Mark Miller wrote:

isMethodOverriden is just nasty - copying Methods, security checks, walking the type hierarchy, this, that, some more. I bet cglib has a really fast version - too bad there is no built-in equivalent.
It's not nearly as clean, but what if a new TokenStream simply identified itself as supporting increment, and the default impl returns false? The developer knows at compile time, right? There's almost no reason to keep asking the code over and over again, especially since it's so expensive. Then reusable doubles the cost.
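
A minimal sketch of that idea, assuming hypothetical classes (SketchStream, NewStyleStream and OldStyleStream are illustrative names, not Lucene's actual TokenStream API). It contrasts a self-identifying flag, which costs one virtual call, with a reflection-based check that walks the type hierarchy:

```java
/** Hypothetical stand-in for a token stream base class; not Lucene's TokenStream. */
abstract class SketchStream {
    /** Default: old-style stream. New-style subclasses override this to return true. */
    boolean supportsIncrement() { return false; }

    /** New-style API; the base implementation produces no tokens. */
    boolean incrementToken() { return false; }

    /** Reflection-based check, for comparison: walks up the hierarchy on every call. */
    static boolean overridesIncrementToken(Class<?> clazz) {
        for (Class<?> c = clazz; c != null && c != SketchStream.class; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("incrementToken");
                return true; // declared somewhere below the base class
            } catch (NoSuchMethodException e) {
                // not declared at this level, keep walking up
            }
        }
        return false;
    }
}

/** A stream written against the new API: it says so itself. */
class NewStyleStream extends SketchStream {
    @Override boolean supportsIncrement() { return true; }
    @Override boolean incrementToken() { return false; }
}

/** A stream that predates the new API and overrides nothing. */
class OldStyleStream extends SketchStream { }
```

Both checks give the same answer for these two classes. The flag relies on the developer keeping it in sync with the code, which is the "not nearly as clean" part.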
Mark Miller wrote:
Michael Busch wrote:
Are you sure that the initialization costs of the TokenStream/AttributeSource cause the slowdown? With the bw-comp. code now every call of a Token method goes through a delegation layer. I'm afraid that might cause a slowdown?
It's isMethodOverriden and TokenStream<init>(AttributeSource).
The code that figures out what Attributes to put into the map uses reflection, but only if the impl wasn't seen before; otherwise the attributes are looked up in a cache.
The culprit could also be the reflection code that checks which TokenStream methods are implemented.
I can't look at the code right now (writing on my cell).
Even if this is "fixable", I don't really like the fact that users who upgrade to 2.9 will potentially see such a performance hit unless they implement incrementToken() and reusableTokenStream.
Looks like you take a good hit, but keep in mind that test is almost a worst-case scenario as well - the Document text is extremely short.
Michael
On Aug 9, 2009, at 11:13 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
FYI
https://issues.apache.org/jira/browse/SOLR-1353
On Sun, Aug 9, 2009 at 2:02 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
It looks like implementing the new attribute stuff will not be enough - the token architecture has changed enough that it looks like we must cache tokenstreams to get back to good performance.

-Yonik
http://www.lucidimagination.com
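
A rough sketch of what caching token streams per thread could look like, assuming hypothetical classes (ReusableStream and StreamCache are illustrative, not Solr or Lucene classes). The expensive construction happens once per thread, and each new document only pays for a reset:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stream with costly construction and a cheap reset; not a Lucene class. */
class ReusableStream {
    /** Counts constructions so the effect of caching is observable. */
    static final AtomicInteger constructed = new AtomicInteger();

    private String text;

    ReusableStream() { constructed.incrementAndGet(); }

    /** Points the existing instance at new input instead of building a new one. */
    void reset(String newText) { this.text = newText; }

    String text() { return text; }
}

/** Hands out one stream per thread and resets it for each new piece of text. */
class StreamCache {
    private final ThreadLocal<ReusableStream> perThread =
        ThreadLocal.withInitial(ReusableStream::new);

    ReusableStream streamFor(String text) {
        ReusableStream stream = perThread.get(); // constructed on first use only
        stream.reset(text);
        return stream;
    }
}
```

The trade-off is that a caller must finish with the returned stream before asking for another one on the same thread, since both requests share a single instance.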
On Sun, Aug 9, 2009 at 12:57 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
OK, I've isolated (magnified) the effect with a test I just checked in. Indexing documents directly at the UpdateHandler was 85% faster before the latest lucene update.

Run the test like this:

ant test -Dtestcase=TestIndexingPerformance -Dargs="-server -Diter=100000"; grep throughput build/test-results/*TestIndexingPerformance.xml

To run on an older trunk version, just copy over
src/test/org/apache/solr/update/TestIndexingPerformance.java
src/test/test-files/solr/conf/solrconfig_perf.xml
I had a throughput of 10946 docs/sec before the lucene update, and 5849 after.
-Yonik
http://www.lucidimagination.com
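
For reference, the two throughput figures can be expressed as a relative change in two ways: how much faster the old rate was relative to the new one, and what share of the old throughput was lost. A small sketch (ThroughputMath is a hypothetical helper; only the two docs/sec values come from the message above):

```java
/** Relative-change arithmetic for the two quoted throughput figures. */
class ThroughputMath {
    /** How much faster the old rate was, as a percentage of the new rate. */
    static double percentFaster(double before, double after) {
        return (before / after - 1.0) * 100.0;
    }

    /** How much throughput was lost, as a percentage of the old rate. */
    static double percentDrop(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        double before = 10946; // docs/sec before the lucene update
        double after = 5849;   // docs/sec after the lucene update
        System.out.printf("before was %.1f%% faster%n", percentFaster(before, after));
        System.out.printf("throughput dropped %.1f%%%n", percentDrop(before, after));
    }
}
```

With these inputs the old rate works out to roughly 87% faster, which is a drop of roughly 47% measured against the old rate.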
On Sun, Aug 9, 2009 at 12:10 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
On Sun, Aug 9, 2009 at 12:01 PM, Grant Ingersoll <gsing...@apache.org> wrote:
Or bite the bullet and upgrade to the incrementToken() method.
Right - I'm not sure if that would fix it or not - I haven't been involved in the new Token attribute stuff...
I'm currently writing a basic indexing unit test that we can use to measure this (the standard solrconfig does stuff that slows down indexing a lot, but helps in catching bugs on edge cases by creating many segments).

-Yonik
http://www.lucidimagination.com
--
- Mark

http://www.lucidimagination.com


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
