On Wed, Jun 13, 2012 at 4:38 PM, Benson Margulies <bimargul...@gmail.com> wrote:
>
> Does this suggest anything to anyone? Other than that we've
> misanalyzed the logic in the tokenizer and there's a way to make it
> burp on one thread?

It might suggest that the different TokenStream instances refer to some shared object that is not thread-safe. We have had bugs like this before (e.g. sharing a JDK collator is OK, but ICU collators are not thread-safe, so you must clone them).

Because of this we beefed up our base analysis test class (http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java) to find thread-safety bugs like this.

I recommend just grabbing the test-framework.jar (we release it as an artifact), extending that class, and writing a test like:

  public void testRandomStrings() throws Exception {
    checkRandomData(random, analyzer, 100000);
  }

(or use the one in the branch; it has been improved further since 3.6)

-- lucidimagination.com
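
In case it helps to see the "clone it per thread" idea spelled out, here is a minimal sketch. It is not Lucene code: the class name PerThreadClone and the method distinctInstances are made up for illustration, and it uses the JDK's java.text.Collator purely as a stand-in so that it runs without extra jars (as noted above, the JDK collator is OK to share; the pattern matters for objects like the ICU collators). It assumes the shared prototype is never mutated after setup, and it synchronizes the clone call as a precaution.

```java
import java.text.Collator;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Locale;
import java.util.Set;

public class PerThreadClone {

    // Hypothetical demo: starts `threads` workers, gives each one a private
    // clone of a shared prototype, and returns how many distinct collator
    // instances were actually used.
    static int distinctInstances(int threads) throws InterruptedException {
        // Stand-in for a non-thread-safe object (e.g. an ICU collator).
        final Collator prototype = Collator.getInstance(Locale.ENGLISH);

        // Each thread lazily receives its own clone on first access.
        final ThreadLocal<Collator> perThread = ThreadLocal.withInitial(() -> {
            synchronized (prototype) { // precaution: serialize cloning
                return (Collator) prototype.clone();
            }
        });

        final Collator[] seen = new Collator[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> {
                Collator mine = perThread.get();
                mine.compare("apple", "banana"); // work happens on the private copy
                seen[slot] = mine;
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join(); // join makes the writes to `seen` visible here
        }

        // Count instances by identity, not by equals().
        Set<Collator> identities =
            Collections.newSetFromMap(new IdentityHashMap<Collator, Boolean>());
        for (Collator c : seen) {
            identities.add(c);
        }
        identities.remove(prototype); // the shared prototype should never appear
        return identities.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("distinct collators: " + distinctInstances(4));
    }
}
```

The point of the design is that no worker ever touches the shared prototype except to copy it, so whatever internal state the object keeps during compare() stays private to one thread.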