Hi Alex,

Were you ever able to get the indexing machine to use more than about one CPU's
worth of processing? I am also curious how BinaryRequestWriter compares to the
StreamingUpdateSolrServer that I was using.

Brian

-----Original Message-----
From: Alexey Serba [mailto:ase...@gmail.com] 
Sent: Tuesday, April 27, 2010 9:53 AM
To: solr-user@lucene.apache.org
Subject: Re: indexer threading?

Hi Brian,

I was testing indexing performance on a high-CPU box recently and ran into
the same issue. I tried different indexing methods (XML,
CSVRequestHandler, and SolrJ + BinaryRequestWriter with multiple
threads). The last method is indeed the fastest. I believe the
multi-threaded approach gives a bigger improvement when you have
complex text analysis. My analysis was very simple (WhitespaceTokenizer
only), and the performance boost from adding threads was not very
impressive (but still noticeable). My guess is that with simple text
analysis, overall performance comes down to synchronization overhead.
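For reference, here is a minimal sketch of the client side of the "multiple threads" approach: documents are split into batches and pushed from a fixed pool of worker threads. This is not the code used in this thread, and `ThreadedIndexer` and `DocumentSink` are hypothetical names. In a real client, `DocumentSink.add` would wrap a SolrJ call such as `SolrServer.add(Collection<SolrInputDocument>)` on a `CommonsHttpSolrServer` configured with `setRequestWriter(new BinaryRequestWriter())`; the sketch assumes that wiring rather than showing it, so it runs without Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: a sketch of multi-threaded client-side batching,
// not part of SolrJ's API.
class ThreadedIndexer {

    // Hypothetical stand-in for the call that sends one batch to Solr
    // (in SolrJ this would wrap something like SolrServer.add(...)).
    interface DocumentSink {
        void add(List<String> batch) throws Exception;
    }

    // Splits docs into batches of batchSize, sends them from a pool of
    // `threads` worker threads, and returns the number of documents sent.
    static int indexAll(List<String> docs, int threads, int batchSize, DocumentSink sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                // Copy the slice so each task owns an independent batch.
                List<String> batch = new ArrayList<>(
                        docs.subList(start, Math.min(start + batchSize, docs.size())));
                results.add(pool.submit(() -> {
                    sink.add(batch);
                    return batch.size();
                }));
            }
            int sent = 0;
            for (Future<Integer> f : results) {
                sent += f.get(); // waits for the batch; rethrows worker failures
            }
            return sent;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("a batch failed to index", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread count and batch size are the knobs to measure. As described above, extra client threads only help while the server is not already serialized on its own internal locks.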

I profiled the application during the indexing phase (CPU times and
monitors), and it seems that most of the blocking happens in the
following methods:
- DocumentsWriter.doBalanceRAM
- DocumentsWriter.getThreadState
- SolrIndexWriter.ensureOpen

I don't know the guts of Solr/Lucene in that much detail, so I can't
draw any conclusions. Are there any configuration techniques to improve
indexing performance in a multi-threaded scenario?

Alex

On Mon, Apr 26, 2010 at 6:52 PM, Wawok, Brian <brian.wa...@cmegroup.com> wrote:
> Hi,
>
> I was wondering how the multi-threading of the indexer works. I am
> using SolrJ to stream documents to a server. As I add more threads on the
> client side, I slowly see both speed and CPU usage go up on the indexer side.
> Once I hit about 4 threads, my indexer is at 100% CPU usage (of 1 CPU on a
> 4-way box), and will not do any more work. It is pretty fast, doing something
> like 75k lines of text per second, but I would really like to use all 4 CPUs
> on the indexer. Is this just a limitation of Solr, or is this a limitation of
> using SolrJ and document streaming?
>
>
> Thanks,
>
>
> Brian
>