StreamingServer adds docs from multiple threads while reusing the same HTTP connection.

Or you can use the CommonsHttpSolrServer#add(Iterator<SolrInputDocument> docIterator)
method, which streams the documents to Solr in a single request.
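
To illustrate the iterator approach, here is a minimal sketch in plain Java. It does not use SolrJ: DocSink is a hypothetical stand-in for the server's add(Iterator) call, and the documents are plain maps rather than SolrInputDocument. The point is only that the iterator builds each doc on demand, so the indexer never holds the whole set in memory.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class LazyDocFeed {

    /** Hypothetical stand-in for a server call that accepts an iterator of docs. */
    interface DocSink {
        int addAll(Iterator<Map<String, Object>> docs);
    }

    /** Builds documents one at a time, only when the consumer asks for the next one. */
    static Iterator<Map<String, Object>> lazyDocs(final int total) {
        return new Iterator<Map<String, Object>>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < total;
            }

            @Override
            public Map<String, Object> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // In a real indexer this is where the DB row would be read.
                Map<String, Object> doc = new LinkedHashMap<String, Object>();
                doc.put("id", "doc-" + next);
                doc.put("title", "Title " + next);
                next++;
                return doc;
            }
        };
    }

    /** Hands the iterator to the sink and returns the number of docs it reports. */
    static int feed(Iterator<Map<String, Object>> docs, DocSink sink) {
        return sink.addAll(docs);
    }

    public static void main(String[] args) {
        // A sink that only counts docs, in place of a real HTTP request.
        DocSink countingSink = docs -> {
            int n = 0;
            while (docs.hasNext()) {
                docs.next();
                n++;
            }
            return n;
        };
        System.out.println("docs sent: " + feed(lazyDocs(450), countingSink));
    }
}
```

With SolrJ on the classpath, the sink would presumably be replaced by a call to the add method named above, with the iterator yielding SolrInputDocument instances.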

If you are unhappy with the performance, you can use the BinaryRequestWriter:
http://wiki.apache.org/solr/Solrj#head-ddc28af4033350481a3cbb27bc1d25bffd801af0

If you still need more performance, you can call the add method from multiple threads.
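
As a rough sketch of calling add from multiple threads, the plain-Java example below splits the docs into batches and submits them from a fixed thread pool. BatchSink is a hypothetical stand-in for a thread-safe server's add call, the docs are plain strings, and the batch size and thread count are illustrative values, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelBatchIndexer {

    /** Hypothetical stand-in for a thread-safe server's add(batch) call. */
    interface BatchSink {
        void add(List<String> batch) throws Exception;
    }

    /** Splits docs into batches, sends them from a fixed pool, returns docs sent. */
    static int indexAll(List<String> docs, int batchSize, int threads, BatchSink sink)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
        try {
            for (int start = 0; start < docs.size(); start += batchSize) {
                final List<String> batch =
                        docs.subList(start, Math.min(start + batchSize, docs.size()));
                pending.add(pool.submit(() -> {
                    sink.add(batch);
                    return batch.size();
                }));
            }
            int sent = 0;
            for (Future<Integer> f : pending) {
                sent += f.get(); // rethrows a worker failure as ExecutionException
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<String>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        // The sink only counts docs here; a real one would send them to Solr.
        AtomicInteger received = new AtomicInteger();
        int sent = indexAll(docs, 1000, 4, batch -> received.addAndGet(batch.size()));
        System.out.println("sent=" + sent + " received=" + received.get());
    }
}
```

Following the advice quoted below, the thread count would be matched to the number of CPU cores on the Solr server, and commit would be called once after indexAll returns rather than per batch.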



On Thu, Jul 2, 2009 at 3:20 AM, Manepalli, Kalyan <kalyan.manepa...@orbitz.com> wrote:
> By removing both the stopword filter factory and the SynonymFilterFactory,
> the indexing time has dropped drastically to 2 to 5 ms per doc.
> Next I will try out StreamingServer. Are there any distinct advantages to
> using StreamingServer?
>
> Thanks,
> Kalyan Manepalli
>
> -----Original Message-----
> From: Manepalli, Kalyan [mailto:kalyan.manepa...@orbitz.com]
> Sent: Wednesday, July 01, 2009 3:41 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Tips on speeding the indexing process
>
> Regarding the analysis, we do a couple of things during indexing. First, we
> use a dictionary text file for the stopword filter factory. Second, we use a
> synonym text file for the SynonymFilterFactory. I will test the indexing
> speed by temporarily removing both of them.
>
> Thanks,
> Kalyan Manepalli
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Wednesday, July 01, 2009 3:31 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Tips on speeding the indexing process
>
>
> Kalyan,
>
> 150 to 200 ms to index one document seems too long, but it really depends on
> how much analysis is going on and the size of the docs.  32 threads seems too
> high, unless your Solr server really has 32 cores.
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: "Manepalli, Kalyan" <kalyan.manepa...@orbitz.com>
>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>> Sent: Wednesday, July 1, 2009 4:21:30 PM
>> Subject: RE: Tips on speeding the indexing process
>>
>> Here are some specs for my indexer.
>> The indexer is custom Java code that reads data from the DB and other
>> services, builds the solrDocument, and submits it using SolrJ over HTTP.
>> The indexer does a bit of work to build each document; that overhead is
>> around 30 to 40 ms. For every document added, Solr takes around 150 to
>> 200 ms.
>> I tried the bulk addition approach with 1000 documents at a time, but found
>> that Solr takes just the same amount of time. I commit and optimize only
>> once at the end. I currently use 32 threads in the production environment
>> to get the total indexing time down to 2 hrs.
>>
>>
>> Thanks,
>> Kalyan Manepalli
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: Wednesday, July 01, 2009 3:11 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Tips on speeding the indexing process
>>
>>
>> Kalyan,
>>
>> Using SolrJ?  Use the StreamingServer, it's nice and fast.
>> Alternatively, start multiple indexing threads (match the number of Solr
>> server CPU cores) and index from there.
>> Send batches of docs, not one by one.
>> Don't commit or optimize until you are done.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> ----- Original Message ----
>> > From: "Manepalli, Kalyan"
>> > To: "solr-user@lucene.apache.org"
>> > Sent: Wednesday, July 1, 2009 3:42:45 PM
>> > Subject: Tips on speeding the indexing process
>> >
>> > Hi,
>> > I have a very generic question regarding indexing. In my current app, I
>> > have about 450,000 docs, each around 2k in size. The total indexing time
>> > is around 2 hrs.
>> > Now, due to multi-language support, the number of documents is increasing
>> > to 2.0 million. The total indexing time is exceeding 6 hrs.
>> > I wanted to know if there are any general tips to speed up the indexing
>> > process.
>> >
>> > Thanks,
>> > Kalyan Manepalli
>
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
