Thank you, Otis.
Without trying to appear too stupid, when you refer to having the params 
matching your # of CPU cores, you are talking about the # of threads I can 
spawn with the StreamingUpdateSolrServer object?
Up until now, I have been just utilizing post.sh or post.jar. Are these capable 
of that or do I need to write some code to collect a bunch of files into the 
buffer and send it off?
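
For illustration, here is a minimal sketch of the "batches of 1000 docs over N threads" idea in plain Java. The class name BatchIndexer, the method indexInBatches, and the sender callback are hypothetical and not part of SolrJ or post.jar; in real client code the callback would wrap whatever SolrJ call actually adds a batch of documents to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical helper, not a SolrJ class: splits documents into fixed-size
// batches and sends them from a fixed pool of worker threads.
class BatchIndexer {

    // Hands each batch of at most batchSize docs to one of threadCount workers.
    // "sender" is an assumed callback standing in for the real Solr add call.
    // Returns the number of batches that were sent.
    static <T> int indexInBatches(List<T> docs, int batchSize, int threadCount,
                                  Consumer<List<T>> sender) {
        if (batchSize <= 0 || threadCount <= 0) {
            throw new IllegalArgumentException("batchSize and threadCount must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                // Copy the slice so each task owns its own batch.
                List<T> batch = new ArrayList<>(docs.subList(start, end));
                pending.add(pool.submit(() -> sender.accept(batch)));
            }
            // Wait for every batch so a failed send is reported, not silently dropped.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed to send", e.getCause());
        } finally {
            pool.shutdown();
        }
        return pending.size();
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        AtomicLong sent = new AtomicLong();
        // N = number of CPU cores visible to the JVM, as suggested in the thread.
        int threads = Runtime.getRuntime().availableProcessors();
        int batches = indexInBatches(docs, 1000, threads, b -> sent.addAndGet(b.size()));
        System.out.println(batches + " batches, " + sent.get() + " docs sent");
    }
}
```

The sketch submits all batches up front, which is fine for a few thousand batches but would need a bounded queue for a very large input so that memory use stays flat.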

Also, do you have a sense for how long it should take to index 100,000 files, 
or in my case 100,000,000 documents?
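
As a rough way to frame that question, the total time is just the document count divided by the sustained indexing rate. The sketch below works that arithmetic through; the class IndexTimeEstimate and the rate of 5,000 docs/second are placeholders chosen for illustration, not a measured or expected Solr figure. A rate measured on a sample of the real files should be substituted.

```java
// Hypothetical helper: back-of-envelope estimate of total indexing time.
class IndexTimeEstimate {

    // Returns estimated hours to index files * docsPerFile documents
    // at a sustained rate of docsPerSecond.
    static double estimatedHours(long files, long docsPerFile, double docsPerSecond) {
        if (docsPerSecond <= 0) {
            throw new IllegalArgumentException("docsPerSecond must be positive");
        }
        long totalDocs = Math.multiplyExact(files, docsPerFile);
        return totalDocs / docsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        // 100,000 files of 1,000 docs each, as described in the thread.
        // The rate is an assumed placeholder, not a benchmark result.
        double assumedDocsPerSecond = 5_000.0;
        double hours = estimatedHours(100_000, 1_000, assumedDocsPerSecond);
        System.out.printf("Estimated indexing time: %.1f hours%n", hours);
    }
}
```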
StreamingUpdateSolrServer:
public StreamingUpdateSolrServer(String solrServerUrl, int queueSize,
    int threadCount) throws MalformedURLException

Thanks again,
Charlie

-- 
Best Regards,

Charles Wardell
Blue Chips Technology, Inc.
www.bcsolution.com

On Tuesday, April 26, 2011 at 5:12 PM, Otis Gospodnetic wrote: 
> Charlie,
> 
> How's this:
> * -Xmx2g
> * ramBufferSizeMB 512
> * mergeFactor 10 (default, but you could up it to 20, 30, if ulimit -n allows)
> * ignore/delete maxBufferedDocs - not used if you set ramBufferSizeMB
> * use StreamingUpdateSolrServer (with params matching your number of CPU 
> cores), or send batches of say 1000 docs with the other SolrServer impl 
> using N threads (N = # of your CPU cores)
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> ----- Original Message ----
> > From: Charles Wardell <charles.ward...@bcsolution.com>
> > To: solr-user@lucene.apache.org
> > Sent: Tue, April 26, 2011 2:32:29 PM
> > Subject: Question on Batch process
> > 
> > I am sure that this question has been asked a few times, but I can't seem 
> > to find the sweet spot for indexing.
> > 
> > I have about 100,000 files, each containing 1,000 XML documents, ready to 
> > be posted to Solr. My desire is to have it index as quickly as possible; 
> > once that is completed, the daily stream of ADDs will be small in comparison.
> > 
> > The individual documents are small. Essentially web postings from the net. 
> > Title, postPostContent, date. 
> > 
> > 
> > What would be the ideal configuration for ramBufferSizeMB, mergeFactor, 
> > maxBufferedDocs, etc.?
> > 
> > My machine is a hyper-threaded quad core, so it shows up as 8 CPUs in top.
> > I have 16GB of available RAM.
> > 
> > 
> > Thanks in advance.
> > Charlie
> 
