Thank you, Otis. Without trying to appear too stupid: when you refer to having the params match the number of CPU cores, are you talking about the number of threads I can spawn with the StreamingUpdateSolrServer object? Up until now, I have just been using post.sh or post.jar. Are these capable of that, or do I need to write some code to collect a bunch of files into the buffer and send them off?
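
In case it helps to make the question concrete, here is a rough sketch of the "batches of say 1000 docs using N threads" option, written against the plain JDK only. The names `BatchPoster`, `partition`, `postAll`, and the `sender` callback are made up for illustration; `sender` is a stand-in for whatever client call would actually post a batch to Solr, and it is assumed to be safe to call from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical helper, not part of Solr or SolrJ.
final class BatchPoster {

    /** Splits docs into consecutive batches of at most batchSize items. */
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(i, end)));
        }
        return batches;
    }

    /**
     * Hands every batch to `sender` on a fixed pool of `threads` workers and
     * returns the number of batches sent. `sender` is a placeholder for the
     * real call that posts one batch to the server.
     */
    static <T> int postAll(List<T> docs, int batchSize, int threads,
                           Consumer<List<T>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(docs, batchSize)) {
                pending.add(pool.submit(() -> sender.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the batch and surface any worker failure
            }
            return pending.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in data: 2,500 fake documents instead of real XML files.
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc" + i);
        }
        int threads = Runtime.getRuntime().availableProcessors(); // N = CPU cores
        AtomicInteger sent = new AtomicInteger();
        // The lambda only counts documents; a real sender would post the batch.
        int batches = postAll(docs, 1000, threads, batch -> sent.addAndGet(batch.size()));
        System.out.println(batches + " batches, " + sent.get() + " docs");
    }
}
```

As I understand it, StreamingUpdateSolrServer would do the queueing and threading internally, so a loop like this would only be needed with the other SolrServer implementation.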

Also, do you have a sense for how long it should take to index 100,000 files or in my case 100,000,000 documents?

StreamingUpdateSolrServer
public StreamingUpdateSolrServer(String solrServerUrl, int queueSize, int threadCount) throws MalformedURLException

Thanks again,
Charlie

-- 
Best Regards,
Charles Wardell
Blue Chips Technology, Inc.
www.bcsolution.com

On Tuesday, April 26, 2011 at 5:12 PM, Otis Gospodnetic wrote:
> Charlie,
>
> How's this:
> * -Xmx2g
> * ramBufferSizeMB 512
> * mergeFactor 10 (default, but you could up it to 20, 30, if ulimit -n allows)
> * ignore/delete maxBufferedDocs - not used if you ran ramBufferSizeMB
> * use SolrStreamingUpdateServer (with params matching your number of CPU cores)
>   or send batches of say 1000 docs with the other SolrServer impl using N threads
>   (N=# of your CPU cores)
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> ----- Original Message ----
> > From: Charles Wardell <charles.ward...@bcsolution.com>
> > To: solr-user@lucene.apache.org
> > Sent: Tue, April 26, 2011 2:32:29 PM
> > Subject: Question on Batch process
> >
> > I am sure that this question has been asked a few times, but I can't seem to
> > find the sweetspot for indexing.
> >
> > I have about 100,000 files each containing 1,000 xml documents ready to be
> > posted to Solr. My desire is to have it index as quickly as possible and then
> > once completed the daily stream of ADDs will be small in comparison.
> >
> > The individual documents are small. Essentially web postings from the net.
> > Title, postPostContent, date.
> >
> >
> > What would be the ideal configuration? For RamBufferSize, mergeFactor,
> > MaxbufferedDocs, etc..
> >
> > My machine is a quad core hyper-threaded. So it shows up as 8 cpu's in TOP
> > I have 16GB of available ram.
> >
> >
> > Thanks in advance.
> > Charlie