Re: MergeReduceIndexerTool takes a lot of time for a limited number of documents

2014-09-22 Thread rulinma
Set -D 'mapred.child.java.opts=-Xmx500m' according to your needs. I think it will work well. -- View this message in context: http://lucene.472066.n3.nabble.com/MergeReduceIndexerTool-takes-a-lot-of-time-for-a-limited-number-of-documents-tp4138163p4160362.html Sent from the Solr - User mailing
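For context, this override goes on the command line that launches the tool; a minimal sketch, assuming the CDH parcel layout shown later in this thread (the -Xmx value itself should be sized to your documents):

    hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar \
      org.apache.solr.hadoop.MapReduceIndexerTool \
      -D 'mapred.child.java.opts=-Xmx500m' \
      ...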

Re: MergeReduceIndexerTool takes a lot of time for a limited number of documents

2014-06-18 Thread Wolfgang Hoschek
Consider giving the MR tasks more RAM, for example via hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool -D 'mapred.child.java.opts=-Xmx2000m' ... Wolfgang. On May 26, 2014, at 10:48 AM, Costi Muraru costimur...@gmail.com
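A fuller invocation sketch, assuming typical MapReduceIndexerTool options; the morphline file, ZooKeeper address, collection name, and HDFS paths below are placeholders rather than values from this thread:

    hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar \
      org.apache.solr.hadoop.MapReduceIndexerTool \
      -D 'mapred.child.java.opts=-Xmx2000m' \
      --morphline-file /path/to/morphline.conf \
      --output-dir hdfs://namenode:8020/tmp/mrit-output \
      --zk-host zk1:2181/solr \
      --collection collection1 \
      --go-live \
      hdfs://namenode:8020/tmp/mrit-input

With --go-live the shards built by the job are merged into the running SolrCloud collection at the end of the run.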

MergeReduceIndexerTool takes a lot of time for a limited number of documents

2014-05-26 Thread Costi Muraru
Hey guys, I'm using the MergeReduceIndexerTool to import data into a SolrCloud cluster made up of 3 decent machines. Looking in the JobTracker, I can see that the mapper jobs finish quite fast. The reduce jobs get to ~80% quite fast as well. It is here that they get stuck for a long period of

Re: MergeReduceIndexerTool takes a lot of time for a limited number of documents

2014-05-26 Thread Erick Erickson
The MapReduceIndexerTool is really intended for very large data sets, and by today's standards 80K doesn't qualify :). Basically, MRIT creates N sub-indexes, then merges them, which it may do in a tiered fashion. That is, it may merge gen1 to gen2, then merge gen2 to gen3, etc. Which is great when
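To make the tiered merge concrete: if a run produces 64 reducer sub-indexes that must end up as 4 shards, with a fan-in of 4 indexes per merge step, that takes two merge generations (64 -> 16 -> 4). A hedged sketch of steering this, assuming the tool version in use supports the --reducers and --fanout options (values are illustrative only):

    hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar \
      org.apache.solr.hadoop.MapReduceIndexerTool \
      -D 'mapred.child.java.opts=-Xmx2000m' \
      --reducers 8 \
      --fanout 8 \
      ...

Fewer reducers mean fewer sub-indexes and fewer merge generations, at the cost of less parallelism while the sub-indexes are being built.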

Re: MergeReduceIndexerTool takes a lot of time for a limited number of documents

2014-05-26 Thread Costi Muraru
Hey Erick, The job reducers began to die with Error: Java heap space, after 1h and 22 minutes of being stuck at ~80%. I did a few more tests: Test 1. 80,000 documents. Each document had *20* fields. The field names were *the same* for all the documents. Values were different. Job status:
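Since it is the reducers (not the mappers) that hit Java heap space, one option is to raise heap for reduce tasks only. The exact property name depends on the Hadoop/MapReduce version in use, so treat both spellings below as assumptions to verify against your cluster:

    # MRv1-style (JobTracker) property
    -D 'mapred.reduce.child.java.opts=-Xmx2g'
    # YARN / MRv2 equivalent
    -D 'mapreduce.reduce.java.opts=-Xmx2g'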