Hard to say. Here's the basic approach I'd use to try to narrow it down:

1> Take out the ngrams. What does that do to your speed?
2> Are you committing very often? Lengthen the commit interval if so.
3> Posting is probably not the most performant thing in the world. Consider using SolrJ.
4> What does a document look like? Are they structured docs (Word, PDF, etc.)? If so, try offloading the parsing to client machines.
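To make points 2> and 3> a bit more concrete, here is a rough sketch of the idea: send documents in batches and commit once at the end rather than after every request. This is not SolrJ. It is plain Python using only the standard library, and it assumes the stock XML update handler at the URL from the original post. The helper names (batches, build_add_xml, post_update, index_all), the batch size of 1000, and the field names in the usage line are all made up for illustration.

```python
import urllib.request
from itertools import islice
from xml.sax.saxutils import escape, quoteattr

# URL taken from the original post; adjust for your own install.
SOLR_UPDATE_URL = "http://localhost:8988/solr/update"


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_add_xml(docs):
    """Render a list of {field: value} dicts as one Solr <add> message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr adds the surrounding quotes; escape handles &, <, >.
            parts.append("<field name=%s>%s</field>"
                         % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_update(xml, url=SOLR_UPDATE_URL):
    """POST one XML update message and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_all(docs, batch_size=1000, send=post_update):
    """Send docs in batches, then issue a single commit at the end."""
    sent = 0
    for chunk in batches(docs, batch_size):
        send(build_add_xml(chunk))
        sent += len(chunk)
    send("<commit/>")  # one commit for the whole run, not one per batch
    return sent
```

Something like index_all(my_docs), where my_docs yields dicts such as {"id": 1, "text": "..."}, would drive it. The send parameter is only there so the batching logic can be exercised without a running Solr. If this is much faster than post.sh on the same data, the commit frequency or the posting overhead is a likely suspect; if not, look at the ngram analysis first.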
Basically, you haven't given enough information to make much of a guess here... 50 hours is a really long time for 2M docs though, so something doesn't seem right unless the docs are really unusual.

If you need to offload the structured docs, here's a way to get started:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

Best
Erick

On Sun, Apr 22, 2012 at 9:58 PM, neosky <neosk...@yahoo.com> wrote:
> It takes me 50 hours to index a total 9 G file(about 2,000,000 documents)
> with n-gram filter from min=6,max=10, my token before ngram filter is
> long(not a word, at most 300,000 bytes with white space). I split into 4
> files and use the post.sh to update at the same time. I also tried to write
> a lucene to do the index myself(single thread). The time is almost the same.
> I would like to know what's the general bottleneck for the index in solr?
> Doesn't the solr handle the index update request concurrently?
>
> 1.
> Posting file /ngram_678910/file1.xml to http://localhost:8988/solr/update
>   % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
>                                  Dload  Upload   Total    Spent    Left  Speed
>  51 3005M    0     0   51 1557M      0  18902 46:19:14 23:59:46 22:19:28      0
> 2.
> Posting file /ngram_678910/file2.xml to http://localhost:8988/solr/update
>   % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
>                                  Dload  Upload   Total    Spent    Left  Speed
>  62 2623M    0     0   62 1632M      0  19839 38:31:16 23:58:01 14:33:15  76629
> 3.
> Posting file /ngram_678910/file3.xml to http://localhost:8988/solr/update
>   % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
>                                  Dload  Upload   Total    Spent    Left  Speed
>  65 2667M    0     0   65 1737M      0  21113 36:48:23 23:58:06 12:50:17  25537
> 4.
> Posting file /ngram_678910/file4.xml to http://localhost:8988/solr/update
>   % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
>                                  Dload  Upload   Total    Spent    Left  Speed
>  58 2766M    0     0   58 1625M      0  19752 40:47:34 23:58:28 16:49:06  81435
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/The-index-speed-in-the-solr-tp3931338p3931338.html
> Sent from the Solr - User mailing list archive at Nabble.com.