We are also facing the same problem while loading 14 billion documents into
Solr 4.8.10.
The DataImportHandler runs single-threaded, and the import takes more than 3
weeks. It works without any issues, but the load takes months to complete.
When we tried a multithreaded load with SolrJ using the configuration below,
Solr kept consuming more memory, and at some point it ran out of memory.
Batch doc count: 100,000 docs
Number of threads: 16 or 32
Memory allocated to Solr: 200 GB
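For a sense of scale, this short Java sketch (the class and method names are illustrative, not part of SolrJ) multiplies batch size by thread count. That gives the upper bound on documents held in client-side batches when every thread has a full batch built at the same moment: 1,600,000 with 16 threads and 3,200,000 with 32. It counts documents only; the memory they need depends on document size, which is not given here.

```java
public class InFlightDocs {

    // Upper bound on documents held in client-side batches at one moment,
    // assuming every thread has built a full batch and not yet sent it.
    static long maxInFlight(long batchSize, int threads) {
        return batchSize * threads;
    }

    public static void main(String[] args) {
        long batchSize = 100_000L; // "Batch doc count" from the configuration above
        for (int threads : new int[] {16, 32}) {
            System.out.println(threads + " threads -> up to "
                    + maxInFlight(batchSize, threads) + " docs in flight");
        }
    }
}
```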
We believe the reason is the following: Solr takes a snapshot whenever we
open a SearchIndexer. Because of this, more memory is consumed and Solr
becomes extremely slow when 16 or more threads are loading.
If anyone has already done a fast multithreaded data load into Solr, could you
please share the code or the logic you used with the SolrJ API?
Thanks in advance.
Regards,
Suresh.A
-----Original Message-----
From: Dyer, James [mailto:[email protected]]
Sent: Tuesday, February 03, 2015 1:58 PM
To: [email protected]
Subject: RE: Solr 4.9 Calling DIH concurrently
DIH is single-threaded. There was once a threaded option, but it was buggy and
subsequently was removed.
What I do is partition my data and run multiple DIH request handlers at the
same time. It means redundant sections in solrconfig.xml, and it's not very
elegant, but it works.
For instance, for a SQL query, I add something like this: "where mod(id,
${dataimporter.request.numPartitions})=${dataimporter.request.currentPartition}".
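A minimal client-side sketch of this partitioning, in plain Java (JDK only). It assumes, hypothetically, that solrconfig.xml declares N DIH handlers named /dataimport0 through /dataimport{N-1}, each using a query with the mod(id, ...) condition above, so numPartitions and currentPartition arrive as request parameters. The core URL http://localhost:8983/solr/db is taken from the original question. Without arguments the program only prints the URLs (a dry run); given a core URL and an optional partition count, it sends all requests concurrently and prints the HTTP status codes.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedDih {

    // One full-import URL per partition. The handler paths /dataimport0,
    // /dataimport1, ... are hypothetical; each must be its own DIH request
    // handler in solrconfig.xml. numPartitions and currentPartition are read as
    // ${dataimporter.request.numPartitions} / ${dataimporter.request.currentPartition}.
    static List<String> partitionUrls(String coreUrl, int numPartitions) {
        List<String> urls = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            urls.add(coreUrl + "/dataimport" + p
                    + "?command=full-import&clean=false"
                    + "&numPartitions=" + numPartitions
                    + "&currentPartition=" + p);
        }
        return urls;
    }

    // Sends all requests at the same time and returns their HTTP status codes in order.
    static List<Integer> startAll(List<String> urls) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(urls.size());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<Integer> task = () -> requestStatus(url);
                futures.add(pool.submit(task));
            }
            List<Integer> codes = new ArrayList<>();
            for (Future<Integer> f : futures) {
                codes.add(f.get());
            }
            return codes;
        } finally {
            pool.shutdown();
        }
    }

    // Issues a GET and returns only the HTTP status code.
    private static int requestStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // Dry run by default; pass a core URL (and optionally a partition count) to send.
        String coreUrl = args.length > 0 ? args[0] : "http://localhost:8983/solr/db";
        int partitions = args.length > 1 ? Integer.parseInt(args[1]) : 4;
        List<String> urls = partitionUrls(coreUrl, partitions);
        urls.forEach(System.out::println);
        if (args.length > 0) {
            System.out.println(startAll(urls));
        }
    }
}
```

The status codes only confirm that each handler accepted its request; progress on each handler can be checked separately with command=status.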
I think, though, that most users who want to get the most out of
multithreading write their own program and use the SolrJ API to send the
updates.
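A sketch of the "own program" route, in plain Java so it runs without Solr. BatchSink is a hypothetical interface standing in for the SolrJ call: in a real loader it would turn each batch into SolrInputDocument objects and pass them to add(...) on one shared server instance (SolrJ 4.x also provides ConcurrentUpdateSolrServer for this kind of load). Each thread takes the ids with id % threads == p, mirroring the mod() split above, and sends fixed-size batches. The sketch never commits; it assumes commits are handled by the server's configuration, which the thread does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class PartitionedLoader {

    // Hypothetical hook: a SolrJ loader would convert the ids (or the rows they
    // identify) to SolrInputDocuments and call add(...) on a shared server object.
    // Implementations must be safe to call from several threads at once.
    interface BatchSink {
        void send(List<Long> ids) throws Exception;
    }

    // Loads ids 0..totalIds-1 with `threads` workers. Worker p handles the ids with
    // id % threads == p. Returns the number of ids handed to the sink. No commits.
    static long load(long totalIds, int threads, int batchSize, BatchSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong sent = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int p = 0; p < threads; p++) {
                final int partition = p;
                futures.add(pool.submit(() -> {
                    List<Long> batch = new ArrayList<>(batchSize);
                    for (long id = partition; id < totalIds; id += threads) {
                        batch.add(id);
                        if (batch.size() == batchSize) {
                            sink.send(batch);
                            sent.addAndGet(batch.size());
                            batch = new ArrayList<>(batchSize); // fresh list: the sink may keep the old one
                        }
                    }
                    if (!batch.isEmpty()) { // remainder smaller than batchSize
                        sink.send(batch);
                        sent.addAndGet(batch.size());
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // surfaces any exception thrown by a worker
            }
        } catch (ExecutionException e) {
            throw new IllegalStateException("a loader thread failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        // Stand-in sink that only counts ids; replace it with a SolrJ call.
        AtomicLong received = new AtomicLong();
        long total = load(1_000_003L, 16, 1_000, ids -> received.addAndGet(ids.size()));
        System.out.println(total + " ids sent, " + received.get() + " received");
    }
}
```

Batch size and thread count together bound how many documents the client holds at once (threads × batchSize), so they are the first two knobs to try when a multithreaded load runs short of memory.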
James Dyer
Ingram Content Group
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, February 03, 2015 3:43 PM
To: [email protected]
Subject: Solr 4.9 Calling DIH concurrently
Hi
I am using Solr 4.9 and need to index millions of documents from a database.
I am using DIH and sending requests that fetch by ids. Is there a way to run
multiple indexing threads concurrently in DIH?
I want to take advantage of the <maxIndexingThreads> parameter. How do I do
it? I am just invoking the DIH handler using SolrJ's HttpSolrServer and
issuing requests sequentially:
http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=100&minId=1
http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=201&minId=101
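Rather than writing each minId/maxId request by hand, the ranges can be generated. This small Java sketch (names are illustrative) splits an id interval into consecutive, non-overlapping inclusive ranges and formats each one as a full-import URL in the same form as the requests above; minId and maxId are the poster's own request parameters. Because DIH is single-threaded (see the reply above), running ranges at the same time would mean sending them to different handlers.

```java
import java.util.ArrayList;
import java.util.List;

public class IdRanges {

    // Splits the inclusive id interval [firstId, lastId] into consecutive,
    // non-overlapping inclusive ranges holding at most chunkSize ids each.
    static List<long[]> ranges(long firstId, long lastId, long chunkSize) {
        List<long[]> out = new ArrayList<>();
        for (long min = firstId; min <= lastId; min += chunkSize) {
            long max = Math.min(min + chunkSize - 1, lastId);
            out.add(new long[] {min, max});
        }
        return out;
    }

    // Builds a full-import URL in the same form as the requests above.
    static String importUrl(String handlerUrl, long[] range) {
        return handlerUrl + "?command=full-import&clean=false"
                + "&maxId=" + range[1] + "&minId=" + range[0];
    }

    public static void main(String[] args) {
        for (long[] r : ranges(1, 450, 100)) {
            System.out.println(importUrl("http://localhost:8983/solr/db/dataimport", r));
        }
    }
}
```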
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-4-9-Calling-DIH-concurrently-tp4183744.html
Sent from the Solr - User mailing list archive at Nabble.com.