On 11/29/2020 10:32 AM, Erick Erickson wrote:
And I absolutely agree with Walter that the DB is often where
the bottleneck lies. You might be able to
use multiple threads and/or processes to query the
DB if that’s the case and you can find some kind of partition
key.
IME the difficult part has
If you like Java instead of Python, here’s a skeletal program:
https://lucidworks.com/post/indexing-with-solrj/
It’s simple and single-threaded, but could serve as a basis for
something along the lines that Walter suggests.
And I absolutely agree with Walter that the DB is often where
the bottle
I recommend building an outboard loader, like I did a dozen years ago for
Solr 1.3 (before DIH) and did again recently. I’m glad to send you my Python
program, though it reads from a JSONL file, not a database.
Run a loop fetching records from a database. Put each record into a synchronized
(threa