On 12/7/23 07:56, Vince McMahon wrote:
{
"responseHeader": {
"status": 0,
"QTime": 0
},
"initArgs": [
"defaults",
[
"config",
"db-data-config.xml"
]
],
"command": "status",
"status": "idle",
"importResponse": "",
"statusMessages": {
"Total Requests made to DataSource": "1",
"Total Rows Fetched": "915000",
"Total Documents Processed": "915000",
"Total Documents Skipped": "0",
"Full Dump Started": "2023-12-07 02:54:29",
"": "Indexing completed. Added/Updated: 915000 documents. Deleted 0 documents.",
"Committed": "2023-12-07 02:54:51",
"Time taken": "0:0:21.831"
}
}
There's no way Solr can index 915000 docs in 21 seconds without a LOT of
threads in the indexing program, and DIH is single-threaded. As you've
already noted, it didn't actually index most of the documents. I don't
have an answer as to why it didn't work.
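
As a rough sanity check on the figures in the status response above, the implied rate can be worked out directly. This is only a sketch: the function name is made up for illustration, and it assumes "Time taken" is formatted as hours:minutes:seconds.

```python
def docs_per_second(doc_count, time_taken):
    """Return the indexing rate implied by a DIH status response.

    `time_taken` is assumed to be in the "H:M:S.mmm" form shown in the
    "Time taken" field of the status output.
    """
    hours, minutes, seconds = time_taken.split(":")
    elapsed = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return doc_count / elapsed


# Figures copied from the status response quoted above.
rate = docs_per_second(915000, "0:0:21.831")
print(f"{rate:,.0f} docs/sec")
```

That works out to roughly 42,000 documents per second, which is the rate being called implausible for a single-threaded importer.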
DIH lacks decent logging, error handling, and multi-threading. It is
not the most reliable way to index. This is why it was deprecated in
Solr 8.6 and then removed in 9.0. You would be far better off writing
your own indexing program rather than using DIH.
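
To give a sense of what a small custom indexer might look like, here is a minimal single-threaded sketch using only the Python standard library. It is not the program described in this message, and several things are assumptions for illustration: the function names, the use of any DB-API connection (sqlite3 in the example), and the idea that documents are sent as a JSON array to the collection's /update handler. The point is that every batch either succeeds or raises, so failures are not silent.

```python
import json
import sqlite3
import urllib.request


def post_batch(solr_update_url, docs):
    """POST one batch of documents to Solr's JSON update handler.

    urlopen raises on HTTP error responses, so a rejected batch
    stops the run instead of being silently dropped.
    """
    req = urllib.request.Request(
        solr_update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        if resp.status != 200:
            raise RuntimeError(f"Solr returned HTTP {resp.status}")


def index_rows(conn, query, send, batch_size=500):
    """Stream rows from a DB-API connection and pass them to `send` in batches.

    Each row becomes a dict keyed by column name. Returns the number of
    documents handed to `send`.
    """
    cur = conn.cursor()
    cur.execute(query)
    columns = [col[0] for col in cur.description]
    total = 0
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        docs = [dict(zip(columns, row)) for row in rows]
        send(docs)
        total += len(docs)
    return total
```

A run against a real server could pass something like `lambda docs: post_batch("http://localhost:8983/solr/mycollection/update", docs)` as `send` (the host and collection name are placeholders), followed by one final request with `commit=true`. Comparing the returned count against the row count in the database gives a direct check that nothing was skipped.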
I have an idea for a multi-threaded database->solr indexing program, but
haven't had much time to spend on it. If I can ever get it done, it
will be freely available.
On the entity, "rows" is not a valid attribute. To control how many DB
rows are fetched at a time, set batchSize on the dataSource element.
The default batchSize is 500.
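
To illustrate where the two attributes go, the sketch below embeds a made-up db-data-config.xml in a Python string and checks it. The driver, JDBC URL, entity name, and query are placeholders rather than values from the original config, and the check_config helper is hypothetical. It only flags "rows" on an entity and reports the batchSize that each dataSource would use.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: batchSize sits on <dataSource>, and the stray
# rows="1000" on <entity> is the kind of attribute that is not valid.
SAMPLE_CONFIG = """
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/exampledb"
              batchSize="500"/>
  <document>
    <entity name="item" query="SELECT id, name FROM item" rows="1000"/>
  </document>
</dataConfig>
"""


def check_config(xml_text):
    """Return (problems, batch_sizes) for a DIH config string."""
    root = ET.fromstring(xml_text)
    problems = []
    for entity in root.iter("entity"):
        if "rows" in entity.attrib:
            name = entity.get("name")
            problems.append(f'entity "{name}": "rows" is not a valid attribute')
    # Fall back to 500 when batchSize is not set, per the default noted above.
    batch_sizes = [ds.get("batchSize", "500") for ds in root.iter("dataSource")]
    return problems, batch_sizes


problems, batch_sizes = check_config(SAMPLE_CONFIG)
for problem in problems:
    print(problem)
print("batchSize per dataSource:", batch_sizes)
```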
Thanks,
Shawn