Re: SOLR indexing takes longer time

2020-08-18 Thread Walter Underwood
Instead of writing code, I’d fire up SQL Workbench/J (https://www.sql-workbench.eu), load the same JDBC driver that is being used in Solr, and run the query. If that takes 3.5 hours, you have isolated the problem. wunder Walter Underwood wun...@wunderwood.org http:/

Re: SOLR indexing takes longer time

2020-08-18 Thread David Hastings
Another thing to mention is to make sure the indexer you build doesn't send commits until it's actually done. I made that mistake with some early in-house indexers. On Tue, Aug 18, 2020 at 9:38 AM Charlie Hull wrote: > 1. You could write some code to pull the items out of Mongo and dump > them to d
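
As a rough illustration of the "no commits until done" advice, here is a minimal Python sketch. The function name index_then_commit and the injected post callable are hypothetical and not from the thread; the sketch assumes the indexer sends JSON to Solr's /update handler and that {"commit": {}} is posted once at the end. Any autoCommit settings in solrconfig.xml would still apply independently of this.

```python
import json


def index_then_commit(batches, post, update_url):
    """Send every batch without a commit, then issue exactly one commit.

    `post(url, body)` is a hypothetical callable supplied by the caller
    (for example a thin wrapper around an HTTP client); it is an assumption
    of this sketch, not part of Solr.
    """
    sent = 0
    for batch in batches:
        # No commit parameter here: documents are only added.
        post(update_url, json.dumps(batch))
        sent += len(batch)
    # One explicit commit after the last batch has been sent.
    post(update_url, json.dumps({"commit": {}}))
    return sent
```

Passing the HTTP call in as a parameter keeps the commit policy in one place, so it is easy to check that only the final request asks Solr to commit.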

Re: SOLR indexing takes longer time

2020-08-18 Thread Charlie Hull
1. You could write some code to pull the items out of Mongo and dump them to disk - if this is still slow, then it's Mongo that's the problem. 2. Write a standalone indexer to replace DIH; it's single-threaded and deprecated anyway. 3. Minor point - consider whether you need to index everything e
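
Point 1 could be tried with something like the Python sketch below. It is only an illustration: dump_to_disk is a hypothetical helper, and it accepts any iterable of dict-like documents (for example a cursor from a MongoDB client library, which is an assumption and is not imported here). It writes one JSON document per line and reports how long the extraction took.

```python
import json
import time


def dump_to_disk(docs, path):
    """Write each document as one JSON line; return (count, elapsed_seconds).

    `docs` is any iterable of dict-like documents. Values that json cannot
    serialise directly (dates, database-specific id types) are stringified.
    """
    start = time.perf_counter()
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for doc in docs:
            out.write(json.dumps(doc, default=str) + "\n")
            count += 1
    return count, time.perf_counter() - start
```

If the elapsed time for the full data set is already close to the 3.5 hours seen with DIH, the extraction side is the bottleneck rather than Solr.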

Re: SOLR indexing takes longer time

2020-08-17 Thread Aroop Ganguly
Adding on to what others have said, indexing speed in general is largely affected by the parallelism and isolation you can give to each node. Is there a reason why you cannot have more than 1 shard? If you have a 5-node cluster, why not have 5 shards? maxShardsPerNode=1 with replica=1 is OK. You should
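
For illustration only, the five-shard layout could be requested through the Collections API roughly as in the Python sketch below. It assumes Solr is running in SolrCloud mode (the original poster describes a single core, so this may not apply as-is), reads "replica=1" as replicationFactor=1, and uses a hypothetical host and collection name. The helper create_collection_url only builds the URL; it does not call Solr.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards,
                          replication_factor=1, max_shards_per_node=1):
    """Build a Collections API CREATE URL (SolrCloud mode assumed)."""
    params = {
        "action": "CREATE",
        "name": name,  # hypothetical collection name chosen by the caller
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"
```

For a 5-node cluster, create_collection_url("http://localhost:8983/solr", "docs", 5) would place one shard with one replica on each node, assuming the default Solr port and a collection called "docs".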

Re: SOLR indexing takes longer time

2020-08-17 Thread Shawn Heisey
On 8/17/2020 12:22 PM, Abhijit Pawar wrote: We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / replicas and just single core. It takes almost 3.5 hours to index that data. I am using a data import handler to import data from the mongo database. Is there something we can do t

Re: SOLR indexing takes longer time

2020-08-17 Thread Walter Underwood
I’m seeing multiple red flags for performance here. The top ones are “DIH”, “MongoDB”, and “SQL on MongoDB”. MongoDB is not a relational database. Our multi-threaded extractor using the Mongo API was still three times slower than the same approach on MySQL. Check the CPU usage on the Solr hosts w

Re: SOLR indexing takes longer time

2020-08-17 Thread Abhijit Pawar
Sure, Divye. *Here's the config.* *conf/solr-config.xml:* /home/ec2-user/solr/solr-5.4.1/server/solr/test_core/conf/dataimport/data-source-config.xml *schema.xml:* has all of the field definitions *conf/dataimport/data-source-config.xml* . . . 4-5 more nested entities..

Re: SOLR indexing takes longer time

2020-08-17 Thread Jörn Franke
The DIH is single-threaded and deprecated. Your best bet is to have a script/program that extracts the data from MongoDB and writes it to Solr in batches using multiple threads. You will see significantly higher performance for your data. > On 17.08.2020 at 20:23, Abhijit Pawar wrote: > > Hello,
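
A minimal Python sketch of such a batched, multi-threaded indexer follows. It is an illustration under stated assumptions, not a tuned implementation: the function names, the batch size of 500, and the 4 worker threads are arbitrary choices, and the document source is any iterable (a MongoDB cursor would be one option). post_batch assumes Solr's JSON /update handler is reachable at the URL the caller passes in.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
import json
import urllib.request


def batched(docs, size):
    """Yield lists of up to `size` documents from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def post_batch(update_url, batch):
    """POST one batch as a JSON array to the update handler (no commit)."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_all(docs, send, batch_size=500, workers=4):
    """Run `send(batch)` for every batch on a thread pool; return doc count.

    All batches are queued up front, which is acceptable for a few hundred
    thousand small documents but would need throttling for larger sets.
    """
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for batch in batched(docs, batch_size):
            futures.append(pool.submit(send, batch))
            total += len(batch)
        for future in futures:
            future.result()  # re-raise any error from a worker thread
    return total
```

A call might look like index_all(cursor, lambda b: post_batch("http://localhost:8983/solr/test_core/update", b)), where the host and port are assumptions and cursor stands for whatever iterable yields the documents. A commit would still have to be sent once after index_all returns.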

Re: SOLR indexing takes longer time

2020-08-17 Thread Divye Handa
Can you share the DIH configuration you are using for this? On Mon, 17 Aug, 2020, 23:52 Abhijit Pawar, wrote: > Hello, > > We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / > replicas and just single core. > It takes almost 3.5 hours to index that data. > I am using a data

SOLR indexing takes longer time

2020-08-17 Thread Abhijit Pawar
Hello, We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / replicas and just single core. It takes almost 3.5 hours to index that data. I am using a data import handler to import data from the mongo database. Is there something we can do to reduce the time taken to index? Will