Re: Need Support - Apache Solr - 20180915

Shawn Heisey Sat, 15 Sep 2018 10:23:55 -0700

On 9/14/2018 9:51 PM, senthil wrote:

We are beginners to Apache Solr and its implementations. We need the following basic clarifications regarding Apache Solr usage and implementing with MS-SQL server database.

I don't know what you think of Erick's answers, but he's right on the money with everything he said. Here's my contribution. Just more detail.

1. Our MS-SQL server database having the data table which contains 20 columns with billions of data.

MS SQL probably means your environment is Windows Server. If you can, run Solr on something other than Windows. Solr can run just fine on a Server edition of Windows, but it will run better on something else. Open source operating systems will serve you very well.

2. How to implement Apache Solr in the particular above table to increase search capability?

Setting Solr up to import from a database is not terribly difficult. Where you will probably spend the most time is perfecting your field analysis in your schema. Getting that right can take a lot of experimentation, and you have to rebuild the index every time you change something. You probably don't want to import your whole database table every time while you work on this step.

There are certain gotchas when using the DataImport Handler with SolrCloud. You'll be happier with Solr if you can build your own program to transfer data from your database into Solr. With a multi-threaded indexing application, you can achieve import speeds far greater than DIH can.
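
To give a rough idea of what such a program might look like, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the update URL, the collection name "mycollection", the batch size, and the thread count are all made up, and the helper names (chunked, post_batch, index_rows) are hypothetical. Reading rows from the database is left out; rows can be any iterable of dictionaries, one per document. The sketch posts batches to Solr's JSON update handler from several threads and keeps only a bounded number of batches in flight.

```python
import json
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed URL for illustration; "mycollection" is a hypothetical collection.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def chunked(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batch(docs, url=SOLR_UPDATE_URL):
    """POST one batch of documents as a JSON array to the update handler."""
    request = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_rows(rows, send=post_batch, batch_size=1000, workers=4):
    """Send rows in parallel batches and return the number of documents sent."""
    # The semaphore caps how many batches are queued or running at once, so a
    # very large table is never read into memory all at the same time.
    slots = threading.BoundedSemaphore(workers * 2)
    futures = []

    def run(batch):
        try:
            send(batch)
            return len(batch)
        finally:
            slots.release()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in chunked(rows, batch_size):
            slots.acquire()
            futures.append(pool.submit(run, batch))
    # result() re-raises any exception that a batch hit while sending.
    return sum(future.result() for future in futures)
```

The sketch does not issue a commit, so the documents only become searchable after an explicit commit or whatever autoCommit/autoSoftCommit settings you have configured. Retry logic and per-batch error reporting are also omitted.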

3. Is there any way to call the data which is distributed across 2 shards/node of Apache Solr at a time?

As Erick said, this is where SolrCloud shines. You can do sharded indexes without SolrCloud, but it is much more difficult to manage.
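
To illustrate the difference, here is a small Python sketch that only builds the query URLs. The host names, the collection name "mycollection", and the core name "core1" are assumptions for the example, and select_url is a hypothetical helper. With SolrCloud, a request sent to the collection on any node is distributed across its shards for you. Without SolrCloud, the request itself has to list every shard in the shards parameter, which is part of what makes that setup harder to manage.

```python
from urllib.parse import urlencode


def select_url(base_url, query, shards=None, rows=10):
    """Build a /select URL; `shards` is only needed without SolrCloud."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if shards:
        # Comma-separated list of host:port/solr/core entries to query.
        params["shards"] = ",".join(shards)
    return f"{base_url}/select?{urlencode(params)}"


# SolrCloud: query the collection and let Solr fan out to all shards.
cloud_url = select_url("http://host1:8983/solr/mycollection", "title:solr")

# Manual sharding: name each shard explicitly in the request.
manual_url = select_url(
    "http://host1:8983/solr/core1",
    "title:solr",
    shards=["host1:8983/solr/core1", "host2:8983/solr/core1"],
)
```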

4. Is there any performance difference between search the data in a single shard/node and multiple shard/node?

I'm not sure how to approach this question - mostly because I cannot tell exactly what you're asking. Are you asking about multiple shards per node, or multiple shards in general?

The short answer is yes in either case. And if all you want to know is whether a performance difference EXISTS, then the answer is yes. The long answer, like MANY questions about Solr, is "it depends." If, in addition to whether a performance difference exists, you want to know which way has better performance, the answer is still "it depends."

If your query rate is VERY low, splitting into multiple shards on the same node can actually perform BETTER than a single node on the same machine. As your query rate grows, you'll want those shards to be on separate machines, or query performance will suffer.


Thanks,
Shawn
