Best Indexing Approaches - To max the throughput

Alessandro Benedetti Mon, 05 Oct 2015 04:58:32 -0700

I have been doing some studies and analysis, and I was wondering which
approach, in your opinion, is the best one for indexing in Solr to reach
the highest possible throughput.
I know that a lot of factors affect indexing time, so let's focus only
on the feeding approach.
Let's isolate the different scenarios:


*Single Solr Infrastructure*

1) XML/JSON batch requests to the /update index handler (xml/json)

2) SolrJ ConcurrentUpdateSolrClient (javabin)
I expect this to be the fastest approach for a multi-threaded
indexing application,
posting a batch of docs per request where possible.
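
To make scenario 1 concrete, here is a minimal sketch of batched,
multi-threaded JSON feeding to the /update handler, using only the Python
standard library. The URL, the collection name "mycollection", the batch
size and the thread count are all assumptions for illustration, not
recommendations; the helper names (batches, build_request, post_batch,
index_all) are hypothetical. Commits are not sent here and are assumed to
be handled separately (for example by autoCommit).

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Assumption: a hypothetical local collection; adjust to your own setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_request(batch, url=SOLR_UPDATE_URL):
    """Build one POST request carrying a JSON array of documents."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_batch(batch, url=SOLR_UPDATE_URL):
    """Send one batch to Solr and return the HTTP status code."""
    with urllib.request.urlopen(build_request(batch, url)) as resp:
        return resp.status


def index_all(docs, batch_size=1000, threads=4, send=post_batch):
    """Send all batches from a pool of worker threads.

    Note: pool.map submits every batch up front, so the whole input
    is read into memory; results come back in batch order.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(send, batches(docs, batch_size)))
```

Calling index_all(my_docs) against a running Solr would return one HTTP
status per batch. The send parameter exists only so the batching logic can
be tried without a server. This does not measure anything by itself; it is
just a baseline shape to compare against the SolrJ clients.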

*Solr Cloud*

1) XML/JSON batch requests to the /update index handler (xml/json)

2) SolrJ ConcurrentUpdateSolrClient (javabin)

3) CloudSolrClient (javabin)
This seems to be the best approach, according to these improvements [1].

What are your opinions?

A bonus observation would be the use of a Map/Reduce big data indexer,
but let's assume we don't have a big cluster of CPUs, just an average
indexing server.


[1]
https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/


Cheers


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
