I have been doing some analysis and would like your opinion on which approach to feeding documents into Solr gives the best possible indexing throughput. I know that many factors affect indexing time, so let's focus only on the feeding approach. Let's isolate the different scenarios:
*Single Solr Infrastructure*

1) XML/JSON batch requests to the /update handler (xml/json)
2) SolrJ ConcurrentUpdateSolrClient (javabin)

I think option 2 is the fastest approach for a multi-threaded indexing application, posting a batch of docs per request where possible.

*Solr Cloud*

1) XML/JSON batch requests to the /update handler (xml/json)
2) SolrJ ConcurrentUpdateSolrClient (javabin)
3) CloudSolrClient (javabin): this seems to be the best approach, according to these improvements [1]

What are your opinions?

A Map/Reduce-based big data indexer would deserve a bonus mention, but let's assume we don't have a big cluster of CPUs, only an average indexing server.

[1] https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/

Cheers

--
Benedetti Alessandro
Visiting card: http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience - 1794 England
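
P.S. To make option 1 (JSON batch requests to /update from several threads) concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not a benchmarked setup: the host, port, and collection name in SOLR_UPDATE_URL are hypothetical, as are the helper names and the default batch size and thread count. Commits are not issued here; the sketch assumes they are left to the server's autoCommit settings or to a separate request.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical endpoint: host, port and collection name are assumptions.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def batches(docs, batch_size):
    """Yield lists of at most batch_size documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def build_request(batch, url=SOLR_UPDATE_URL):
    """Build a POST request carrying one batch as a JSON array of documents."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_batch(batch, url=SOLR_UPDATE_URL):
    """Send one batch to Solr and return the HTTP status code."""
    with urllib.request.urlopen(build_request(batch, url)) as resp:
        return resp.status


def index_all(docs, batch_size=500, threads=4, url=SOLR_UPDATE_URL):
    """Feed all docs in batches from a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # One HTTP request per batch; returns the status code of each request.
        return list(pool.map(lambda b: post_batch(b, url), batches(docs, batch_size)))
```

Calling index_all(my_docs) with an iterable of dicts would send them in batches of 500 over 4 threads. The batch size and thread count are the two knobs I would expect to matter most for this approach, but the right values would have to be measured against a real server.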