Hi Walter,
can you explain your use case in more detail?
You index a batch of e-commerce products (Solr documents); if one fails, you want to stop and invalidate the entire batch (using the almost never used Solr rollback, or manual deletion?), and then log the exception and the indexing size, so that you can re-index the whole batch of docs?
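To make sure I am describing the same flow, here is a minimal sketch of the fail / rollback / log sequence as I understand it. Everything in it is an assumption for illustration: the `Indexer` interface and the `InMemoryIndexer` class are hypothetical stand-ins, not the SolrJ API, and the "bad" id prefix simply simulates a document that Solr would reject. Against a real Solr core, keep in mind that rollback discards every uncommitted change on that core, not only the ones from this batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AllOrNothingBatch {

    /** Hypothetical stand-in for an indexing client; this is NOT the SolrJ API. */
    interface Indexer {
        void add(String docId) throws Exception; // buffer one document
        void commit();                           // make buffered documents visible
        void rollback();                         // discard buffered documents
    }

    /** In-memory Indexer that rejects any id starting with "bad" (simulated failure). */
    static class InMemoryIndexer implements Indexer {
        final List<String> pending = new ArrayList<>();
        final List<String> committed = new ArrayList<>();

        @Override
        public void add(String docId) throws Exception {
            if (docId.startsWith("bad")) {
                throw new Exception("cannot index " + docId);
            }
            pending.add(docId);
        }

        @Override
        public void commit() {
            committed.addAll(pending);
            pending.clear();
        }

        @Override
        public void rollback() {
            pending.clear();
        }
    }

    /** Indexes the whole batch or nothing; returns true only if the batch was committed. */
    static boolean indexBatch(Indexer indexer, List<String> batch, List<String> log) {
        try {
            for (String id : batch) {
                indexer.add(id);
            }
            indexer.commit();
            return true;
        } catch (Exception e) {
            // Fail fast: drop everything buffered so far, then record what happened.
            indexer.rollback();
            log.add("batch of " + batch.size() + " failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryIndexer indexer = new InMemoryIndexer();
        List<String> log = new ArrayList<>();
        System.out.println(indexBatch(indexer, Arrays.asList("p1", "p2"), log));
        System.out.println(indexBatch(indexer, Arrays.asList("p3", "bad4", "p5"), log));
        System.out.println(indexer.committed);
        System.out.println(log);
    }
}
```

The failed batch leaves no trace in the index, and the log entry carries the batch size and the exception message, so the caller can decide whether to re-submit the whole batch later.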
In this scenario, would the ConcurrentUpdateSolrClient not be ideal?
Just curious.

Cheers

On 6 October 2015 at 17:29, Walter Underwood <wun...@wunderwood.org> wrote:

> It depends on the document. In an e-commerce search, you might want to fail
> immediately and be notified. That is what we do: fail, rollback, and notify.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>
> On Oct 6, 2015, at 7:58 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
> >
> > Mmm, one broken document in a batch should not break the entire batch,
> > right (whatever approach is used)?
> > Are you referring to the fact that you want to programmatically re-index
> > the broken docs?
> >
> > It would be interesting to return the ids of the broken docs along with the
> > Solr update response!
> >
> > Cheers
> >
> >
> > On 6 October 2015 at 15:30, Bill Dueber <b...@dueber.com> wrote:
> >
> >> Just to add...my informal tests show that batching has waaaaay more effect
> >> than solrj vs json.
> >>
> >> I haven't looked at CUSC in a while; last time I looked it was impossible to
> >> do anything smart about error handling, so check that out before you get
> >> too deeply into it. We use a strategy of sending a batch of json documents,
> >> and if it returns an error, sending each record one at a time until we find
> >> the bad one and can log something useful.
> >>
> >>
> >> On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
> >> benedetti.ale...@gmail.com> wrote:
> >>
> >>> Thanks Erick,
> >>> you confirmed my impressions!
> >>> Thank you very much for the insights; another opinion is welcome :)
> >>>
> >>> Cheers
> >>>
> >>> 2015-10-05 14:55 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
> >>>
> >>>> SolrJ tends to be faster for several reasons, not the least of which
> >>>> is that it sends packets to Solr in a more efficient binary format.
> >>>>
> >>>> Batching is critical. I did some rough tests using SolrJ and sending
> >>>> docs one at a time gave a throughput of < 400 docs/second.
> >>>> Sending 10 gave 2,300 or so. Sending 100 at a time gave
> >>>> over 5,300 docs/second. Curiously, 1,000 at a time gave only
> >>>> marginal improvement over 100. This was with a single thread.
> >>>> YMMV of course.
> >>>>
> >>>> CloudSolrClient is definitely the better way to go with SolrCloud,
> >>>> it routes the docs to the correct leader instead of having the
> >>>> node you send the docs to do the routing.
> >>>>
> >>>> Best,
> >>>> Erick
> >>>>
> >>>> On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
> >>>> <abenede...@apache.org> wrote:
> >>>>> I was doing some studies and analysis, and I am wondering which approach,
> >>>>> in your opinion, is the best one to use to index in Solr and reach the
> >>>>> best throughput possible.
> >>>>> I know that a lot of factors affect indexing time, so let's focus only
> >>>>> on the feeding approach.
> >>>>> Let's isolate different scenarios:
> >>>>>
> >>>>> *Single Solr Infrastructure*
> >>>>>
> >>>>> 1) Xml/Json batch request to /update IndexHandler (xml/json)
> >>>>>
> >>>>> 2) SolrJ ConcurrentUpdateSolrClient (javabin)
> >>>>> I was thinking this would be the fastest approach for a multi-threaded
> >>>>> indexing application.
> >>>>> Posting a batch of docs per request, if possible.
> >>>>>
> >>>>> *Solr Cloud*
> >>>>>
> >>>>> 1) Xml/Json batch request to /update IndexHandler (xml/json)
> >>>>>
> >>>>> 2) SolrJ ConcurrentUpdateSolrClient (javabin)
> >>>>>
> >>>>> 3) CloudSolrClient (javabin)
> >>>>> It seems to be the best approach according to these improvements [1]
> >>>>>
> >>>>> What are your opinions?
> >>>>>
> >>>>> A bonus observation would be about using some Map/Reduce big data indexer,
> >>>>> but let's assume we don't have a big cluster of CPUs, just the average
> >>>>> indexer server.
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>> https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> >>>>>
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>>
> >>>>> --
> >>>>> --------------------------
> >>>>>
> >>>>> Benedetti Alessandro
> >>>>> Visiting card : http://about.me/alessandro_benedetti
> >>>>>
> >>>>> "Tyger, tyger burning bright
> >>>>> In the forests of the night,
> >>>>> What immortal hand or eye
> >>>>> Could frame thy fearful symmetry?"
> >>>>>
> >>>>> William Blake - Songs of Experience -1794 England
> >>>>
> >>>
> >>
> >>
> >> --
> >> Bill Dueber
> >> Library Systems Programmer
> >> University of Michigan Library
> >>
> >
>

--
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England