Also, if we are seeing a huge CPU spike on the leader when doing a bulk
index, would changing any of these options help?


On Sat, Feb 1, 2014 at 2:59 PM, Software Dev <static.void....@gmail.com> wrote:

> Our use case is that we have 3 indexing machines pulling off a Kafka queue,
> and they are all sending individual updates.
>
>
> On Fri, Jan 31, 2014 at 12:54 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> Just make sure parallelUpdates is set to true.
>>
>> If you want to load even faster, you can use the bulk add methods, or if
>> you need more fine-grained responses, use the single add from multiple
>> threads (though bulk add can also be done from multiple threads if you
>> really want to try and push the max).
>>
>> - Mark
>>
>> http://about.me/markrmiller
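Mark's suggestion to use the bulk add methods can be applied to the Kafka use case by buffering documents on each indexing machine and flushing them in batches. The sketch below is a plain-Java model under stated assumptions: the `Batcher` class is hypothetical, and its `Consumer<List<T>>` sink stands in for a bulk call such as SolrJ's `add(Collection<SolrInputDocument>)`; the batch size is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers individual documents (e.g. one per Kafka message)
// and hands them to a bulk sink once batchSize documents have accumulated.
// In a real indexer the sink would wrap a bulk add call; here it is a callback.
// Not thread-safe: use one Batcher per consumer thread.
final class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();

    Batcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Add one document; flush automatically when the buffer is full.
    void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered. Call this on shutdown (or from a timer)
    // so a partially filled batch is not left waiting indefinitely.
    void flush() {
        if (buffer.isEmpty()) return;
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        sink.accept(batch);
    }
}
```

Batching trades per-document responses for throughput; if you need a response per update, the alternative Mark mentions is to keep single adds and spread them over several threads.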
>>
>> On Jan 31, 2014, at 3:50 PM, Software Dev <static.void....@gmail.com> wrote:
>>
>> > Which, if any, of these settings would be beneficial when bulk uploading?
>> >
>> >
>> > On Fri, Jan 31, 2014 at 11:05 AM, Mark Miller <markrmil...@gmail.com> wrote:
>> >
>> >>
>> >>
>> >> On Jan 31, 2014, at 1:56 PM, Greg Walters <greg.walt...@answers.com> wrote:
>> >>
>> >>> I'm assuming you mean CloudSolrServer here. If I'm wrong, please ignore
>> >>> my response.
>> >>>
>> >>>> updatesToLeaders
>> >>>
>> >>> Only send documents to shard leaders while indexing. This saves
>> >>> cross-talk between slaves and leaders, which results in more efficient
>> >>> document routing.
>> >>
>> >> Right, but recently this has less of an effect because CloudSolrServer
>> >> can now hash documents and send them directly to the right place. This
>> >> option has become largely historical. Just make sure you set the correct
>> >> id field on the CloudSolrServer instance for this hashing to work (I
>> >> think it defaults to "id").
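To illustrate what hashing documents and sending them "directly to the right place" means, here is a simplified model: each shard owns a contiguous range of a 32-bit hash space, and the client hashes the document's id to pick the shard (and therefore its leader). This is an assumption-laden sketch, not Solr's implementation: the `ShardRouter` class is hypothetical, and it uses `String.hashCode()` as a stand-in, whereas Solr's default compositeId router uses MurmurHash3 and also understands `shard!id` style prefixes.

```java
// Hypothetical, simplified model of client-side document routing.
// The signed 32-bit hash space is split into numShards contiguous ranges,
// one per shard. A document goes to the shard whose range contains hash(id).
// NOTE: String.hashCode() is only a stand-in hash; Solr's compositeId router
// uses MurmurHash3, so real shard assignments will differ from this model.
final class ShardRouter {
    private final long[] rangeStarts; // inclusive lower bound of each shard's range

    ShardRouter(int numShards) {
        if (numShards <= 0) throw new IllegalArgumentException("numShards must be positive");
        rangeStarts = new long[numShards];
        long span = (1L << 32) / numShards; // range width; the last range absorbs the remainder
        for (int i = 0; i < numShards; i++) {
            rangeStarts[i] = Integer.MIN_VALUE + i * span;
        }
    }

    // Returns the index of the shard whose range contains hash(id).
    int shardFor(String id) {
        long h = id.hashCode(); // stand-in hash, widened to long for comparison
        int shard = 0;
        for (int i = 0; i < rangeStarts.length; i++) {
            if (h >= rangeStarts[i]) shard = i;
        }
        return shard;
    }
}
```

Because every client computes the same shard for a given id, the client can only route correctly if it hashes the same field the server uses, which is presumably why Mark stresses setting the correct id field.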
>> >>
>> >>>
>> >>>> shutdownLBHttpSolrServer
>> >>>
>> >>> CloudSolrServer uses an LBHttpSolrServer behind the scenes to
>> >>> distribute requests (those that aren't updates sent directly to
>> >>> leaders). Where did you find this? I don't see it in the javadoc
>> >>> anywhere, but it is a boolean in the CloudSolrServer class. It looks
>> >>> like when you create a new CloudSolrServer and pass it your own
>> >>> LBHttpSolrServer, the boolean gets set to false, and the CloudSolrServer
>> >>> won't shut down the LBHttpSolrServer when it gets shut down.
>> >>>
>> >>>> parallelUpdates
>> >>>
>> >>> The javadocs don't have any description for this one, but I checked out
>> >>> the code for CloudSolrServer, and if parallelUpdates is true, it looks
>> >>> like it executes update statements to multiple shards at the same time.
>> >>
>> >> Right, we should definitely add some javadoc, but this sends updates to
>> >> shards in parallel rather than with a single thread. It can really
>> >> increase update speed. It's still not as powerful as using
>> >> CloudSolrServer from multiple threads, but a nice improvement
>> >> nonetheless.
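Mark's description of parallelUpdates (one request per shard sent concurrently instead of one after another on a single thread) can be modelled with a plain `ExecutorService`. This is a sketch, not CloudSolrServer's code: `ParallelUpdater` is hypothetical, `sendToShard` is a placeholder for the update request to a shard leader, and sizing the pool to one thread per shard is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

// Hypothetical model of "parallel updates": documents are already grouped by
// target shard; with parallel=true one request per shard is submitted to a
// thread pool so shards are updated concurrently rather than sequentially.
final class ParallelUpdater {
    // sendToShard stands in for the update request to a shard leader;
    // it receives the shard name and its documents and returns a status string.
    static List<String> sendAll(Map<String, List<String>> docsByShard,
                                BiFunction<String, List<String>, String> sendToShard,
                                boolean parallel) {
        List<String> results = new ArrayList<>();
        if (!parallel) {
            // Sequential: each shard request waits for the previous one to finish.
            for (Map.Entry<String, List<String>> e : docsByShard.entrySet()) {
                results.add(sendToShard.apply(e.getKey(), e.getValue()));
            }
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, docsByShard.size()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : docsByShard.entrySet()) {
                Callable<String> task = () -> sendToShard.apply(e.getKey(), e.getValue());
                futures.add(pool.submit(task));
            }
            // Wait for every shard; results keep the map's iteration order.
            for (Future<String> f : futures) {
                try {
                    results.add(f.get());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted waiting for a shard", ex);
                } catch (ExecutionException ex) {
                    throw new IllegalStateException("shard update failed", ex.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

This model also fits Mark's remark that multiple client threads are still faster: parallelUpdates only overlaps the per-shard requests within a single add call, while several caller threads presumably keep several add calls in flight at once.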
>> >>
>> >>
>> >> - Mark
>> >>
>> >> http://about.me/markrmiller
>> >>
>> >>>
>> >>> I'm no dev, but I can read, so please excuse any errors on my part.
>> >>>
>> >>> Thanks,
>> >>> Greg
>> >>>
>> >>> On Jan 31, 2014, at 11:40 AM, Software Dev <static.void....@gmail.com> wrote:
>> >>>
>> >>>> Can someone clarify what the following options are:
>> >>>>
>> >>>> - updatesToLeaders
>> >>>> - shutdownLBHttpSolrServer
>> >>>> - parallelUpdates
>> >>>>
>> >>>> Also, I remember that in older versions of Solr there was an efficient,
>> >>>> more compact format used between SolrJ and Solr. Does this still exist
>> >>>> in the latest version of Solr? If so, is it the default?
>> >>>>
>> >>>> Thanks
>> >>>
>> >>
>> >>
>>
>>
>