Re: Solr Cloud Bulk Indexing Questions

Software Dev Thu, 23 Jan 2014 10:37:53 -0800

Also, any suggestions on debugging? What should I look for and how? Thanks


On Thu, Jan 23, 2014 at 10:01 AM, Software Dev <static.void....@gmail.com> wrote:

> Thanks for the suggestions. After reading that document I feel even more
> confused, though, because I always thought that hard commits should be less
> frequent than soft commits.
>
> Is there any way to configure autoCommit and softCommit values on a
> per-request basis? The majority of the time we have a small flow of updates
> coming in and we would like to see them ASAP. However, we occasionally
> need to do some bulk indexing (once a week or less), and the need to see
> those updates right away isn't as critical.
>
> I would say 95% of the time we are in "Index-Light Query-Light/Heavy" mode
> and the other 5% is "Index-Heavy Query-Light/Heavy" mode.
>
> Thanks
>
>
> On Wed, Jan 22, 2014 at 5:33 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>
>> When you're doing hard commits, is it with openSearcher = true or
>> false? It should probably be false...
>>
>> Here's a rundown of the soft/hard commit consequences:
>>
>>
>> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> I suspect (but, of course, can't prove) that you're over-committing
>> and hitting segment merges without meaning to...
>>
>> FWIW,
>> Erick
>>
>> On Wed, Jan 22, 2014 at 1:46 PM, Software Dev <static.void....@gmail.com>
>> wrote:
>> > A suggestion would be to hard commit much less often, i.e. every 10
>> > minutes, and see if there is a change.
>> >
>> > - Will try this
>> >
>> > How much system RAM ? JVM Heap ? Enough space in RAM for system disk
>> > cache ?
>> >
>> > - We have 18G of RAM, 12G dedicated to Solr, but as of right now the
>> > total index size is only 5GB
>> >
>> > What is the size of your documents ? A few KB, MB, ... ?
>> >
>> > - Under 1MB
>> >
>> > Ah, and what about network IO ? Could that be a limiting factor ?
>> >
>> > - Again, total index size is only 5GB so I don't know if this would be a
>> > problem
>> >
>> > On Wed, Jan 22, 2014 at 12:26 AM, Andre Bois-Crettez
>> > <andre.b...@kelkoo.com> wrote:
>> >
>> >> The one node with more load should be the leader (because of the extra
>> >> work of receiving and distributing updates), but in my experience that
>> >> means only a bit more CPU usage, and no difference in disk IO.
>> >>
>> >> A suggestion would be to hard commit much less often, i.e. every 10
>> >> minutes, and see if there is a change.
>> >> How much system RAM ? JVM Heap ? Enough space in RAM for system disk
>> >> cache ?
>> >> What is the size of your documents ? A few KB, MB, ... ?
>> >> Ah, and what about network IO ? Could that be a limiting factor ?
>> >>
>> >>
>> >> André
>> >>
>> >>
>> >> On 2014-01-21 23:40, Software Dev wrote:
>> >>
>> >>> Any other suggestions?
>> >>>
>> >>>
>> >>> On Mon, Jan 20, 2014 at 2:49 PM, Software Dev <static.void....@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> 4.6.0
>> >>>>
>> >>>>
>> >>>> On Mon, Jan 20, 2014 at 2:47 PM, Mark Miller <markrmil...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> What version are you running?
>> >>>>>
>> >>>>> - Mark
>> >>>>>
>> >>>>> On Jan 20, 2014, at 5:43 PM, Software Dev <static.void....@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> We also noticed that disk IO shoots up to 100% on 1 of the nodes. Do
>> >>>>>> all updates get sent to one machine or something?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Mon, Jan 20, 2014 at 2:42 PM, Software Dev
>> >>>>>> <static.void....@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> We have a soft commit every 5 seconds and a hard commit every 30. As
>> >>>>>>> far as docs/second, I would guess around 200/sec, which doesn't seem
>> >>>>>>> that high.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Mon, Jan 20, 2014 at 2:26 PM, Erick Erickson
>> >>>>>>> <erickerick...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> Questions: How often do you commit your updates? What is your
>> >>>>>>>> indexing rate in docs/second?
>> >>>>>>>>
>> >>>>>>>> In a SolrCloud setup, you should be using a CloudSolrServer. If the
>> >>>>>>>> server is having trouble keeping up with updates, switching to CUSS
>> >>>>>>>> (ConcurrentUpdateSolrServer) probably wouldn't help.
>> >>>>>>>>
>> >>>>>>>> So I suspect there's something not optimal about your setup that's
>> >>>>>>>> the culprit.
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Erick
>> >>>>>>>>
>> >>>>>>>> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev
>> >>>>>>>> <static.void....@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> We are testing our shiny new Solr Cloud architecture but we are
>> >>>>>>>>> experiencing some issues when doing bulk indexing.
>> >>>>>>>>>
>> >>>>>>>>> We have 5 solr cloud machines running and 3 indexing machines
>> >>>>>>>>> (separate from the cloud servers). The indexing machines pull ids
>> >>>>>>>>> off a queue, then they index and ship over a document via a
>> >>>>>>>>> CloudSolrServer. It appears that the indexers are too fast, because
>> >>>>>>>>> the load (particularly disk io) on the solr cloud machines spikes
>> >>>>>>>>> through the roof, making the entire cluster unusable. It's kind of
>> >>>>>>>>> odd because the total index size is not even large, i.e. < 10GB.
>> >>>>>>>>> Are there any optimizations/enhancements I could try to help
>> >>>>>>>>> alleviate these problems?
>> >>>>>>>>>
>> >>>>>>>>> I should note that for the above collection we only have 1 shard
>> >>>>>>>>> that's replicated across all machines, so all machines have the
>> >>>>>>>>> full index.
>> >>>>>>>>>
>> >>>>>>>>> Would we benefit from switching to a ConcurrentUpdateSolrServer
>> >>>>>>>>> where all updates get sent to 1 machine and 1 machine only? We
>> >>>>>>>>> could then remove this machine from the cluster that handles user
>> >>>>>>>>> requests.
>> >>>>>>>>>
>> >>>>>>>>> Thanks for any input.
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>> --
>> >>> André Bois-Crettez
>> >>>
>> >>> Software Architect
>> >>> Search Developer
>> >>> http://www.kelkoo.com/
>> >>>
>> >>
>>
>
>
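
On the per-request commit question quoted above: Solr's update handler
accepts a commitWithin request parameter (in milliseconds), which may be one
way to ask for fast visibility on the small flow of updates and a much looser
deadline during the occasional bulk load, without changing the autoCommit
settings. What follows is only a minimal sketch using the Python standard
library. The host, the collection name "products", the JSON body layout and
both deadline values are assumptions made for illustration, not values from
this thread.

```python
import json
import urllib.parse
import urllib.request

# Assumed values, for illustration only (none of these come from the thread).
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "products"
TRICKLE_COMMIT_WITHIN_MS = 5_000    # small live updates: visible quickly
BULK_COMMIT_WITHIN_MS = 600_000     # bulk load: visibility can wait


def build_update_request(docs, commit_within_ms,
                         base=SOLR_BASE, collection=COLLECTION):
    """Build (but do not send) a JSON update request that carries a
    per-request commitWithin deadline in milliseconds."""
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    url = f"{base}/{collection}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    live = build_update_request([{"id": "1"}], TRICKLE_COMMIT_WITHIN_MS)
    bulk = build_update_request([{"id": "2"}, {"id": "3"}],
                                BULK_COMMIT_WITHIN_MS)
    print(live.full_url)
    print(bulk.full_url)
    # Sending needs a running Solr: urllib.request.urlopen(live)
```

The same indexer code could then choose the deadline per batch depending on
whether it is handling the live trickle or the bulk run. SolrJ appears to
offer the same idea through a commitWithin argument on add(), though that
should be checked against the Javadoc for the version in use.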

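The figures quoted in the thread (soft commit every 5 seconds, hard commit
every 30, roughly 200 docs/sec, one shard replicated on all 5 machines) can
be turned into a quick back-of-the-envelope check. The sketch below is only
arithmetic on those quoted numbers. The 10-minute interval is the suggestion
made in the thread, and the idea that every replica indexes every document
itself is an assumption about how SolrCloud replication usually behaves.

```python
# Figures quoted in the thread.
DOCS_PER_SEC = 200
SOFT_COMMIT_SEC = 5
HARD_COMMIT_SEC = 30
REPLICAS = 5                      # 1 shard, replicated on all 5 machines
SUGGESTED_HARD_COMMIT_SEC = 600   # "every 10 minutes"


def commit_stats(docs_per_sec, interval_sec):
    """Return (commits per hour, docs accumulated between two commits)."""
    return 3600 // interval_sec, docs_per_sec * interval_sec


soft = commit_stats(DOCS_PER_SEC, SOFT_COMMIT_SEC)
hard = commit_stats(DOCS_PER_SEC, HARD_COMMIT_SEC)
relaxed = commit_stats(DOCS_PER_SEC, SUGGESTED_HARD_COMMIT_SEC)

# Assumption: each replica indexes every document itself, so the cluster as
# a whole does REPLICAS times the indexing work of a single node.
cluster_index_ops_per_sec = DOCS_PER_SEC * REPLICAS

print("soft commits/hour, docs per soft commit:", soft)
print("hard commits/hour, docs per hard commit:", hard)
print("hard commits/hour at 10 min, docs per commit:", relaxed)
print("index operations/sec across the cluster:", cluster_index_ops_per_sec)
```

Under these assumptions each node sees 720 soft commits and 120 hard commits
per hour, and moving the hard commit to 10 minutes would cut the latter to 6.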