On 24 December 2014 at 22:29, Abhishek wrote:

> Thanks for reading, Vineeth. That was my initial thought, but I couldn't
> find any old GC during the outage. Each ES node has 32 gigs. Each box
> has 128 gigs split between 2 ES nodes (32G each) and file system cache
> (64G).

On Wed, Dec 24, 2014 at 4:49 PM, vineeth mohan wrote:

> Hi,
>
> What is the memory for each of these machines?
> Also see if there is any correlation between garbage collection and the
> time this anomaly happens.
> Chances are that the stop-the-world time might block the ping for some
> time and the cluster might feel some nodes are gone.
>
> Thanks
> Vineeth

On Wed, Dec 24, 2014 at 4:23 PM, Abhishek Andhavarapu
<abhishek...@gmail.com> wrote:

> Hi all,
>
> We recently had a cascading cluster failure. From 16:35 to 16:42 the
> cluster went red and recovered itself. I can't seem to find any obvious
> logs around this time.
>
> The cluster has about 19 nodes: 9 physical boxes running two instances
> of Elasticsearch each, and one VM as a balancer.
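For anyone checking Vineeth's GC theory on a similar cluster, the per-node
collector stats are a quick first look; a minimal sketch, assuming a 1.x-era
node reachable on localhost:9200 (host and port are placeholders):

    # Compare old-gen collection counts/times across nodes around the
    # outage window (nodes stats API, jvm section)
    curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' \
      | grep -E '"(collection_count|collection_time_in_millis)"'

A jump in collection_time_in_millis on one node inside the 16:35-16:42
window would support the stop-the-world explanation.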
I just had an incident where my entire cluster (all nodes) ended up using
100% CPU on each node at the same time and became completely unresponsive,
even to /_cluster/health. This happened while I was using Kibana, which was
working fine up to that point. I was running a few simple queries (nothing ...
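When every node pegs its CPUs at once, the hot threads API is the usual
first stop, assuming the HTTP layer still answers (here it apparently did
not, so this is only for the next occurrence); host and port are
placeholders:

    # Sample the busiest threads on each node (5 per node)
    curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5'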
Just another update, since there have been others who had issues with
multicast in the past and switched to unicast.

My issue appears to be with the multicast group. The default in
Elasticsearch is 224.2.2.4, which according to the RFC is within the
SDP/SAP block. Our internal application uses mDNS ...
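The group and port Elasticsearch pings on are configurable, so a colliding
group can be moved; a sketch of the relevant elasticsearch.yml lines for
0.90/1.x zen discovery (the values shown are the documented defaults; pick
a group your network doesn't already use):

    # elasticsearch.yml -- zen multicast discovery settings
    discovery.zen.ping.multicast.group: 224.2.2.4   # default; change to avoid the clash
    discovery.zen.ping.multicast.port: 54328        # default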
Don't bother digging deeper, since I suspect the network.

I tried many different configurations while trying to pinpoint the problem,
so I did not write down the various states, just the successes/failures.
Using the described methods, IPv4 was indeed working, but multicast was
still not cooperating ...
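One way to separate an Elasticsearch problem from a network one is to watch
the group's traffic on the wire while a node starts; a sketch, assuming the
default group/port from above and an eth0 interface (both are assumptions):

    # Watch zen multicast pings leave/arrive, independent of Elasticsearch
    tcpdump -i eth0 -n host 224.2.2.4 and udp port 54328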
Nice post-mortem, thanks for the writeup. Hopefully someone will stumble
on this in the future and avoid the same headache you had :)
> How would you force IPv4? I tried using preferIPv4Stack and setting
> network.host to _eth0:ipv4_, but it still did not work. Even switched off
> iptables at a ...
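For reference, the two knobs being discussed live in different places: the
JVM property goes into the Java options, and the bind pattern into
elasticsearch.yml. A sketch, assuming the startup script honors
ES_JAVA_OPTS:

    # JVM side: prefer the IPv4 stack
    export ES_JAVA_OPTS="$ES_JAVA_OPTS -Djava.net.preferIPv4Stack=true"

    # elasticsearch.yml side: bind to eth0's IPv4 address
    #   network.host: _eth0:ipv4_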
Responses inline.

On Wed, Mar 19, 2014 at 7:25 PM, Zachary Tong wrote:

> Yeah, in case anyone reads this thread in the future, this log output is a
> good indicator of multicast problems. You can see that the nodes are
> pinging and talking to each other on this log line:
>
> --> target [[se ...
Yeah, in case anyone reads this thread in the future, this log output is a
good indicator of multicast problems. You can see that the nodes are
pinging and talking to each other on this log line:

--> target [[search6][T3tINFmqREK9W6oqZV0r7A][inet[/192.168.50.106:9300]]],
master [null]

...
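Those ping/target lines only show up once discovery logging is turned up; a
sketch of the logging.yml change for 0.90/1.x:

    # logging.yml -- surface zen discovery ping traffic
    logger:
      discovery: TRACE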
My mind was not clear, since I had been debugging this issue for a few
hours. Once I realized it was a multicast issue, I switched to unicast and
the cluster was back up and running. So it was multicast after all. I
should have been more careful when I received an email on Friday that said
"will have to ...
How many NICs are there on each of your nodes? We hit an issue on boxes
with 4 NICs where some addresses were not reachable due to a Linux kernel
setting. I'd suggest you test the full connection matrix via a shell script
(a sketch follows), so as to rule out this cause.

My 2 cents
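A minimal version of that connection-matrix test, assuming nc is installed
and the default transport port 9300; the host list is a placeholder, and it
should be run from (or via ssh on) every box to cover the full matrix:

    #!/bin/sh
    # Probe each host on the Elasticsearch transport port.
    HOSTS="192.168.50.101 192.168.50.102 192.168.50.103"
    for h in $HOSTS; do
      if nc -z -w 2 "$h" 9300 >/dev/null 2>&1; then
        echo "OK   $h:9300"
      else
        echo "FAIL $h:9300"
      fi
    done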
No matter in what order I restart the servers, the same 4-node clusters
get created. I suspect the network, especially since there was some work
done this past Friday on the underlying VM host. Would Elasticsearch cache
multicast information? The servers have not been restarted in at least a
week. ...
I have been running Elasticsearch for years and I have never encountered a
collapse such as the one I am experiencing. Even when experiencing
split-brain clusters, I still had it running and accepting search requests.

An 8-node development cluster running 0.90.2 using multicast. Last time the
cluster ...
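Since split brain comes up here: the usual guard in this era of
Elasticsearch is requiring a quorum of master-eligible nodes before a
master is elected, i.e. (N / 2) + 1; a sketch for this 8-node cluster,
assuming all eight are master-eligible:

    # elasticsearch.yml -- quorum guard against split brain (8 / 2 + 1 = 5)
    discovery.zen.minimum_master_nodes: 5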