okay, thank you

On Wed, Aug 29, 2018 at 11:04 PM Jeff Jirsa <jji...@gmail.com> wrote:

> You’re seeing an OOM, not a socket error / timeout.
>
> --
> Jeff Jirsa
>
>
> On Aug 29, 2018, at 10:56 PM, Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
> Jeff,
>
> any idea if this is somehow related to
> https://issues.apache.org/jira/browse/CASSANDRA-11840?
> Would increasing streaming_socket_timeout_in_ms to a higher value help?
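>
> (For reference, that knob lives in cassandra.yaml; something like the sketch
> below is what I had in mind - the 24 h value is only an illustrative guess on
> my part, not a tested recommendation.)
>
>     # cassandra.yaml - illustrative only; 86400000 ms (24 h) is a guess
>     streaming_socket_timeout_in_ms: 86400000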
>
> On Wed, Aug 29, 2018 at 10:52 PM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
>> I have 72 nodes in the cluster, across 8 datacenters. The moment I try to
>> grow the cluster above 84 nodes or so, the issue starts.
>>
>> I am still using a CMS heap, on the assumption that increasing the heap
>> size beyond the recommended 8 GB would do more harm than good.
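>>
>> If bumping the heap is the way to go, I assume it would be the usual
>> overrides in cassandra-env.sh, roughly like the sketch below (the sizes are
>> placeholders, not recommendations):
>>
>>     # cassandra-env.sh - placeholder values only
>>     MAX_HEAP_SIZE="12G"   # overrides the auto-computed heap size
>>     HEAP_NEWSIZE="2G"     # CMS young gen; commonly sized ~100 MB per core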
>>
>> On Wed, Aug 29, 2018 at 6:53 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> Given the size of your schema, you’re probably getting flooded with a
>>> bunch of huge schema mutations as the new node joins gossip and tries to
>>> pull the schema from every host it sees. You say 8 DCs but you don’t say
>>> how many nodes - I’m guessing it’s a lot?
>>>
>>> This is something that’s incrementally better in 3.0, but a real proper
>>> fix has been talked about a few times -
>>> https://issues.apache.org/jira/browse/CASSANDRA-11748 and
>>> https://issues.apache.org/jira/browse/CASSANDRA-13569, for example.
>>>
>>> In the short term, you may be able to work around this by increasing
>>> your heap size. If that doesn’t work, there’s an ugly, ugly hack that’ll
>>> work on 2.1: limit the number of schema blobs the new node can get at a
>>> time. In this case, that means firewalling off all but a few nodes in your
>>> cluster for 10-30 seconds, making sure the node gets the schema (watch the
>>> logs or the file system for the tables to be created), then removing the
>>> firewall so it can start the bootstrap process. It needs the schema to set
>>> up the streaming plan, and it needs all the hosts up in gossip to stream
>>> successfully, so this is an ugly hack to buy time to get the schema and
>>> then heal the cluster so it can bootstrap.
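>>>
>>> Roughly along these lines on the joining node (hypothetical peer IP,
>>> assuming the default storage_port of 7000 and no internode SSL - adapt to
>>> your setup):
>>>
>>>     # block internode traffic to/from everything except one known-good peer
>>>     iptables -A INPUT  -p tcp --dport 7000 ! -s 10.0.0.10 -j DROP
>>>     iptables -A OUTPUT -p tcp --dport 7000 ! -d 10.0.0.10 -j DROP
>>>     # start Cassandra, watch the logs / data dirs until the schema arrives,
>>>     # then remove the rules so gossip and streaming can reach every host:
>>>     iptables -D INPUT  -p tcp --dport 7000 ! -s 10.0.0.10 -j DROP
>>>     iptables -D OUTPUT -p tcp --dport 7000 ! -d 10.0.0.10 -j DROP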
>>>
>>> Yea that’s awful. Hopefully either of the two above JIRAs lands to make
>>> this less awful.
>>>
>>>
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Aug 29, 2018, at 6:29 PM, Jai Bheemsen Rao Dhanwada <
>>> jaibheem...@gmail.com> wrote:
>>>
>>> It fails before bootstrap
>>>
>>> Streaming throughput on the nodes is set to 400 Mb/s.
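>>>
>>> (For reference, roughly how such a cap is set - at runtime via nodetool on
>>> each existing node, or permanently in cassandra.yaml; values are in
>>> megabits per second:)
>>>
>>>     nodetool setstreamthroughput 400
>>>     # or permanently in cassandra.yaml:
>>>     # stream_throughput_outbound_megabits_per_sec: 400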
>>>
>>> On Wednesday, August 29, 2018, Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> Is the bootstrap plan succeeding (does streaming start or does it crash
>>>> before it logs messages about streaming starting)?
>>>>
>>>> Have you capped the stream throughput on the existing hosts?
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>
>>>> On Aug 29, 2018, at 5:02 PM, Jai Bheemsen Rao Dhanwada <
>>>> jaibheem...@gmail.com> wrote:
>>>>
>>>> Hello All,
>>>>
>>>> We are seeing an issue when we add more nodes to the cluster: the new
>>>> node is not able to stream the entire metadata and fails to bootstrap.
>>>> Eventually the process dies with an OOM (java.lang.OutOfMemoryError:
>>>> Java heap space).
>>>>
>>>> But if I remove a few nodes from the cluster, we don't see this issue.
>>>>
>>>> Cassandra Version: 2.1.16
>>>> # of KS / CF: ~100 / ~3000
>>>> # of DC: 8
>>>> # of Vnodes per node: 256
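>>>>
>>>> (The counts above are approximate; they can be double-checked against the
>>>> 2.1-era schema tables with something like the queries below.)
>>>>
>>>>     -- cqlsh, 2.1 schema tables (replaced by system_schema in 3.0+)
>>>>     SELECT count(*) FROM system.schema_keyspaces;
>>>>     SELECT count(*) FROM system.schema_columnfamilies;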
>>>>
>>>> Not sure what is causing this behavior; has anyone come across this
>>>> scenario? Thanks in advance.
>>>>
>>>>
