Re: Repair Hangs while requesting Merkle Trees

Anuj Wadehra Sun, 29 Nov 2015 09:15:06 -0800

Yes, I think you are correct: the problem was probably resolved by the
Cassandra restart rather than by increasing the request timeout.

We are NOT on EC2. We have 2 interfaces on each node: one private and one
public. We have a strange configuration, and we need to correct it as per
the recommendation at
https://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configMultiNetworks.html

AS-IS config:
We use broadcast_address = listen_address = PUBLIC IP address.
In seeds, we put the PUBLIC IP of the other nodes but the private IP for the
local node. There were some issues when we tried to access the local node via
its public IP.

Thanks
Anuj

--------------------------------------------
On Tue, 24/11/15, Paulo Motta <pauloricard...@gmail.com> wrote:

Subject: Re: Repair Hangs while requesting Merkle Trees
To: "user@cassandra.apache.org" <user@cassandra.apache.org>, "Anuj Wadehra"
<anujw_2...@yahoo.co.in>
Date: Tuesday, 24 November, 2015, 12:38 AM

The issue might be related to the connections that are ESTABLISHED at just
one end. I don't think it is related to the inter_dc_tcp_nodelay or
request_timeout_in_ms options. Did you restart the process when you changed
the request_timeout_in_ms option? The restart, and not the option change,
might be why the problem got fixed.

This seems like a network issue or a misconfiguration of this specific node.
Are you using EC2? Is listen_address == broadcast_address? Are all nodes
using the same configuration? What Java version are you using?

You may want to enable TRACE on OutboundTcpConnection and
IncomingTcpConnection and compare the output of the healthy nodes with that
of the faulty node.

2015-11-23 10:04 GMT-08:00 Anuj Wadehra <anujw_2...@yahoo.co.in>:

Any comments on the ESTABLISHED connections at one end?

Moreover, inter_dc_tcp_nodelay is false. Can this be the reason that latency
between the two DCs is higher and repair messages are getting dropped?

Can increasing request_timeout_in_ms deal with the latency issue?

I see some hinted handoffs being triggered for cross-DC nodes, and hints
replay timing out. Is that an indication of a network issue?

I am getting in touch with the network team to capture netstat output and
tcpdump traces too.

Thanks

Anuj

--------------------------------------------
On Wed, 18/11/15, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Subject: Re: Repair Hangs while requesting Merkle Trees
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, 18 November, 2015, 7:57 AM

Thanks Bryan!

The connection is in ESTABLISHED state on one end and completely missing at
the other end (in another DC).

Yes, we can revisit TCP tuning. But the problem is node specific, so I am not
sure whether tuning is the culprit.

Thanks
Anuj

From: "Bryan Cheng" <br...@blockcypher.com>
Date: Wed, 18 Nov, 2015 at 2:04 am
Subject: Re: Repair Hangs while requesting Merkle Trees

Ah OK, I might have misunderstood you. The streaming socket should not be in
play during Merkle tree generation (validation compaction). It may come into
play during Merkle tree exchange; that I'm not sure about. You can read a bit
more here: https://issues.apache.org/jira/browse/CASSANDRA-8611.

Regardless, you should have streaming_socket_timeout_in_ms set. 1 hr is
usually a good conservative estimate, but you can go much lower safely.

What state is the connection in that only shows on one side? Is it
ESTABLISHED, or something like CLOSE_WAIT?

Here's a good place to start for tuning, though it doesn't have as much about
network tuning:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html.

More generally, TCP tuning usually revolves around a balance between latency
and bandwidth. Over long connections (we're talking 10s of ms, instead of the
sub-1 ms you usually see in a good DC network), your expectations will shift
greatly. Stuff like NODELAY on TCP is very nice for cutting your latencies
when you're inside a DC, but will generate lots of small packets that will
hurt your bandwidth over longer connections due to the need to wait for acks.
otc_coalescing_strategy is in a similar vein, bundling together nearby
messages to trade latency for throughput.

You'll also probably want to tune your TCP buffers and window sizes, since
that determines how much data can be in flight between acknowledgements, and
the default size is pitiful for any decent network size. Google around for
TCP tuning/buffer tuning and you should find some good resources.
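
To put rough numbers on the buffer and window point above: the data that one
TCP connection can keep in flight is bounded by the bandwidth-delay product
(link bandwidth times round-trip time). The sketch below is only an
illustration. It assumes the 10 Gbps link mentioned elsewhere in this thread
and a purely hypothetical 20 ms round trip (the real cross-DC RTT was never
reported), and the function names are made up for the example.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_seconds):
    """Bytes that must be in flight to keep the link full for one round trip."""
    return bandwidth_bits_per_s * rtt_seconds / 8


def max_throughput_bits_per_s(window_bytes, rtt_seconds):
    """Upper bound on single-connection throughput for a given window size."""
    return window_bytes * 8 / rtt_seconds


LINK_BITS_PER_S = 10e9   # 10 Gbps, as stated in the thread
RTT_SECONDS = 0.020      # hypothetical 20 ms cross-DC round trip (assumption)

bdp = bandwidth_delay_product_bytes(LINK_BITS_PER_S, RTT_SECONDS)
print(f"Window needed to fill the link: {bdp / 1e6:.1f} MB")

# A 64 KiB window is roughly the limit without TCP window scaling.
capped = max_throughput_bits_per_s(64 * 1024, RTT_SECONDS)
print(f"Throughput with a 64 KiB window: {capped / 1e6:.1f} Mbit/s")
```

With these assumed numbers a small window leaves most of a fast, long link
idle, which is why buffer and window sizes matter far more across DCs than
inside one.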

On Mon, Nov 16, 2015 at 5:23 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Hi Bryan,

Thanks for the reply! I didn't mean streaming_socket_timeout_in_ms. I meant
that when you run netstat (the Linux command) on node A in DC1, you will
notice that there is a connection in ESTABLISHED state with node B in DC2.
But when you run netstat on node B, you won't find any connection with node
A. Are such connections expected across DCs? Is it a problem?
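
One way to look for such one-sided connections is to collect `netstat -tn`
output on both nodes and check which ESTABLISHED entries on node A have no
mirrored entry on node B. The following is a minimal sketch, not a tested
tool: it assumes the usual Linux netstat column layout, the function names
are invented for the example, and the sample addresses are made up. It also
assumes both nodes see the same addresses for each other, which may not hold
when public and private interfaces or NAT are involved.

```python
def established_pairs(netstat_output):
    """Return (local, remote) address pairs for ESTABLISHED TCP lines.

    Assumes Linux `netstat -tn` columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    pairs = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if (len(fields) >= 6 and fields[0].startswith("tcp")
                and fields[5] == "ESTABLISHED"):
            pairs.add((fields[3], fields[4]))
    return pairs


def one_sided(node_a_output, node_b_output, node_b_ip):
    """Connections node A holds to node B that node B does not show."""
    a_pairs = established_pairs(node_a_output)
    b_pairs = established_pairs(node_b_output)
    missing = []
    for local, remote in sorted(a_pairs):
        if remote.rsplit(":", 1)[0] != node_b_ip:
            continue  # connection to some other host
        if (remote, local) not in b_pairs:
            missing.append((local, remote))
    return missing


# Made-up sample output; 7000 is Cassandra's default storage port.
NODE_A = """\
tcp        0      0 10.1.0.5:45012          10.2.0.7:7000           ESTABLISHED
tcp        0      0 10.1.0.5:45013          10.2.0.7:7000           ESTABLISHED
"""
NODE_B = """\
tcp        0      0 10.2.0.7:7000           10.1.0.5:45012          ESTABLISHED
"""

print(one_sided(NODE_A, NODE_B, "10.2.0.7"))
# prints [('10.1.0.5:45013', '10.2.0.7:7000')]
```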

We haven't set streaming_socket_timeout_in_ms, which I know must be set. But
I am not sure whether setting this property has any impact on Merkle tree
requests. I thought it only applies to data streaming, when some mismatch is
found and data needs to be streamed. Please confirm. What value do you use
for the streaming socket timeout?

Moreover, if the socket timeout were the issue, it should happen on other
nodes too. Repair fails on just one node, because the Merkle tree request
gets lost and is not transmitted to one or more nodes in the remote DC.

I am not sure about the exact distance, but the DCs are connected with a very
high speed 10 Gbps link.

When you say different TCP stack tuning, do you have any document/blog/link
describing recommendations for a multi-DC Cassandra setup? Can you elaborate
on which settings need to be different?

Thanks
Anuj

From: "Bryan Cheng" <br...@blockcypher.com>
Date: Tue, 17 Nov, 2015 at 5:54
Subject: Re: Repair Hangs while requesting Merkle Trees

Hi Anuj,

Did you mean streaming_socket_timeout_in_ms? If not, then you definitely want
that set. Even the best network connections will break occasionally, and in
Cassandra < 2.1.10 (I believe) this would leave those connections hanging
indefinitely on one end.

How far away are your two DCs from a network perspective, out of curiosity?
You'll almost certainly be doing different TCP stack tuning for cross-DC,
notably your buffer sizes, window params, Cassandra-specific stuff like
otc_coalescing_strategy, inter_dc_tcp_nodelay, etc.

On Sat, Nov 14, 2015 at 10:35 AM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

One more observation. We observed that there are a few TCP connections which
one node shows as ESTABLISHED, but when we check the node at the other end,
the connection is not there. They are called "phantom" connections, I guess.
Can this be a possible cause?

Thanks
Anuj

From: "Anuj Wadehra" <anujw_2...@yahoo.co.in>
Date: Sat, 14 Nov, 2015 at 11:59
Subject: Re: Repair Hangs while requesting Merkle Trees

Thanks Daemeon,

I will capture the netstat output and share it in the next few days. We were
thinking of taking tcpdumps also. If it's a network issue and increasing the
request timeout worked, I am not sure how Cassandra is dropping messages
based on the timeout. Repair messages are non-droppable and are not supposed
to time out.

2 of the 3 nodes in the DC are able to complete repair without any issue.
Just one node is problematic.

I also observed frequent messages in the logs of other nodes which say that
hints replay timed out, and the node where hints were being replayed is
always a remote DC node. Is it related somehow?

Thanks
Anuj

From: "daemeon reiydelle" <daeme...@gmail.com>
Date: Thu, 12 Nov, 2015 at 10:34 am
Subject: Re: Repair Hangs while requesting Merkle Trees

Have you checked the network statistics on that machine (netstat -tas) while
attempting to repair? If netstat shows ANY issues, you have a problem. Could
you put the command in a loop running every 60 seconds for maybe 15 minutes
and post back?

Out of curiosity, how many remote DC nodes are getting successfully
repaired?
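
The sampling loop suggested above could be sketched roughly as follows. This
is only an illustration: `sample_command` and its parameters are hypothetical
names, and the `run` and `sleep` hooks exist only so that the loop can be
tried out without actually waiting or calling netstat.

```python
import subprocess
import time


def sample_command(cmd, interval_s=60, duration_s=900, run=None, sleep=time.sleep):
    """Run `cmd` every `interval_s` seconds for `duration_s` seconds.

    Returns a list of (timestamp, output) tuples, one per sample.
    """
    if run is None:
        def run(c):
            # Capture stdout as text; a non-zero exit status is not fatal here.
            result = subprocess.run(c, capture_output=True, text=True, check=False)
            return result.stdout

    count = duration_s // interval_s
    samples = []
    for i in range(count):
        samples.append((time.strftime("%Y-%m-%d %H:%M:%S"), run(cmd)))
        if i < count - 1:
            sleep(interval_s)  # no need to wait after the last sample
    return samples
```

Calling `sample_command(["netstat", "-tas"])` during a repair attempt would
collect 15 samples one minute apart; the outputs can then be written to a
file and compared between the healthy nodes and the faulty one.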


Daemeon C.M. Reiydelle


On Wed, Nov 11, 2015 at 1:06 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Hi,

We are using 2.0.14. We have 2 DCs at remote locations with 10 Gbps
connectivity. We are able to complete repair (-par -pr) on 5 nodes. On only
one node in DC2, we are unable to complete repair, as it always hangs. The
node sends Merkle tree requests, but one or more nodes in DC1 (remote) never
show that they sent the Merkle tree reply to the requesting node. Repair
hangs indefinitely.

After increasing request_timeout_in_ms on the affected node, we were able to
run repair successfully on one of two occasions.

Any comments on why this is happening on just one node? Given that the
isTimeOut method in OutboundTcpConnection.java always returns false for a
non-droppable verb such as a Merkle tree request (verb=REPAIR_MESSAGE), why
did increasing the request timeout solve the problem on one occasion?

Thanks
Anuj Wadehra

On Thursday, 12 November 2015 2:35 AM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Hi,

We have 2 DCs at remote locations with 10 Gbps connectivity. We are able to
complete repair (-par -pr) on 5 nodes. On only one node in DC2, we are unable
to complete repair, as it always hangs. The node sends Merkle tree requests,
but one or more nodes in DC1 (remote) never show that they sent the Merkle
tree reply to the requesting node. Repair hangs indefinitely.

After increasing request_timeout_in_ms on the affected node, we were able to
run repair successfully on one of two occasions.

Any comments on why this is happening on just one node? Given that the
isTimeOut method in OutboundTcpConnection.java always returns false for a
non-droppable verb such as a Merkle tree request (verb=REPAIR_MESSAGE), why
did increasing the request timeout solve the problem on one occasion?

Thanks
Anuj Wadehra
