I asked the same question on IRC but had no luck there; everyone's asleep
;)

Using 0.6.6, I'm adding a new node to the cluster.
It starts out fine but then stays stuck in the bootstrapping state for too
long: more than an hour and still counting.

$ bin/nodetool -p 9004 -h localhost streams
> Mode: Bootstrapping
> Not sending any streams.
> Not receiving any streams.


It seems to have streamed data from other nodes, and indeed the load is
non-zero, but it's not clear to me what is keeping it from finishing.
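
A minimal sketch of how this state could be detected from a script, assuming
the output of nodetool streams keeps the three-line shape quoted above. The
helper names parse_streams and looks_stalled are hypothetical, and the
"stalled" rule is only a heuristic, not something Cassandra itself reports.

```python
def parse_streams(output):
    """Summarise `nodetool streams` text (format assumed from the sample above)."""
    # Drop any leading "> " quote markers and surrounding whitespace.
    lines = [raw.lstrip("> ").strip() for raw in output.splitlines()]
    mode = next(
        (line.split(":", 1)[1].strip() for line in lines if line.startswith("Mode:")),
        None,
    )
    return {
        "mode": mode,
        # Assumption: streams are active unless the "Not ..." line is present.
        "sending": "Not sending any streams." not in lines,
        "receiving": "Not receiving any streams." not in lines,
    }


def looks_stalled(status):
    """Heuristic: still bootstrapping, yet no stream in either direction."""
    return (
        status["mode"] == "Bootstrapping"
        and not status["sending"]
        and not status["receiving"]
    )


sample = """\
> Mode: Bootstrapping
> Not sending any streams.
> Not receiving any streams.
"""
status = parse_streams(sample)
print(status["mode"], looks_stalled(status))
```

In practice the sample string would be replaced by the captured output of the
nodetool command, polled every few minutes.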

$ bin/nodetool -p 9004 -h localhost info
> 51042355038140769519506191114765231716
> Load             : 22.49 GB
> Generation No    : 1294133781
> Uptime (seconds) : 1795
> Heap Memory (MB) : 315.31 / 6117.00


nodetool ring does not list the new node, although nodetool can happily
talk to it; the node just isn't listed as a member of the ring. This is
expected while the node is still bootstrapping, so the question remains:
how long might the bootstrap take, and is it stuck?

The data isn't huge, so I find it hard to believe that streaming or
anti-compaction is the bottleneck. I have ~20 GB on each node and the new
node already has just about that, so it seems that all of the data, or at
least most of it, has already been streamed to it successfully. So what is
it waiting for now? (Same question, rephrased... ;)

I tried:
1. Restarting the new node. No good. All logs seem normal, but in the end
the node is still in bootstrap mode.
2. As someone suggested, I increased the RPC timeout (RpcTimeoutInMillis)
from 10k to 30k, but that didn't seem to help. I did this only on the new
node. Should I have done it on all the (old) nodes as well? Or maybe only
on the ones that were supposed to stream data to that node?
3. Setting the logging level to DEBUG. Nothing interesting is going on
except for occasional messages such as [1] or [2].

So the question is: what's keeping the new node from finishing the bootstrap
and how can I check its status?
Thanks

[1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line 36) Disseminating load info ...
[2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033 StorageService.java (line 1189) computing ranges for
28356863910078205288614550619314017621,
56713727820156410577229101238628035242,
85070591730234615865843651857942052863,
113427455640312821154458202477256070484,
141784319550391026443072753096570088105,
170141183460469231731687303715884105727
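
To see which existing nodes would be expected to stream to the new one, a
small sketch can locate the new node's token among the tokens in [2]. It
assumes those six tokens are the current ring members and that a node owns
the range ending at its own token; the function name owning_range is
hypothetical.

```python
from bisect import bisect_left

# Tokens copied from log message [2]; assumed to be the existing ring members.
RING = [
    28356863910078205288614550619314017621,
    56713727820156410577229101238628035242,
    85070591730234615865843651857942052863,
    113427455640312821154458202477256070484,
    141784319550391026443072753096570088105,
    170141183460469231731687303715884105727,
]

# Token printed by `nodetool info` on the new node.
NEW_TOKEN = 51042355038140769519506191114765231716


def owning_range(ring, token):
    """Return (previous_token, next_token) around `token` on a sorted ring."""
    tokens = sorted(ring)
    i = bisect_left(tokens, token)
    next_token = tokens[i % len(tokens)]  # wrap past the largest token
    prev_token = tokens[i - 1]            # index -1 wraps to the largest token
    return prev_token, next_token


prev_token, next_token = owning_range(RING, NEW_TOKEN)
print(f"new token falls between {prev_token} and {next_token}")
```

Under those assumptions, the new node splits the range currently held by the
node at next_token, so that node is a likely source for the primary range;
replicas of neighbouring ranges may come from other nodes as well.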

-- 
/Ran
