Re: Problems recovering a dead node

2011-05-04 Thread aaron morton
Certainly sounds a bit sick. The first error looks like it happens when the index file points to the wrong place in the data file for the SSTable. The second one happens when the index file is corrupted. These should be problems nodetool scrub can fix. The disk space may be dead space to

Re: Problems recovering a dead node

2011-05-04 Thread Héctor Izquierdo Seliva
I'm sorry, but I can't provide more detailed info as I have restarted the node. After the restart, the number of pending tasks started at 40 and rapidly went down as compactions finished. The ring now looks OK, with all the nodes having about the same amount of data. There were no errors in the

Re: Problems recovering a dead node

2011-05-04 Thread Héctor Izquierdo Seliva
On Wed, 04-05-2011 at 21:02 +1200, aaron morton wrote: Certainly sounds a bit sick. The first error looks like it happens when the index file points to the wrong place in the data file for the SSTable. The second one happens when the index file is corrupted. These should be

Problems recovering a dead node

2011-05-03 Thread Héctor Izquierdo Seliva
Hi everyone. One of the nodes in my 6-node cluster died with disk failures. I have replaced the disks, and it's clean. It has the same configuration (same IP, same token). When I try to restart the node, it starts throwing mmap underflow exceptions until it shuts down again. I tried setting io to

Re: Problems recovering a dead node

2011-05-03 Thread aaron morton
When you say it's clean, does that mean the node has no data files? After you replaced the disk, what process did you use to recover? Also, what version are you running and what's the recent upgrade history? Cheers Aaron On 3 May 2011, at 23:09, Héctor Izquierdo Seliva wrote: Hi everyone.

Re: Problems recovering a dead node

2011-05-03 Thread Héctor Izquierdo Seliva
Hi Aaron, it has no data files whatsoever. The upgrade path is 0.7.4 to 0.7.5. It turns out the initial problem was the software RAID failing silently because of another faulty disk. Now that the storage is working, I brought up the node again with the same IP and the same token, and tried doing nodetool repair. All