Re: Weird Replication exception

2013-06-02 Thread Asaf Mesika
No, this file was brand new with 0 length, so the peculiar "too old" message was strange to me. On Monday, June 3, 2013, Himanshu Vashishtha wrote: > Hey Asaf, > > It looks like you only need 7122. Either upgrade, or you could also patch > it up. > > Syncing up the master and slave cluster is also a

Re: Never ending distributed log split

2013-06-02 Thread Stack
On Sun, Jun 2, 2013 at 8:09 AM, Jean-Marc Spaggiari wrote: > So, 2 things again here. > > 1) Should the region server send more information about the failure to > the master so that the master can display the failure cause in the logs? > Yes. You shouldn't have to work so hard to figure out root cause (sm

Re: Multiple different failures

2013-06-02 Thread ramkrishna vasudevan
>>So regions got re-assigned one by one... It was SO long... Shouldn't HBCK try to re-assign all those regions in parallel, or at least with as many threads as we have region servers? This point can be looked into. We also need to check the code to see how it works now. Regards Ram On Sun, Jun 2, 2013
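The suggestion in this exchange is to reassign regions concurrently, with roughly one worker per region server, instead of one at a time. A minimal sketch of that idea, assuming a hypothetical `assign_region(region, server)` callback that stands in for whatever performs a single assignment (it is not an HBCK or HBase API):

```python
from concurrent.futures import ThreadPoolExecutor


def reassign_in_parallel(regions, region_servers, assign_region):
    """Assign every region using a pool sized to the number of region servers.

    `assign_region(region, server)` is a hypothetical callback used only for
    this sketch; it does not correspond to a real HBCK method.
    """
    if not region_servers:
        raise ValueError("need at least one region server")
    # Pair regions with servers round-robin so the work is spread evenly.
    plan = [(region, region_servers[i % len(region_servers)])
            for i, region in enumerate(regions)]
    with ThreadPoolExecutor(max_workers=len(region_servers)) as pool:
        futures = [pool.submit(assign_region, r, s) for r, s in plan]
        # result() re-raises any exception from a failed assignment and
        # keeps the results in the original region order.
        return [f.result() for f in futures]
```

With 3 region servers, at most 3 assignments run at once; a real tool would also need to cap per-server load and handle retries, which this sketch leaves out.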

Re: Weird Replication exception

2013-06-02 Thread Himanshu Vashishtha
Hey Asaf, It looks like you only need 7122. Either upgrade, or you could also patch it up. Syncing up the master and slave clusters is also advised, but that holds if you are using master-master replication. bq. 172.25.98.74,60020,1369903540894/172.25.98.74%2C60020%2C1369903540894.1

Re: Weird Replication exception

2013-06-02 Thread Ted Yu
bq. Is 0.94.8 production ready? I think so. Lars released 0.94.8 Friday evening. On Sun, Jun 2, 2013 at 12:26 PM, Asaf Mesika wrote: > I use 0.94.7. > Is 0.94.8 production ready? > > So in summary I have two issues: > 1. Clocks are out of sync > 2. I need to upgrade to 0.94.8 to avoid seeing th

Re: Weird Replication exception

2013-06-02 Thread Asaf Mesika
I use 0.94.7. Is 0.94.8 production ready? So in summary I have two issues: 1. Clocks are out of sync 2. I need to upgrade to 0.94.8 to avoid seeing these WARN messages? On Jun 2, 2013, at 5:46 PM, Ted Yu wrote: > What is the HBase version you're using ? > > In another thread, I mentioned this:

Re: Never ending distributed log split

2013-06-02 Thread Jean-Marc Spaggiari
I'm using 0.94.7 since I did not get the chance to deploy the last RC... I will wait for some more feedback regarding the options (delete or rename) and will most probably open a JIRA. Regarding recovered.edits, I don't have this file anymore, but I just found another one which is blocking some ot

Re: Never ending distributed log split

2013-06-02 Thread Ted Yu
Can you search for 1d44b0630ed7785106a87a2bd4993551/recovered.edits to see when it was created? The NameNode log would be a good place to start. bq. we can also rename it so if really required we can replay it later? The above is a better way of handling the situation. What version of HBase ar
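The "rename rather than delete" option keeps the edits around so they can still be replayed later. A minimal sketch on a local filesystem, assuming a hypothetical `.aside.<millis>` suffix; on a real cluster the file lives in HDFS and would be renamed through the Hadoop filesystem API instead of `pathlib`:

```python
import time
from pathlib import Path


def move_aside(edits_path, now=None):
    """Rename a problematic recovered.edits file instead of deleting it.

    The '.aside.<millis>' suffix is a hypothetical naming scheme for this
    sketch. The file keeps its contents, so it can be inspected or replayed.
    """
    src = Path(edits_path)
    seconds = time.time() if now is None else now
    dst = src.with_name(f"{src.name}.aside.{int(seconds * 1000)}")
    # Refuse to clobber an earlier set-aside copy with the same name.
    if dst.exists():
        raise FileExistsError(dst)
    src.rename(dst)
    return dst
```

Because the renamed file no longer has the name the log splitter expects, it stops blocking the split, while the data is still on disk.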

Never ending distributed log split

2013-06-02 Thread Jean-Marc Spaggiari
My HBase was in a bad state recently. HBCK did a slow but good job and everything is now almost stable. However, I still have one log split which is not working. Every minute, the SplitLogManager tries to split the log, fails, and retries. It's always the same file. It's assigned to different nodes, bu

Re: Weird Replication exception

2013-06-02 Thread Ted Yu
What HBase version are you using? In another thread, I mentioned this: There was a recently integrated JIRA (0.94.8): HBASE-7122 Proper warning message when opening a log file with no entries (idle cluster). Does the HBase you're using contain HBASE-7122? Cheers On Sat, Jun 1, 2013 at 1
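HBASE-7122's title describes giving a proper warning when a log file with no entries is opened (for example on an idle cluster), which matches the brand-new, 0-length file in this thread. A minimal sketch of that idea, assuming a plain local file and a hypothetical helper name. This is only an illustration of the title, not the actual patch:

```python
import logging
import os

logger = logging.getLogger("wal-sketch")


def open_log_for_replication(path):
    """Hypothetical pre-check before reading a log file for replication.

    Returns "empty" for a zero-length file (nothing written yet, e.g. an
    idle cluster) and "non-empty" otherwise.
    """
    size = os.path.getsize(path)
    if size == 0:
        # Emit a specific warning instead of a misleading "too old" style one.
        logger.warning("Log file %s has no entries (idle cluster?)", path)
        return "empty"
    return "non-empty"
```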

Re: Weird Replication exception

2013-06-02 Thread shashwat shriparv
This is due to time synchronisation between the master and slave. A maximum of 30 seconds of difference is allowed between the master and RS nodes, and you can increase this limit if you need to. The time difference between the master and the RSs is very important. On Sun, Jun 2, 2013 at 10:50 AM, Asaf Mesika w
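The rule described here is a simple threshold on the absolute difference between the master's clock and a region server's clock. A minimal sketch using the 30-second limit quoted in the reply. We believe HBase exposes this limit as `hbase.master.maxclockskew` (in milliseconds), but check that name against the documentation for your version. The exception class below is hypothetical:

```python
# The 30-second limit from the reply, expressed in milliseconds.
MAX_CLOCK_SKEW_MS = 30_000


class ClockOutOfSyncError(Exception):
    """Hypothetical error for a region server whose clock is too far off."""


def check_clock_skew(master_ms, rs_ms, max_skew_ms=MAX_CLOCK_SKEW_MS):
    """Return the absolute skew in ms if it is within the limit, else raise."""
    skew = abs(master_ms - rs_ms)
    if skew > max_skew_ms:
        raise ClockOutOfSyncError(
            f"skew of {skew} ms exceeds the {max_skew_ms} ms limit")
    return skew
```

Raising the limit only hides the symptom. Keeping all nodes synced with NTP is the usual fix.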

Re: Multiple different failures

2013-06-02 Thread Jean-Marc Spaggiari
Hi Varun, The data was no longer visible in HBase because entries were missing from META. I had only 100 regions in my table instead of the expected 1000, so it "disappeared"... But the data was still there in HDFS. It's very hard to definitively lose data with HBase/Hadoop. So HBCK was able to find
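The situation here is that region data still existed on HDFS while META had lost track of it. A toy model of that consistency check, treating region names as plain strings. This is an illustration of the idea, not HBCK's actual code:

```python
def compare_regions(hdfs_regions, meta_regions):
    """Compare region names found on HDFS with those listed in META.

    Returns (on_hdfs_only, in_meta_only), both sorted:
    - on_hdfs_only: data exists on disk but META has no entry for it, so
      clients cannot see it (the "disappeared" data case);
    - in_meta_only: META points at a region with no directory on HDFS.
    """
    hdfs, meta = set(hdfs_regions), set(meta_regions)
    return sorted(hdfs - meta), sorted(meta - hdfs)
```

With 1000 region directories on HDFS and only 100 of them in META, `on_hdfs_only` holds the 900 regions a repair tool would need to re-register.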

Re: querying hbase

2013-06-02 Thread Andrew Purtell
On Sat, Jun 1, 2013 at 8:15 PM, Michael Segel wrote: > What happens when you restart the RS? > I think 1) the master is given a heads-up, 2) all of the regions are closed, 3) the JVM is bounced and everything is reloaded, 4) the RS comes back up and checks in with the master, 5) the master reassi

Re: querying hbase

2013-06-02 Thread Andrew Purtell
On Sat, Jun 1, 2013 at 10:20 PM, James Taylor wrote: > These approaches all sound somewhat brittle and unlikely to be relied on > for a production system (more here: https://issues.apache.org/jira/browse/HBASE-8607). > Sounds like a rolling r

Re: querying hbase

2013-06-02 Thread Andrew Purtell
On Sun, Jun 2, 2013 at 4:44 AM, Michael Segel wrote: > Sure, but that won't change the fact that Coprocessors should go under a > massive rewrite. Can you elaborate a bit? I would say we had our reasons for how things are, but I don't want to defend the design here; I'd like to hear about alterna