Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-05 Thread M. C. Srivas
On Thu, Aug 4, 2011 at 9:01 PM, Todd Lipcon t...@cloudera.com wrote: On Thu, Aug 4, 2011 at 8:36 PM, lohit lohit.vijayar...@gmail.com wrote: 2011/8/4 Ryan Rawson ryano...@gmail.com Yes, that is what JD is referring to, the so-called IO fence. It works like so: - regionserver is

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-05 Thread M. C. Srivas
The normal behavior would be for the HMaster to make the hlog read-only before processing it; this is very simple fencing and works on all POSIX or close-to-POSIX systems. Does that not work on HDFS? On Fri, Aug 5, 2011 at 7:07 AM, M. C. Srivas mcsri...@gmail.com wrote: On Thu, Aug 4, 2011 at

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-05 Thread Jean-Daniel Cryans
On Fri, Aug 5, 2011 at 8:52 AM, M. C. Srivas mcsri...@gmail.com wrote: The normal behavior would be for the HMaster to make the hlog read-only before processing it; this is very simple fencing and works on all POSIX or close-to-POSIX systems. Does that not work on HDFS? I'm sure you know the

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-05 Thread Ryan Rawson
The IO fencing was an accidental byproduct of how HDFS-200 was implemented, so in fact, HBase won't run correctly on HDFS-265 which does NOT have that IO fencing, right? On Fri, Aug 5, 2011 at 9:42 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Fri, Aug 5, 2011 at 8:52 AM, M. C. Srivas

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-05 Thread Jean-Daniel Cryans
HDFS-1520 was forward ported to trunk by Stack: https://issues.apache.org/jira/browse/HDFS-1948 J-D On Fri, Aug 5, 2011 at 9:45 AM, Ryan Rawson ryano...@gmail.com wrote: The IO fencing was an accidental byproduct of how HDFS-200 was implemented, so in fact, HBase won't run correctly on

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-05 Thread M. C. Srivas
On Fri, Aug 5, 2011 at 9:42 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Fri, Aug 5, 2011 at 8:52 AM, M. C. Srivas mcsri...@gmail.com wrote: The normal behavior would be for the HMaster to make the hlog read-only before processing it; this is very simple fencing and works on all POSIX or

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-05 Thread Todd Lipcon
On Fri, Aug 5, 2011 at 10:21 AM, M. C. Srivas mcsri...@gmail.com wrote: On Fri, Aug 5, 2011 at 9:42 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Fri, Aug 5, 2011 at 8:52 AM, M. C. Srivas mcsri...@gmail.com wrote: The normal behavior would be for the HMaster to make the hlog read-only

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-05 Thread M. C. Srivas
On Fri, Aug 5, 2011 at 11:28 AM, Todd Lipcon t...@cloudera.com wrote: On Fri, Aug 5, 2011 at 10:21 AM, M. C. Srivas mcsri...@gmail.com wrote: On Fri, Aug 5, 2011 at 9:42 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Fri, Aug 5, 2011 at 8:52 AM, M. C. Srivas mcsri...@gmail.com

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Jacques
Thanks for the feedback. So you're inclined to think it would be at the dfs layer? Is it accurate to say the most likely places where the data could have been lost were: 1. wal writes didn't actually get written to disk (no log entries to suggest any issues) 2. wal corrupted (no log entries

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Ryan Rawson
Another possibility is the logs were not replayed correctly during the region startup. We put in a lot of tests to cover this case, so it should not happen. Essentially, the WAL replay looks at the current state of the HFiles, then decides which log entries to replay or skip. This is because a log might

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Jacques
Do you have any suggestions of things I should look at to confirm/deny these possibilities? The tables are very small and inactive (probably only 50-100 rows changing per day). Thanks, Jacques On Thu, Aug 4, 2011 at 9:09 AM, Ryan Rawson ryano...@gmail.com wrote: Another possibility is the

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Ryan Rawson
The regionserver logs that talk about the hlog replay might shed some light; they should tell you which entries were skipped, etc. Having a look at the hfile structure of the regions may show whether there are holes. The HFile.main tool can come in handy here; you can run it as: hbase

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Jean-Daniel Cryans
Thanks for the feedback. So you're inclined to think it would be at the dfs layer? That's where the evidence seems to point. Is it accurate to say the most likely places where the data could have been lost were: 1. wal writes didn't actually get written to disk (no log entries to suggest

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Jacques
I will take a look and see what I can figure out. Thanks for your help. Jacques On Thu, Aug 4, 2011 at 9:52 AM, Ryan Rawson ryano...@gmail.com wrote: The regionserver logs that talk about the hlog replay might shed some light, it should tell you what entries were skipped, etc. Having a

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread M. C. Srivas
On Thu, Aug 4, 2011 at 10:34 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: Thanks for the feedback. So you're inclined to think it would be at the dfs layer? That's where the evidence seems to point. Is it accurate to say the most likely places where the data could have been

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Ryan Rawson
Yes, that is what JD is referring to, the so-called IO fence. It works like so: - regionserver is appending to an HLog, continues to do so, hasn't gotten the ZK "kill yourself" signal yet - hmaster splits the logs - the hmaster yanks the writer from under the regionserver, and the RS then starts to

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread lohit
2011/8/4 Ryan Rawson ryano...@gmail.com Yes, that is what JD is referring to, the so-called IO fence. It works like so: - regionserver is appending to an HLog, continues to do so, hasn't gotten the ZK "kill yourself" signal yet - hmaster splits the logs - the hmaster yanks the writer from

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Todd Lipcon
On Thu, Aug 4, 2011 at 8:36 PM, lohit lohit.vijayar...@gmail.com wrote: 2011/8/4 Ryan Rawson ryano...@gmail.com Yes, that is what JD is referring to, the so-called IO fence. It works like so: - regionserver is appending to an HLog, continues to do so, hasn't gotten the ZK "kill yourself"

RE: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Ramkrishna S Vasudevan
2011/8/4 Ryan Rawson ryano...@gmail.com Yes, that is what JD is referring to, the so-called IO fence. It works like so: - regionserver is appending to an HLog, continues to do so, hasn't gotten the ZK "kill yourself" signal yet

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-03 Thread Jean-Daniel Cryans
Hi Jacques, Sorry to hear about that. Regarding MapR, I personally don't have hands-on experience so it's a little bit hard for me to help you. You might want to ping them and ask their opinion (and I know they are watching, Ted? Srivas?) What I can do is tell you if things look normal from

Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-02 Thread Jacques
Given the hearty reviews and timing, we recently shifted from 90.3 (apache) to 90.4rc2 (the July 24th one that Stack posted -- 0.90.4, r1150278). We had a network switch go down last night, which caused an apparent network partition between two of our region servers and one or more zk nodes.