On Thu, Aug 4, 2011 at 9:01 PM, Todd Lipcon <t...@cloudera.com> wrote:
> On Thu, Aug 4, 2011 at 8:36 PM, lohit <lohit.vijayar...@gmail.com> wrote:
> > 2011/8/4 Ryan Rawson <ryano...@gmail.com>
> >
> >> Yes, that is what JD is referring to, the so-called IO fence.
> >>
> >> It works like so:
> >> - the regionserver is appending to an HLog, and continues to do so
> >>   because it hasn't gotten the ZK "kill yourself" signal yet
> >> - the hmaster splits the logs
> >> - the hmaster yanks the writer from under the regionserver, and the
> >>   RS then starts to kill itself
> >>
> > Can you tell us more about how this is done with HDFS? If the RS has
> > the lease, how did the master get hold of that lease? Or is it
> > removing the file?
>
> In older versions, it would call append(), which recovered the lease
> as long as the soft lease timeout had expired. More recently, it calls
> an HDFS "recoverLease" API that provides fencing.

Looks like we need a patch in both HBase and MapR ... even if MapR had
leases, this piece of code in FSUtils.java prevents it from being called:

    if (!(fs instanceof DistributedFileSystem)) {
      return;
    }

Someone will be issuing a patch for both MapR and HBase to fix this in a
couple of days. (I am on vacation.)

> >>
> >> This can happen because ZK can deliver the session-lost message late,
> >> and there is a race.
> >>
> >> -ryan
> >>
> >> On Thu, Aug 4, 2011 at 8:13 PM, M. C. Srivas <mcsri...@gmail.com> wrote:
> >> > On Thu, Aug 4, 2011 at 10:34 AM, Jean-Daniel Cryans
> >> > <jdcry...@apache.org> wrote:
> >> >
> >> >> > Thanks for the feedback. So you're inclined to think it would be
> >> >> > at the dfs layer?
> >> >>
> >> >> That's where the evidence seems to point.
> >> >>
> >> >> > Is it accurate to say the most likely places where the data could
> >> >> > have been lost were:
> >> >> > 1. wal writes didn't actually get written to disk (no log entries
> >> >> > to suggest any issues)
> >> >>
> >> >> Most likely.
> >> >>
> >> >> > 2. wal corrupted (no log entries suggest any trouble reading the
> >> >> > log)
> >> >>
> >> >> In that case the logs would scream (and I didn't see that in the
> >> >> logs I looked at).
> >> >>
> >> >> > 3. not all split logs were read by regionservers (?? is there any
> >> >> > way to ensure this either way... should I look at the filesystem
> >> >> > some place?)
> >> >>
> >> >> Some regions would have recovered-edits files, but that seems
> >> >> highly unlikely. With DEBUG enabled we could have seen which files
> >> >> were split by the master and which ones were created for the
> >> >> regions, and then which were read by the region servers.
> >> >>
> >> >> > Do you think the type of network partition I'm talking about is
> >> >> > adequately covered in existing tests? (Specifically running an
> >> >> > external ZK cluster?)
> >> >>
> >> >> The IO fencing was only tested with HDFS; I don't know what happens
> >> >> in that case with MapR. What I mean is that when the master splits
> >> >> the logs, it takes ownership of the HDFS writer lease (only one per
> >> >> file) so that it can safely close the log file. Then after that it
> >> >> checks if there are any new log files that were created (the region
> >> >> server could have rolled a log while the master was splitting them)
> >> >> and will restart if that situation happens, until it's able to own
> >> >> all the files and split them.
> >> >
> >> > JD, I didn't think the master explicitly dealt with writer leases.
> >> >
> >> > Does HBase rely on single-writer semantics on the log file? That is,
> >> > if the master and a RS both decide to mucky-muck with a log file, do
> >> > you expect the FS to lock out one of the writers?
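To make the fence concrete: what Todd and J-D describe boils down to the
master forcibly recovering the writer lease before it reads the log. A
minimal sketch, assuming the DistributedFileSystem.recoverLease(Path)
call Todd mentions (the class name and the retry interval here are
illustrative, not the actual HBase code):

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class LeaseFence {
      /**
       * Take over the writer lease on a log file before splitting it,
       * so a region server that is still alive cannot keep appending.
       */
      public static void fence(FileSystem fs, Path log)
          throws IOException, InterruptedException {
        if (!(fs instanceof DistributedFileSystem)) {
          return;  // the FSUtils guard quoted above: non-HDFS is skipped
        }
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        // recoverLease() returns true once the NameNode has recovered
        // the lease and closed the file; retry until the fence holds.
        while (!dfs.recoverLease(log)) {
          Thread.sleep(1000);
        }
      }
    }

Once recoverLease() returns true, any write the old region server still
attempts on its open stream fails, which is exactly the lock-out behavior
being asked about.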
> >> >
> >> >> > Have you heard if anyone else has been having problems with the
> >> >> > second 90.4 RC?
> >> >>
> >> >> Nope, we run it here on our dev cluster and didn't encounter any
> >> >> issue (with the code or node failure).
> >> >>
> >> >> > Thanks again for your help. I'm following up with the MapR guys
> >> >> > as well.
> >> >>
> >> >> Good idea!
> >> >>
> >> >> J-D
> >> >
> >
> > --
> > Have a Nice Day!
> > Lohit
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
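PS: for anyone curious what the FSUtils change could look like, here is
one hypothetical shape (a sketch of the idea only, not the actual patch;
the reflection probe and the class name are my assumptions): instead of
requiring DistributedFileSystem, probe the concrete FileSystem for a
recoverLease(Path) method so that any filesystem implementing lease
recovery gets fenced:

    import java.lang.reflect.Method;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RecoverLeaseShim {
      /**
       * Returns true once the lease on the file is recovered, or
       * immediately if this filesystem has no lease concept at all.
       */
      public static boolean tryRecoverLease(FileSystem fs, Path p)
          throws Exception {
        Method m;
        try {
          m = fs.getClass().getMethod("recoverLease", Path.class);
        } catch (NoSuchMethodException e) {
          return true;  // no leases to recover; nothing to fence
        }
        return (Boolean) m.invoke(fs, p);
      }
    }

That keeps the HDFS path working unchanged, while a MapR client that
later exposes a compatible recoverLease() would participate in the same
fence without further HBase changes.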