The normal behavior would be for the HMaster to make the hlog read-only before processing it... that's very simple fencing, and it works on all POSIX or close-to-POSIX systems. Does that not work on HDFS?
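Concretely, the chmod-style fence I have in mind looks roughly like the sketch below, written against Hadoop's FileSystem API (the class and method names are made up for illustration, not anything in HBase). One caveat, as far as I understand it: HDFS only checks permissions when a stream is opened, so a region server that already holds the writer lease is not cut off by a permission change -- which is where the recoverLease() discussion below comes in.

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsAction;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class ReadOnlyFenceSketch {
      // Drop write permission on the hlog before the master processes it.
      // On a strictly POSIX filesystem this blocks further appends; on HDFS
      // an already-open writer keeps its lease and can keep writing, which
      // is the gap the lease-recovery discussion below is about.
      public static void fenceLog(FileSystem fs, Path hlog) throws IOException {
        FsPermission readOnly =
            new FsPermission(FsAction.READ, FsAction.READ, FsAction.NONE);
        fs.setPermission(hlog, readOnly);
      }
    }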
On Fri, Aug 5, 2011 at 7:07 AM, M. C. Srivas <mcsri...@gmail.com> wrote:
>
> On Thu, Aug 4, 2011 at 9:01 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> On Thu, Aug 4, 2011 at 8:36 PM, lohit <lohit.vijayar...@gmail.com> wrote:
>> > 2011/8/4 Ryan Rawson <ryano...@gmail.com>
>> >
>> >> Yes, that is what JD is referring to, the so-called IO fence.
>> >>
>> >> It works like so:
>> >> - regionserver is appending to an HLog, continues to do so, hasn't
>> >>   gotten the ZK "kill yourself" signal yet
>> >> - hmaster splits the logs
>> >> - the hmaster yanks the writer from under the regionserver, and the RS
>> >>   then starts to kill itself
>> >>
>> > Can you tell more about how this is done with HDFS. If the RS has the
>> > lease, how did the master get hold of that lease? Or is it removing the
>> > file?
>>
>> In older versions, it would call append(), which recovered the lease so
>> long as the soft lease timeout had expired. More recently, it calls an
>> HDFS "recoverLease" API that provides fencing.
>>
> Looks like we need a patch in both HBase and MapR ... even if MapR had
> leases, this piece of code in FSUtils.java prevents it being called:
>
>   if (!(fs instanceof DistributedFileSystem)) {
>     return;
>   }
>
> Someone will be issuing a patch for both MapR and HBase to fix this in a
> couple of days. (I am on vacation.)
>
>> >> This can happen because ZK can deliver the session-lost message late,
>> >> and there is a race.
>> >>
>> >> -ryan
>> >>
>> >> On Thu, Aug 4, 2011 at 8:13 PM, M. C. Srivas <mcsri...@gmail.com> wrote:
>> >> > On Thu, Aug 4, 2011 at 10:34 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>> >> >
>> >> >> > Thanks for the feedback. So you're inclined to think it would be at
>> >> >> > the dfs layer?
>> >> >>
>> >> >> That's where the evidence seems to point.
>> >> >>
>> >> >> > Is it accurate to say the most likely places where the data could
>> >> >> > have been lost were:
>> >> >> > 1. wal writes didn't actually get written to disk (no log entries
>> >> >> >    to suggest any issues)
>> >> >>
>> >> >> Most likely.
>> >> >>
>> >> >> > 2. wal corrupted (no log entries suggest any trouble reading the log)
>> >> >>
>> >> >> In that case the logs would scream (and I didn't see that in the logs
>> >> >> I looked at).
>> >> >>
>> >> >> > 3. not all split logs were read by regionservers (?? is there any
>> >> >> >    way to ensure this either way... should I look at the filesystem
>> >> >> >    some place?)
>> >> >>
>> >> >> Some regions would have recovered edits files, but that seems highly
>> >> >> unlikely. With DEBUG enabled we could have seen which files were split
>> >> >> by the master and which ones were created for the regions, and then
>> >> >> which were read by the region servers.
>> >> >>
>> >> >> > Do you think the type of network partition I'm talking about is
>> >> >> > adequately covered in existing tests? (Specifically running an
>> >> >> > external zk cluster?)
>> >> >>
>> >> >> The IO fencing was only tested with HDFS; I don't know what happens in
>> >> >> that case with MapR. What I mean is that when the master splits the
>> >> >> logs, it takes ownership of the HDFS writer lease (only one per file)
>> >> >> so that it can safely close the log file. Then after that it checks if
>> >> >> there are any new log files that were created (the region server could
>> >> >> have rolled a log while the master was splitting them) and will
>> >> >> restart if that situation happens, until it's able to own all files
>> >> >> and split them.
>> >> >
>> >> > JD, I didn't think the master explicitly dealt with writer leases.
>> >> >
>> >> > Does HBase rely on single-writer semantics on the log file? That is,
>> >> > if the master and a RS both decide to mucky-muck with a log file, do
>> >> > you expect the FS to lock out one of the writers?
>> >> >
>> >> >> > Have you heard if anyone else has been having problems with the
>> >> >> > second 90.4 rc?
>> >> >>
>> >> >> Nope, we run it here on our dev cluster and didn't encounter any issue
>> >> >> (with the code or node failure).
>> >> >>
>> >> >> > Thanks again for your help. I'm following up with the MapR guys as
>> >> >> > well.
>> >> >>
>> >> >> Good idea!
>> >> >>
>> >> >> J-D
>> >
>> > --
>> > Have a Nice Day!
>> > Lohit
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
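For reference, the lease-based fencing Todd and J-D describe above amounts to something like the sketch below. This is not HBase's actual FSUtils code, just an illustration that assumes the boolean-returning recoverLease(Path) exposed by DistributedFileSystem; the instanceof check is the piece Srivas quotes above, which the proposed MapR/HBase patch would presumably relax so that other filesystems with lease semantics get a chance to fence as well.

    import java.io.IOException;
    import java.io.InterruptedIOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class LeaseFenceSketch {
      // Recover (steal) the writer lease on a dead region server's hlog so
      // the RS can no longer append, before the master splits the file.
      public static void recoverFileLease(FileSystem fs, Path log) throws IOException {
        if (!(fs instanceof DistributedFileSystem)) {
          // The early return discussed above: non-HDFS filesystems never fence.
          return;
        }
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        try {
          // recoverLease() returns true once the lease is recovered and the
          // file is closed; poll until then.
          while (!dfs.recoverLease(log)) {
            Thread.sleep(1000);
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("Interrupted recovering lease on " + log);
        }
      }
    }

The point is that lease recovery itself is the fence: once the master owns the lease and the file is closed, further writes from a straggling region server should fail, which is what closes the race Ryan describes.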