I will take a look and see what I can figure out. Thanks for your help.
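For the "were all the split logs read" question, I'll start by looking for
leftover recovered edits under the affected region, along these lines (the
paths and the 0.90 directory names are my best guess, so I'll adjust for
our actual table/region names):

    # region directories for the table that lost data
    hadoop fs -ls /hbase/mytable
    # files still sitting under recovered.edits would suggest split logs
    # that were written out but never fully replayed
    hadoop fs -ls /hbase/mytable/<encoded-region-name>/recovered.edits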
Jacques

On Thu, Aug 4, 2011 at 9:52 AM, Ryan Rawson <ryano...@gmail.com> wrote:
> The regionserver logs that talk about the hlog replay might shed some
> light; they should tell you which entries were skipped, etc. Have a
> look at the hfile structure of the regions to see if there are holes.
> The HFile.main tool can come in handy here; you can run it as:
>
>   hbase org.apache.hadoop.hbase.io.hfile.HFile
>
> and it will give you usage.
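> For example, dumping one hfile's metadata and key/values looks
> something like this (flag names from memory; the usage output is the
> authoritative list):
>
>   hbase org.apache.hadoop.hbase.io.hfile.HFile -v -m -p \
>     -f /hbase/mytable/<encoded-region-name>/<family>/<hfile>
>
> Comparing the first/last keys and entry counts across a region's
> hfiles should make a hole fairly obvious.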
>
> MapR might be able to give you audit logs for the time in question;
> those could be useful as well.
>
> On Thu, Aug 4, 2011 at 9:40 AM, Jacques <whs...@gmail.com> wrote:
> > Do you have any suggestions of things I should look at to confirm/deny
> > these possibilities?
> >
> > The tables are very small and inactive (probably only 50-100 rows
> > changing per day).
> >
> > Thanks,
> > Jacques
> >
> > On Thu, Aug 4, 2011 at 9:09 AM, Ryan Rawson <ryano...@gmail.com> wrote:
> >
> >> Another possibility is that the logs were not replayed correctly
> >> during the region startup. We put in a lot of tests to cover this
> >> case, so it should not be so.
> >>
> >> Essentially the WAL replay looks at the current HFiles' state, then
> >> decides which log entries to replay or skip. This is because a log
> >> might have more data than what is strictly missing from the HFiles.
> >>
> >> If the data that is missing is over 6 hours old, that is a very weird
> >> bug; it suggests to me that either an hfile is missing for some
> >> reason, or the WAL replay didn't include some entries for some reason.
> >>
> >> -ryan
> >>
> >> On Thu, Aug 4, 2011 at 8:38 AM, Jacques <whs...@gmail.com> wrote:
> >> > Thanks for the feedback. So you're inclined to think it would be at
> >> > the dfs layer?
> >> >
> >> > Is it accurate to say the most likely places where the data could
> >> > have been lost were:
> >> > 1. wal writes didn't actually get written to disk (no log entries
> >> > to suggest any issues)
> >> > 2. wal corrupted (no log entries suggest any trouble reading the
> >> > log)
> >> > 3. not all split logs were read by regionservers (is there any way
> >> > to verify this either way... should I look at the filesystem some
> >> > place?)
> >> >
> >> > Do you think the type of network partition I'm talking about is
> >> > adequately covered in existing tests? (Specifically, running an
> >> > external zk cluster?)
> >> >
> >> > Have you heard whether anyone else has been having problems with
> >> > the second 90.4 rc?
> >> >
> >> > Thanks again for your help. I'm following up with the MapR guys as
> >> > well.
> >> >
> >> > Jacques
> >> >
> >> > On Wed, Aug 3, 2011 at 3:49 PM, Jean-Daniel Cryans
> >> > <jdcry...@apache.org> wrote:
> >> >
> >> >> Hi Jacques,
> >> >>
> >> >> Sorry to hear about that.
> >> >>
> >> >> Regarding MapR, I personally don't have hands-on experience, so
> >> >> it's a little bit hard for me to help you. You might want to ping
> >> >> them and ask their opinion (and I know they are watching, Ted?
> >> >> Srivas?)
> >> >>
> >> >> What I can do is tell you whether things look normal from the
> >> >> HBase point of view, but I see you're not running with DEBUG, so I
> >> >> might miss some information.
> >> >>
> >> >> Looking at the master log, it tells us that it was able to split
> >> >> the logs correctly.
> >> >>
> >> >> Looking at a few regionserver logs, they don't seem to say they
> >> >> had issues replaying the logs, so that's good too.
> >> >>
> >> >> About the memstore question: it's almost purely size-based (64MB).
> >> >> I say almost because we limit the number of WALs a regionserver
> >> >> can carry, so that when it reaches that limit it force flushes the
> >> >> memstores with the older edits. There's also a thread that rolls
> >> >> the latest log if it's more than an hour old, so in the extreme
> >> >> case it could take 32 hours for an edit in the memstore to make it
> >> >> to a StoreFile. It used to be that without appends, rolling those
> >> >> files often would prevent losses older than 1 hour, but I haven't
> >> >> seen those issues since we started using appends. But you're not
> >> >> using HDFS, and I don't have MapR experience, so I can't really go
> >> >> any further...
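> >> >>
> >> >> For reference, the knobs involved are roughly these (0.90-era
> >> >> names and defaults from memory, so double-check them against your
> >> >> hbase-default.xml):
> >> >>
> >> >>   hbase.hregion.memstore.flush.size   67108864   # the 64MB flush
> >> >>   hbase.regionserver.maxlogs          32         # force flush past this many WALs
> >> >>   hbase.regionserver.logroll.period   3600000    # roll the latest log hourly
> >> >>
> >> >> 32 logs at one roll per hour is where the 32-hour worst case above
> >> >> comes from.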
> >> >>
> >> >> J-D
> >> >>
> >> >> On Tue, Aug 2, 2011 at 3:44 PM, Jacques <whs...@gmail.com> wrote:
> >> >> > Given the hardy reviews and timing, we recently shifted from
> >> >> > 90.3 (apache) to 90.4rc2 (the July 24th one that Stack posted --
> >> >> > 0.90.4, r1150278).
> >> >> >
> >> >> > We had a network switch go down last night, which caused an
> >> >> > apparent network partition between two of our region servers and
> >> >> > one or more zk nodes. (We're still piecing together the
> >> >> > situation.) Anyway, things *seemed* to recover fine. However,
> >> >> > this morning we realized that we lost some data that was
> >> >> > generated just before the problems occurred.
> >> >> >
> >> >> > It looks like h002 went down nearly immediately at around 8pm,
> >> >> > while h001 didn't go down until around 8:10pm (somewhat confused
> >> >> > by this). We're thinking that this may have contributed to the
> >> >> > problem. The particular table that had data issues is a very
> >> >> > small table with a single region that was running on h002 when
> >> >> > it went down.
> >> >> >
> >> >> > We know the corruption/lack of edits affected two tables. It
> >> >> > extended across a number of rows and actually appears to reach
> >> >> > back to data inserted up to 6 hours earlier (estimate). The two
> >> >> > tables we can verify errors on are each probably at most 10-20k
> >> >> > rows of <1k each. In some places rows that were added are
> >> >> > completely missing, and some rows just had missing cell edits.
> >> >> > As an aside, I was thinking there was a time-based memstore
> >> >> > flush in addition to a size-based one, but upon reviewing the
> >> >> > hbase default configuration I don't see mention of it. Is this
> >> >> > purely size-based?
> >> >> >
> >> >> > We don't have the tools in place to verify exactly what other
> >> >> > data or tables may have been impacted.
> >> >> >
> >> >> > The log files are at the pastebin links below. The whole cluster
> >> >> > is 8 nodes + master, with 3 zk nodes running on separate
> >> >> > machines. We run with mostly standard settings but do have the
> >> >> > following:
> >> >> > heap: 12gb
> >> >> > region size: 4gb (due to lots of cold data and not enough
> >> >> > servers; avg 300 regions/server)
> >> >> > mslab: 4m/512k (due to somewhat frequent updates to larger
> >> >> > objects in the 200-500k size range)
> >> >> >
> >> >> > We've been using hbase for about a year now and have been
> >> >> > nothing but happy with it. The failure state that we had last
> >> >> > night (where only some region servers cannot talk to some zk
> >> >> > servers) seems like a strange one.
> >> >> >
> >> >> > Any thoughts (beyond chiding us for switching to an rc)? Any
> >> >> > opinions on whether we should roll back to 90.3 (or
> >> >> > 90.3+cloudera)?
> >> >> >
> >> >> > Thanks for any help,
> >> >> > Jacques
> >> >> >
> >> >> > master: http://pastebin.com/aG8fm2KZ
> >> >> > h001: http://pastebin.com/nLLk06EC
> >> >> > h002: http://pastebin.com/0wPFuZDx
> >> >> > h003: http://pastebin.com/3ZMV01mA
> >> >> > h004: http://pastebin.com/0YVefuqS
> >> >> > h005: http://pastebin.com/N90LDjvs
> >> >> > h006: http://pastebin.com/gM8umekW
> >> >> > h007: http://pastebin.com/0TVvX68d
> >> >> > h008: http://pastebin.com/mV968Cem