Do you have any suggestions of things I should look at to confirm/deny these possibilities?

The tables are very small and inactive (probably only 50-100 rows changing per day).
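For point 3 from my earlier mail (split logs not fully read), one thing I'm planning to check is whether any split-log output is still sitting under the region directories. Below is a rough sketch using the Hadoop FileSystem API; it assumes the default /hbase root dir and that 0.90's log splitter writes per-region files into a recovered.edits directory that gets cleaned up after a successful replay, so please correct me if that layout is wrong:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: list any leftover recovered.edits files under each region of a
    // table. If the regionserver removes these once they have been replayed,
    // anything still present after startup would point at split output that
    // was never applied. Assumes the default /hbase root dir.
    public class FindRecoveredEdits {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path tableDir = new Path("/hbase", args[0]); // table name, e.g. "mytable"
        for (FileStatus region : fs.listStatus(tableDir)) {
          if (!region.isDir()) continue; // skip table-level files
          Path edits = new Path(region.getPath(), "recovered.edits");
          if (fs.exists(edits)) {
            for (FileStatus f : fs.listStatus(edits)) {
              System.out.println(f.getPath() + " (" + f.getLen() + " bytes)");
            }
          }
        }
      }
    }

Anything non-empty left in those directories after the restart would presumably be split output that never got applied.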
Thanks,
Jacques

On Thu, Aug 4, 2011 at 9:09 AM, Ryan Rawson <ryano...@gmail.com> wrote:
> Another possibility is the logs were not replayed correctly during the
> region startup. We put in a lot of tests to cover this case, so that
> should not be the case.
>
> Essentially the WAL replay looks at the current HFiles state, then
> decides which log entries to replay or skip. This is because a log
> might have more data than what is strictly missing from the HFiles.
>
> If the data that is missing is over 6 hours old, that is a very weird
> bug; it suggests to me that either an HFile is missing for some
> reason, or the WAL replay didn't include some edits for some reason.
>
> -ryan
>
> On Thu, Aug 4, 2011 at 8:38 AM, Jacques <whs...@gmail.com> wrote:
> > Thanks for the feedback. So you're inclined to think it would be at the
> > dfs layer?
> >
> > Is it accurate to say the most likely places where the data could have
> > been lost were:
> > 1. wal writes didn't actually get written to disk (no log entries to
> > suggest any issues)
> > 2. wal corrupted (no log entries suggesting any trouble reading the log)
> > 3. not all split logs were read by regionservers (is there any way to
> > verify this either way... should I look somewhere on the filesystem?)
> >
> > Do you think the type of network partition I'm talking about is
> > adequately covered in existing tests? (Specifically running an external
> > zk cluster?)
> >
> > Have you heard if anyone else has been having problems with the second
> > 90.4 rc?
> >
> > Thanks again for your help. I'm following up with the MapR guys as well.
> >
> > Jacques
> >
> > On Wed, Aug 3, 2011 at 3:49 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >> Hi Jacques,
> >>
> >> Sorry to hear about that.
> >>
> >> Regarding MapR, I personally don't have hands-on experience, so it's a
> >> little bit hard for me to help you. You might want to ping them and
> >> ask their opinion (and I know they are watching, Ted? Srivas?)
> >>
> >> What I can do is tell you if things look normal from the HBase
> >> point of view, but I see you're not running with DEBUG so I might miss
> >> some information.
> >>
> >> Looking at the master log, it tells us that it was able to split the
> >> logs correctly.
> >>
> >> Looking at a few regionserver logs, they don't seem to show any issues
> >> replaying the logs, so that's good too.
> >>
> >> About the memstore questions, it's almost purely size-based (64MB). I
> >> say almost because we limit the number of WALs a regionserver can
> >> carry, so that when it reaches that limit it force flushes the
> >> memstores with older edits. There's also a thread that rolls the
> >> latest log if it's more than an hour old, so in the extreme case it
> >> could take 32 hours for an edit in the memstore to make it to a
> >> StoreFile. It used to be that, without appends, rolling those files
> >> often would prevent losses older than 1 hour, but I haven't seen those
> >> issues since we started using appends. But you're not using HDFS, and
> >> I don't have MapR experience, so I can't really go any further...
> >>
> >> J-D
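To make the 32-hour worst case above concrete, a back-of-envelope sketch (assuming the 0.90 defaults of hbase.regionserver.maxlogs = 32 and an hourly log roll period; this is just the arithmetic, not the actual HBase code):

    // Rough bound on how long an edit can stay memstore-only under a very
    // light write load, per J-D's description. Assumes 0.90 defaults:
    // hbase.regionserver.maxlogs = 32, and an idle WAL rolled after an hour.
    public class MemstoreAgeBound {
      public static void main(String[] args) {
        int maxLogs = 32;         // WALs carried before a force flush kicks in
        int rollPeriodHours = 1;  // an idle WAL is rolled on the timer

        // Under light load each WAL rolls on the timer rather than on size,
        // so the oldest un-flushed edit can be roughly maxLogs * rollPeriodHours
        // old before the force flush finally pushes it into a StoreFile.
        System.out.println("Worst case: ~" + (maxLogs * rollPeriodHours) + " hours");
      }
    }

On a small, quiet table like the ones above, that window is exactly where edits would live only in the WAL and the memstore.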
> >> On Tue, Aug 2, 2011 at 3:44 PM, Jacques <whs...@gmail.com> wrote:
> >> > Given the hearty reviews and timing, we recently shifted from 90.3
> >> > (apache) to 90.4rc2 (the July 24th one that Stack posted -- 0.90.4,
> >> > r1150278).
> >> >
> >> > We had a network switch go down last night, which caused an apparent
> >> > network partition between two of our region servers and one or more
> >> > zk nodes. (We're still piecing together the situation.) Anyway,
> >> > things *seemed* to recover fine. However, this morning we realized
> >> > that we lost some data that was generated just before the problems
> >> > occurred.
> >> >
> >> > It looks like h002 went down nearly immediately at around 8pm, while
> >> > h001 didn't go down until around 8:10pm (we're somewhat confused by
> >> > this). We're thinking that this may have contributed to the problem.
> >> > The particular table that had data issues is a very small table with
> >> > a single region that was running on h002 when it went down.
> >> >
> >> > We know the corruption/lack of edits affected two tables. It extended
> >> > across a number of rows and actually appears to reach back to data
> >> > inserted up to 6 hours earlier (estimate). The two tables we can
> >> > verify errors on are each probably at most 10-20k rows of <1k each.
> >> > In some places rows that were added are completely missing, and in
> >> > others rows just had missing cell edits. As an aside, I was thinking
> >> > there was a time-based memstore flush in addition to a size-based
> >> > one. But upon reviewing the hbase default configuration, I don't see
> >> > mention of it. Is this purely size-based?
> >> >
> >> > We don't have the tools in place to verify exactly what other data or
> >> > tables may have been impacted.
> >> >
> >> > The log files are at the pastebin links below. The whole cluster is 8
> >> > nodes + master, with 3 zk nodes running on separate machines. We run
> >> > with mostly standard settings, but do have the following:
> >> > heap: 12gb
> >> > region size: 4gb (due to lots of cold data and not enough servers;
> >> > avg 300 regions/server)
> >> > mslab: 4m/512k (due to somewhat frequent updates to larger objects in
> >> > the 200-500k size range)
> >> >
> >> > We've been using hbase for about a year now and have been nothing but
> >> > happy with it. The failure state that we had last night (where only
> >> > some region servers cannot talk to some zk servers) seems like a
> >> > strange one.
> >> >
> >> > Any thoughts? (beyond chiding for switching to an rc) Any opinions on
> >> > whether we should roll back to 90.3 (or 90.3+cloudera)?
> >> >
> >> > Thanks for any help,
> >> > Jacques
> >> >
> >> > master: http://pastebin.com/aG8fm2KZ
> >> > h001: http://pastebin.com/nLLk06EC
> >> > h002: http://pastebin.com/0wPFuZDx
> >> > h003: http://pastebin.com/3ZMV01mA
> >> > h004: http://pastebin.com/0YVefuqS
> >> > h005: http://pastebin.com/N90LDjvs
> >> > h006: http://pastebin.com/gM8umekW
> >> > h007: http://pastebin.com/0TVvX68d
> >> > h008: http://pastebin.com/mV968Cem
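As a footnote to Ryan's replay explanation near the top of the thread: the replay-or-skip decision comes down to comparing each edit's sequence id against the highest sequence id already persisted in the HFiles. A simplified sketch of that shape, with hypothetical types (the real 0.90 code paths are more involved):

    import java.util.Arrays;
    import java.util.List;

    // Simplified sketch of the replay-or-skip decision: every edit carries a
    // sequence id, and the HFiles record the highest sequence id they contain.
    // Edits at or below that watermark are already durable in a StoreFile and
    // can be skipped; newer ones must be replayed into the memstore.
    public class WalReplaySketch {
      static class WalEdit {
        final long seqId;
        WalEdit(long seqId) { this.seqId = seqId; }
      }

      static int replay(List<WalEdit> log, long maxSeqIdInHFiles) {
        int replayed = 0;
        for (WalEdit edit : log) {
          if (edit.seqId <= maxSeqIdInHFiles) {
            continue; // already flushed to an HFile; skip
          }
          // a real implementation would re-insert the edit into the memstore here
          replayed++;
        }
        return replayed;
      }

      public static void main(String[] args) {
        List<WalEdit> log = Arrays.asList(
            new WalEdit(10), new WalEdit(11), new WalEdit(12));
        // HFiles already contain everything up to seqid 11, so only the edit
        // with seqid 12 needs replaying.
        System.out.println(replay(log, 11) + " edit(s) replayed");
      }
    }

This is also why a WAL can legitimately hold more data than what is missing from the HFiles, as Ryan notes: the watermark, not the log contents, decides what gets applied.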