I'm there. Thanks St.Ack.

On Wed, Sep 22, 2010 at 11:59 PM, Stack <st...@duboce.net> wrote:
> Hey George:
>
> James Kennedy is working on getting transactional hbase working w/
> hbase TRUNK. Watch HBASE-2641 for the drop of changes needed in core
> to make it so his github THBase can use HBase core.
>
> St.Ack
>
> On Mon, Sep 20, 2010 at 5:43 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> hi,
>>
>> sorry i don't. i think the current transactional/indexed person is
>> working on bringing it up to 0.89; perhaps they would enjoy your help
>> in testing or porting the code?
>>
>> I'll poke a few people into replying.
>>
>> -ryan
>>
>> On Mon, Sep 20, 2010 at 5:19 PM, George P. Stathis <gstat...@traackr.com> wrote:
>>> On Mon, Sep 20, 2010 at 4:55 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>
>>>> When you say replication, what exactly do you mean? In normal HDFS, as
>>>> you write, the data is sent to 3 nodes, yes, but with the flaw I
>>>> outlined it doesn't matter, because the datanodes and namenode will
>>>> pretend a data block just didn't exist if it wasn't closed properly.
>>>
>>> That's the part I was not understanding. I do now. Thanks.
>>>
>>>> So even with the most careful white-glove handling of hbase, you will
>>>> eventually have a crash and you will lose data w/o 0.89/CDH3 et al.
>>>> You can circumvent this by storing the data elsewhere and spooling it
>>>> into hbase, or perhaps just not minding if you lose data (yes, those
>>>> applications exist).
>>>>
>>>> Looking at those JIRAs in question, the first is already on trunk,
>>>> which is 0.89. The second isn't, alas. At this point the transactional
>>>> hbase just isn't being actively maintained by any committer and we are
>>>> reliant on kind people's contributions. So I can't promise when it
>>>> will hit 0.89/0.90.
>>>
>>> Are you aware of any indexing alternatives in 0.89?
>>>
>>>> -ryan
>>>>
>>>> On Mon, Sep 20, 2010 at 1:21 PM, George P. Stathis <gstat...@traackr.com> wrote:
>>>>> Thanks for the response, Ryan. I have no doubt that 0.89 can be used
>>>>> in production and that it has strong support. I just wanted to avoid
>>>>> moving to it now because we have limited resources and it would put a
>>>>> dent in our roadmap if we were to fast-track the migration now.
>>>>> Specifically, we are using HBASE-2438 and HBASE-2426 to support
>>>>> pagination across indexes. So we either have to migrate those to 0.89
>>>>> or somehow go stock and be able to support pagination across region
>>>>> servers.
>>>>>
>>>>> Of course, if the choice is between migrating or losing more data,
>>>>> data safety comes first. But if we can buy two or three more months
>>>>> of time and avoid region server crashes (like you did for a year),
>>>>> maybe we can go that route for now. What do we need to do to achieve
>>>>> that?
>>>>>
>>>>> -GS
>>>>>
>>>>> PS: Out of curiosity, I understand the WAL log append issue for a
>>>>> single regionserver when it comes to losing the data on a single
>>>>> node. But if that data is also being replicated on another region
>>>>> server, why wouldn't it be available there? Or is the WAL log shared
>>>>> across multiple region servers (maybe that's what I'm missing)?
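A minimal sketch of what the "go stock" pagination mentioned above looks like, assuming the 0.20-era HBase client API: the only stock building block is PageFilter, and it is applied separately on each region server, so the client still has to cap the page itself and remember the last row key to start the next page. The table name, page size and row handling below are made up for illustration.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PageFilter;

    public class PagedScan {
      public static void main(String[] args) throws Exception {
        // 0.20-era construction; later releases use HBaseConfiguration.create()
        HTable table = new HTable(new HBaseConfiguration(), "my_index_table");

        final int pageSize = 25;
        Scan scan = new Scan();
        // PageFilter only limits rows per region, not per scan...
        scan.setFilter(new PageFilter(pageSize));

        ResultScanner scanner = table.getScanner(scan);
        int returned = 0;
        byte[] lastRow = null;
        for (Result r : scanner) {
          // ...so the client must also stop at pageSize itself.
          if (++returned > pageSize) break;
          lastRow = r.getRow();
          // process the row here
        }
        scanner.close();
        // The next page re-runs the scan with a start row just past lastRow.
      }
    }

This only illustrates the stock-API behaviour being referred to; it is not a substitute for whatever HBASE-2438/2426 provide on the indexed tables.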
>>>>> On Mon, Sep 20, 2010 at 3:52 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>>> Hey,
>>>>>>
>>>>>> The problem is that the stock 0.20 hadoop won't let you read from a
>>>>>> non-closed file. It will report that length as 0. So if a
>>>>>> regionserver crashes, that last WAL log that is still open becomes 0
>>>>>> length and the data within it unreadable. That, specifically, is the
>>>>>> problem of data loss. You could always make it so your regionservers
>>>>>> rarely crash - this is possible, btw, and I did it for over a year.
>>>>>>
>>>>>> But you will want to run CDH3 or the append-branch releases to get
>>>>>> the series of patches that fix this hole. It also happens that only
>>>>>> 0.89 runs on it. I would like to avoid the hadoop "everyone uses 0.20
>>>>>> forever" problem and talk about what we could do to help you get on
>>>>>> 0.89. Over here at SU we've made a commitment to the future of 0.89
>>>>>> and are running it in production. Let us know what else you'd need.
>>>>>>
>>>>>> -ryan
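For readers wondering what the "series of patches" amounts to operationally: on an append-capable build (CDH3, or the branch-0.20-append releases), the sync/append path also has to be switched on in the site configuration. A minimal sketch of that setting, assuming such a build is already installed; on stock Apache 0.20.x the property exists but sits on top of the broken implementation described above, so it is not a fix by itself.

    <!-- hdfs-site.xml; typically also mirrored in hbase-site.xml (or the
         hdfs-site.xml placed on HBase's classpath) so HBase sees it -->
    <property>
      <name>dfs.support.append</name>
      <value>true</value>
      <description>
        Enable the sync/append code path so a WAL file that was still open
        when a regionserver died can be recovered by HBase on restart.
      </description>
    </property>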
>>>>>> On Mon, Sep 20, 2010 at 12:39 PM, George P. Stathis <gstat...@traackr.com> wrote:
>>>>>>> Thanks Todd. We are not quite ready to move to 0.89 yet. We have made
>>>>>>> custom modifications to the transactional contrib sources, which are
>>>>>>> now taken out of 0.89. We are planning on moving to 0.90 when it
>>>>>>> comes out and, at that point, either migrate our customizations or
>>>>>>> move back to the out-of-the-box features (which will require a
>>>>>>> re-write of our code).
>>>>>>>
>>>>>>> We are well aware of the CDH distros, but at the time we started with
>>>>>>> hbase, there was none that included HBase. I think CDH3 is the first
>>>>>>> one to include HBase, correct? And is 0.89 the only version
>>>>>>> supported?
>>>>>>>
>>>>>>> Moreover, are we saying that there is no way to prevent stock hbase
>>>>>>> 0.20.6 and hadoop 0.20.2 from losing data when a single node goes
>>>>>>> down? It does not matter if the data is replicated, it will still get
>>>>>>> lost?
>>>>>>>
>>>>>>> -GS
>>>>>>>
>>>>>>> On Sun, Sep 19, 2010 at 5:58 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>>>>> Hi George,
>>>>>>>>
>>>>>>>> The data loss problems you mentioned below are known issues when
>>>>>>>> running on stock Apache 0.20.x hadoop.
>>>>>>>>
>>>>>>>> You should consider upgrading to CDH3b2, which includes a number of
>>>>>>>> HDFS patches that allow HBase to durably store data. You'll also
>>>>>>>> have to upgrade to HBase 0.89 - we ship a version as part of CDH
>>>>>>>> that will work well.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>> On Sun, Sep 19, 2010 at 6:57 AM, George P. Stathis <gstat...@traackr.com> wrote:
>>>>>>>>> Hi folks. I'd like to run the following data loss scenario by you
>>>>>>>>> to see if we are doing something obviously wrong with our setup
>>>>>>>>> here.
>>>>>>>>>
>>>>>>>>> Setup:
>>>>>>>>>
>>>>>>>>> - Hadoop 0.20.1
>>>>>>>>> - HBase 0.20.3
>>>>>>>>> - 1 master node running the Namenode, SecondaryNamenode, JobTracker,
>>>>>>>>>   HMaster and 1 Zookeeper (no zookeeper quorum right now)
>>>>>>>>> - 4 child nodes running a Datanode, TaskTracker and RegionServer each
>>>>>>>>> - dfs.replication is set to 2
>>>>>>>>> - Host: Amazon EC2
>>>>>>>>>
>>>>>>>>> Up until yesterday, we were frequently experiencing HBASE-2077
>>>>>>>>> <https://issues.apache.org/jira/browse/HBASE-2077>, which kept
>>>>>>>>> bringing our RegionServers down. What we realized, though, is that
>>>>>>>>> we were losing data (a few hours' worth) with just one out of four
>>>>>>>>> regionservers going down. This is problematic since we are supposed
>>>>>>>>> to replicate at x2 across 4 nodes, so at least one other node
>>>>>>>>> should theoretically be able to serve the data that the downed
>>>>>>>>> regionserver can't.
>>>>>>>>>
>>>>>>>>> Questions:
>>>>>>>>>
>>>>>>>>> - When a regionserver goes down unexpectedly, the only data that
>>>>>>>>>   theoretically gets lost is whatever didn't make it to the WAL,
>>>>>>>>>   right? Or wrong? (See the sketch after this list.) E.g.
>>>>>>>>>   http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
>>>>>>>>> - We ran a hadoop fsck on our cluster and verified the replication
>>>>>>>>>   factor, as well as that there were no under-replicated blocks. So
>>>>>>>>>   why was our data not available from another node?
>>>>>>>>> - If the log gets rolled every 60 minutes by default (we haven't
>>>>>>>>>   touched the defaults), how can we lose data from up to 24 hours
>>>>>>>>>   ago?
>>>>>>>>> - When the downed regionserver comes back up, shouldn't that data
>>>>>>>>>   be available again? Ours wasn't.
>>>>>>>>> - In such scenarios, is there a recommended approach for restoring
>>>>>>>>>   the regionserver that goes down? We just brought them back up by
>>>>>>>>>   logging on to the node itself and manually restarting them first.
>>>>>>>>>   Now we have automated crons that listen for their ports and
>>>>>>>>>   restart them within two minutes if they go down.
>>>>>>>>> - Are there ways to recover such lost data?
>>>>>>>>> - Are versions 0.89 / 0.90 addressing any of these issues?
>>>>>>>>> - Curiosity question: when a regionserver goes down, does the
>>>>>>>>>   master try to replicate that node's data on another node to
>>>>>>>>>   satisfy the dfs.replication ratio?
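On the first question in the list above: besides server-side edits that never reached the WAL, writes can also be missing simply because they were still sitting in the client-side write buffer, or because a Put was told to skip the WAL. A minimal sketch of the relevant 0.20-era client knobs; the table, family and values are made up.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DurabilityKnobs {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "events");

        // With autoFlush off, puts collect in the client write buffer and
        // are not on any regionserver (or in its WAL) until flushCommits().
        table.setAutoFlush(false);

        Put p = new Put(Bytes.toBytes("row-1"));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));

        // Optional speed/durability trade-off: skipping the WAL means the
        // edit is gone if the regionserver dies before the memstore is
        // flushed, even on an append-capable HDFS. Left commented out to
        // keep the default (WAL on).
        // p.setWriteToWAL(false);

        table.put(p);
        table.flushCommits();   // push the buffered edits to the regionserver
      }
    }

Neither knob explains losing hours of already-acknowledged data, though; that points back at the unclosed-WAL problem described earlier in the thread.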
>>>>>>>>> For now, we have upgraded our HBase to 0.20.6, which is supposed to
>>>>>>>>> contain the HBASE-2077
>>>>>>>>> <https://issues.apache.org/jira/browse/HBASE-2077> fix (but no one
>>>>>>>>> has verified that yet). Lars' blog also suggests that Hadoop 0.21.0
>>>>>>>>> is the way to go to avoid the file-append issues, but it's not
>>>>>>>>> production ready yet. Should we stick to Hadoop 0.20.1? Upgrade to
>>>>>>>>> 0.20.2?
>>>>>>>>>
>>>>>>>>> Any tips here are definitely appreciated. I'll be happy to provide
>>>>>>>>> more information as well.
>>>>>>>>>
>>>>>>>>> -GS
>>>>>>>>
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera