Re: Hbase performance with HDFS

2011-07-11 Thread Ted Dunning
Hardly voodoo, but also not something that can be done casually. You need strong transactional guarantees from the file system layer to do this. And yes, it does come down to something like groups of group commits. It didn't require patching the layer below dfsclient so much as correct and caref

Re: Hbase performance with HDFS

2011-07-11 Thread Stack
Ted, you seem to be describing voodoo? Are you talking of a group commit of the group commits? Bigger batches at the layer below dfsclient? St.Ack On Mon, Jul 11, 2011 at 11:57 AM, Ted Dunning wrote: > On Mon, Jul 11, 2011 at 11:22 AM, Joey Echeverria wrote: > >> On Mon, Jul 11, 2011 at 12:47

Re: Hbase performance with HDFS

2011-07-11 Thread Ted Dunning
No, the semantics do not change. On Mon, Jul 11, 2011 at 12:37 PM, Joey Echeverria wrote: > > :-) > > > > No changes were required in HBase to enable this. > > Do the semantics of sync change? Do you pause one or more outstanding > syncs, sync a group of data (4KB maybe) and then return from all

Re: Hbase performance with HDFS

2011-07-11 Thread Luke Lu
On Mon, Jul 11, 2011 at 12:37 PM, Joey Echeverria wrote: > Do the semantics of sync change? Do you pause one or more outstanding > syncs, sync a group of data (4KB maybe) and then return from all of > those outstanding syncs simultaneously? Group commit is a standard storage technique to trade a
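
To make the group-commit question above concrete, here is a minimal single-threaded sketch in Python. It is not HBase, HDFS, or MapR code; the class `GroupCommitLog` and its methods are hypothetical names invented for illustration. It only shows the idea discussed in the thread: several pending writes are covered by one sync, and each writer is acknowledged only once its own record is durable, so the semantics of sync do not change.

```python
class GroupCommitLog:
    """Toy write-ahead log (hypothetical) that batches pending appends into one sync."""

    def __init__(self):
        self.pending = []    # records appended but not yet durable
        self.durable = []    # records covered by a completed sync
        self.sync_calls = 0  # number of (expensive) syncs actually issued

    def append(self, record):
        """Buffer a record and return its sequence number (1-based)."""
        self.pending.append(record)
        return len(self.durable) + len(self.pending)

    def sync(self):
        """Make every pending record durable with a single sync.

        Returns the highest sequence number that is now durable.
        """
        if self.pending:
            self.durable.extend(self.pending)
            self.pending.clear()
            self.sync_calls += 1
        return len(self.durable)


log = GroupCommitLog()
# Five writers append their edits before anyone has synced.
tickets = [log.append(f"edit-{i}") for i in range(5)]
# One sync covers the whole group.
durable_up_to = log.sync()
# A writer may return only when its own ticket is durable.
acknowledged = [t for t in tickets if t <= durable_up_to]
```

In this sketch five appends cost one sync instead of five, which is the latency-for-throughput trade that group commit makes. A real implementation would have concurrent callers blocking until a shared sync completes.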

Re: Hbase performance with HDFS

2011-07-11 Thread M. C. Srivas
On Mon, Jul 11, 2011 at 12:37 PM, Joey Echeverria wrote: > > :-) > > > > No changes were required in HBase to enable this. > > Do the semantics of sync change? Do you pause one or more outstanding > syncs, sync a group of data (4KB maybe) and then return from all of > those outstanding syncs simu

Re: Hbase performance with HDFS

2011-07-11 Thread Joey Echeverria
> :-) > > No changes were required in HBase to enable this. Do the semantics of sync change? Do you pause one or more outstanding syncs, sync a group of data (4KB maybe) and then return from all of those outstanding syncs simultaneously? -Joey -- Joseph Echeverria Cloudera, Inc. 443.305.9434

Re: Hbase performance with HDFS

2011-07-11 Thread Ted Dunning
On Mon, Jul 11, 2011 at 11:22 AM, Joey Echeverria wrote: > On Mon, Jul 11, 2011 at 12:47 PM, Ted Dunning > wrote: > > Also, on MapR, you get another level of group commit above the row level. > > That takes the writes even further from the byte by byte level. > > Is this done with an HBASE patc

Re: Hbase performance with HDFS

2011-07-11 Thread Joey Echeverria
On Mon, Jul 11, 2011 at 12:47 PM, Ted Dunning wrote: > Also, on MapR, you get another level of group commit above the row level. >  That takes the writes even further from the byte by byte level. Is this done with an HBASE patch? I don't see how this could be done merely at the FS layer. -Joey

Re: Hbase performance with HDFS

2011-07-11 Thread Ted Dunning
> (via Tom White) > > > - Original Message - > > From: Arvind Jayaprakash > > To: user@hbase.apache.org; Andrew Purtell > > Cc: > > Sent: Monday, July 11, 2011 6:34 AM > > Subject: Re: Hbase performance with HDFS > > > > On Jul 07, Andrew Purtell wrote

Re: Hbase performance with HDFS

2011-07-11 Thread Andrew Purtell
White) - Original Message - > From: Arvind Jayaprakash > To: user@hbase.apache.org; Andrew Purtell > Cc: > Sent: Monday, July 11, 2011 6:34 AM > Subject: Re: Hbase performance with HDFS > > On Jul 07, Andrew Purtell wrote: >>> Since HDFS is mostly write o

Re: Hbase performance with HDFS

2011-07-11 Thread Arvind Jayaprakash
On Jul 07, Andrew Purtell wrote: >> Since HDFS is mostly write once how are updates/deletes handled? > >Not mostly, only write once. > >Deletes are just another write, but one that writes tombstones >"covering" data with older timestamps.  > >When serving queries, HBase searches store files back in
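
The tombstone behaviour described above can be sketched as follows. This is a simplified illustration in Python, not HBase's actual read path; `TOMBSTONE`, `read`, and the dict-based "store files" are assumptions made up for the example. Each store file is treated as immutable (write once), a delete is just another write carrying a newer timestamp, and a read picks the newest cell for a row across all files.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a delete record


def read(store_files, row):
    """Return the newest visible value for `row`, or None if deleted or absent.

    Each store file is an immutable dict mapping row -> (timestamp, value).
    """
    newest = None
    for store_file in store_files:
        cell = store_file.get(row)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    if newest is None or newest[1] is TOMBSTONE:
        return None  # never written, or covered by a newer tombstone
    return newest[1]


# The older file is never modified; the delete and the update land in a newer file.
older = {"row1": (100, "v1"), "row2": (100, "a")}
newer = {"row1": (200, TOMBSTONE), "row2": (150, "b")}
files = [newer, older]

deleted = read(files, "row1")
updated = read(files, "row2")
missing = read(files, "row3")
```

Here `row1` is hidden by the newer tombstone even though its old value still sits in the older file. Physically dropping that old value is left to compaction, which this sketch does not model.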

Re: Hbase performance with HDFS

2011-07-07 Thread Mohit Anchlia
h by hitting back. - Piet Hein (via > Tom White) > > > - Original Message ----- >> From: Mohit Anchlia >> To: user@hbase.apache.org >> Cc: >> Sent: Thursday, July 7, 2011 3:02 PM >> Subject: Re: Hbase performance with HDFS >> >>Thanks! I u

Re: Hbase performance with HDFS

2011-07-07 Thread Andrew Purtell
rthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) - Original Message - > From: Mohit Anchlia > To: user@hbase.apache.org > Cc: > Sent: Thursday, July 7, 2011 3:02 PM > Subject: Re: Hbase performance with HDFS > >Thanks! I understand

Re: Hbase performance with HDFS

2011-07-07 Thread Mohit Anchlia
ation has already taken place. > > -Original Message- > From: Mohit Anchlia [mailto:mohitanch...@gmail.com] > Sent: Thursday, July 07, 2011 2:02 PM > To: user@hbase.apache.org; Andrew Purtell > Subject: Re: Hbase performance with HDFS > > Thanks Andrew. Really helpful.

RE: Hbase performance with HDFS

2011-07-07 Thread Buttler, David
, July 07, 2011 2:02 PM To: user@hbase.apache.org; Andrew Purtell Subject: Re: Hbase performance with HDFS Thanks Andrew. Really helpful. I think I have one more question right now :) Underneath HDFS replicates blocks by default 3. Not sure how it relates to HFile and compactions. When compaction

Re: Hbase performance with HDFS

2011-07-07 Thread Mohit Anchlia
attack prove their worth by hitting back. - Piet Hein (via > Tom White) > > >> >>From: Mohit Anchlia >>To: Andrew Purtell >>Cc: "user@hbase.apache.org" >>Sent: Thursday, July 7, 2011 12:30 PM >>Subject: Re: Hb

Re: Hbase performance with HDFS

2011-07-07 Thread Andrew Purtell
ack prove their worth by hitting back. - Piet Hein (via Tom White) > >From: Mohit Anchlia >To: Andrew Purtell >Cc: "user@hbase.apache.org" >Sent: Thursday, July 7, 2011 12:30 PM >Subject: Re: Hbase performance with HDFS > >Thanks t

Re: Hbase performance with HDFS

2011-07-07 Thread Doug Meil
are compacted as >> needed, because as you point out GFS and HDFS are optimized for >>streaming >> sequential reads and writes. >> >> Best regards, >> >> - Andy >> Problems worthy of attack prove their worth by hitting back. - Piet Hein >> (via Tom White) >> >> _

Re: Hbase performance with HDFS

2011-07-07 Thread Mohit Anchlia
Problems worthy of attack prove their worth by hitting back. - Piet Hein > (via Tom White) > > > From: Mohit Anchlia > To: user@hbase.apache.org; Andrew Purtell > Sent: Thursday, July 7, 2011 11:53 AM > Subject: Re: Hbase performance with HDFS > > I have looked at big

Re: Hbase performance with HDFS

2011-07-07 Thread Doug Meil
back. - Piet >>Hein (via Tom White) >> >> >>>________ >>>From: Mohit Anchlia >>>To: user@hbase.apache.org >>>Sent: Thursday, July 7, 2011 11:12 AM >>>Subject: Hbase performance with HDFS >>> >>>I

Re: Hbase performance with HDFS

2011-07-07 Thread Andrew Purtell
>> Best regards, >> >> >>     - Andy >> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein >> (via Tom White) >> >> >>>________ >>>From: Mohit Anchlia >>>To: user@

Re: Hbase performance with HDFS

2011-07-07 Thread Stack
>> > >>> > Start here: http://labs.google.com/papers/bigtable.html >>> > >>> > Best regards, >>> > >>> > >>> >     - Andy >>> > >>> > Problems worthy of attack prove their worth by hitting back. -

Re: Hbase performance with HDFS

2011-07-07 Thread Mohit Anchlia
> >> > Best regards, >> > >> > >> >     - Andy >> > >> > Problems worthy of attack prove their worth by hitting back. - Piet Hein >> (via Tom White) >> > >> > >> >> >> >

Re: Hbase performance with HDFS

2011-07-07 Thread Himanshu Vashishtha
. - Piet Hein > (via Tom White) > > > > > >>________ > >>From: Mohit Anchlia > >>To: user@hbase.apache.org > >>Sent: Thursday, July 7, 2011 11:12 AM > >>Subject: Hbase performance with HDFS > >> > >>

Re: Hbase performance with HDFS

2011-07-07 Thread Mohit Anchlia
attack prove their worth by hitting back. - Piet Hein (via > Tom White) > > >> >>From: Mohit Anchlia >>To: user@hbase.apache.org >>Sent: Thursday, July 7, 2011 11:12 AM >>Subject: Hbase performance with HDFS >> >

Re: Hbase performance with HDFS

2011-07-07 Thread Andrew Purtell
sday, July 7, 2011 11:12 AM >Subject: Hbase performance with HDFS > >I've been trying to understand how HBase can provide good performance >on HDFS when HDFS is designed for sequential access with large block sizes, which >is inherently different from HBase, where access is more random and ro

Hbase performance with HDFS

2011-07-07 Thread Mohit Anchlia
I've been trying to understand how HBase can provide good performance on HDFS when HDFS is designed for sequential access with large block sizes, which is inherently different from HBase, where access is more random and row sizes might be very small. I am reading this, but it doesn't answer my question. It does say