Which HBase version are you using? Is there a way to place 10 delete markers from the application side instead of 300?
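One way to get there from the client, sketched with plain Java collections rather than the HBase API: read the row once, intersect the 300 candidate qualifiers with the qualifiers that actually came back, and place delete markers only for that intersection. In the sketch below, `present` stands in for the qualifiers taken from the Result of a single multi-column Get, and the returned list is what would then be passed to deleteColumns(). The class and method names are hypothetical, and qualifiers are modelled as Strings instead of byte[]. It also assumes nothing writes to the row between the read and the delete; a column added in that window would be missed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of HBase: narrows a list of delete
// candidates to the qualifiers that are actually present in the row.
class DeleteCandidateFilter {

    // candidates: qualifiers the application was asked to delete (e.g. 300)
    // present:    qualifiers found by one read of the row (e.g. 10)
    static List<String> qualifiersToDelete(List<String> candidates, Set<String> present) {
        List<String> toDelete = new ArrayList<>();
        for (String qualifier : candidates) {
            if (present.contains(qualifier)) {
                toDelete.add(qualifier);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<String> candidates = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            candidates.add(String.format("c%03d", i));
        }
        // Assumption: in a real client this set is built from the Result of a Get.
        Set<String> present = new TreeSet<>(Arrays.asList("c005", "c120", "c299", "zzz"));

        List<String> toDelete = qualifiersToDelete(candidates, present);
        System.out.println(toDelete.size() + " delete markers instead of " + candidates.size());
        System.out.println(toDelete);
    }
}
```

The extra read costs one round trip, but it keeps the number of delete markers proportional to the columns that exist rather than to the columns requested.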
Thanks

On Fri, Feb 8, 2013 at 10:05 PM, Varun Sharma <va...@pinterest.com> wrote:

> We are given a set of 300 columns to delete. I tested two cases:
>
> 1) deleteColumns() - with the 's'
>
> This function simply adds delete markers for all 300 columns. In our case,
> typically only a fraction of these columns (10) are actually present. After
> starting to use deleteColumns(), we started seeing a drop in cluster-wide
> random read performance - 90th percentile latency worsened, and so did 99th,
> probably because of having to traverse delete markers. I attribute this to
> the profusion of delete markers in the cluster. Major compactions slowed down
> by almost 50 percent, probably because of having to clean out significantly
> more delete markers.
>
> 2) deleteColumn()
>
> Ended up with intolerable 15 second calls, which clogged all the handlers,
> making the cluster pretty much unresponsive.
>
> On Fri, Feb 8, 2013 at 9:55 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > For the 300 column deletes, can you show us how the Delete(s) are
> > constructed?
> >
> > Do you use this method?
> >
> > public Delete deleteColumns(byte [] family, byte [] qualifier) {
> >
> > Thanks
> >
> > On Fri, Feb 8, 2013 at 9:44 PM, Varun Sharma <va...@pinterest.com> wrote:
> >
> > > So a Get call with multiple columns on a single row should be much faster
> > > than independent Get(s) on each of those columns for that row. I am
> > > basically seeing severely poor performance (~ 15 seconds) for certain
> > > deleteColumn() calls, and I see that there is a
> > > prepareDeleteTimestamps() function in HRegion.java which first tries to
> > > locate the column by doing individual gets on each column you want to
> > > delete (I am doing 300 column deletes). Now, I think this should ideally be
> > > 1 get call with the batch of 300 columns, so that one scan can retrieve the
> > > columns, and only the columns that are found are actually deleted.
> > >
> > > Before I try this fix, I wanted to get an opinion on whether batching the
> > > get() will make a difference, and from your answer it seems it should.
> > >
> > > On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <la...@apache.org> wrote:
> > >
> > > > Everything is stored as a KeyValue in HBase.
> > > > The Key part of a KeyValue contains the row key, column family, column
> > > > name, and timestamp, in that order.
> > > > Each column family has its own store and store files.
> > > >
> > > > So in a nutshell a get is executed by starting a scan at the row key
> > > > (which is a prefix of the key) in each store (CF) and then scanning forward
> > > > in each store until the next row key is reached. (In reality it is a bit
> > > > more complicated due to multiple versions, skipping columns, etc.)
> > > >
> > > > -- Lars
> > > >
> > > > ________________________________
> > > > From: Varun Sharma <va...@pinterest.com>
> > > > To: user@hbase.apache.org
> > > > Sent: Friday, February 8, 2013 9:22 PM
> > > > Subject: Re: Get on a row with multiple columns
> > > >
> > > > Sorry, I was a little unclear with my question.
> > > >
> > > > Let's say you have
> > > >
> > > > Get get = new Get(row);
> > > > get.addColumn("1");
> > > > get.addColumn("2");
> > > > .
> > > > .
> > > > .
> > > >
> > > > When HBase internally executes the batch get, it will seek to column "1".
> > > > Since data is lexicographically sorted, it does not need to seek from
> > > > the beginning to get to "2"; it can continue seeking forward, since
> > > > column "2" will always be after column "1". I want to know whether this is
> > > > how a multi-column get on a row works or not.
> > > >
> > > > Thanks
> > > > Varun
> > > >
> > > > On Fri, Feb 8, 2013 at 9:08 PM, Marcos Ortiz <mlor...@uci.cu> wrote:
> > > >
> > > > > Like Ishan said, a get gives an instance of the Result class.
> > > > > All utility methods that you can use are:
> > > > > byte[] getValue(byte[] family, byte[] qualifier)
> > > > > byte[] value()
> > > > > byte[] getRow()
> > > > > int size()
> > > > > boolean isEmpty()
> > > > > KeyValue[] raw() # Like Ishan said, all data here is sorted
> > > > > List<KeyValue> list()
> > > > >
> > > > > On 02/08/2013 11:29 PM, Ishan Chhabra wrote:
> > > > >
> > > > >> Based on what I read in Lars' book, a get will return a Result,
> > > > >> which is internally a KeyValue[]. This KeyValue[] is sorted by the key,
> > > > >> and you access this array using the raw() or list() methods on the
> > > > >> Result object.
> > > > >>
> > > > >> On Fri, Feb 8, 2013 at 5:40 PM, Varun Sharma <va...@pinterest.com> wrote:
> > > > >>
> > > > >>> +user
> > > > >>>
> > > > >>> On Fri, Feb 8, 2013 at 5:38 PM, Varun Sharma <va...@pinterest.com> wrote:
> > > > >>>
> > > > >>>> Hi,
> > > > >>>>
> > > > >>>> When I do a Get on a row with multiple column qualifiers, do we sort
> > > > >>>> the column qualifiers and make use of the sorted order when we get
> > > > >>>> the results?
> > > > >>>>
> > > > >>>> Thanks
> > > > >>>> Varun
> > > > >
> > > > > --
> > > > > Marcos Ortiz Valmaseda,
> > > > > Product Manager && Data Scientist at UCI
> > > > > Blog: http://marcosluis2186.posterous.com
> > > > > Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
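
To make the question in the thread concrete, here is a toy model (not HBase code) of why a multi-column get on one row can be served in a single forward pass: the row's columns are kept sorted, the requested qualifiers are sorted too, and both are walked together, so reaching "2" never restarts from the beginning of the row. The class name, the use of Strings for qualifiers, and the TreeMap standing in for a store are all assumptions made for illustration. The real read path also deals with versions, multiple store files, and delete markers, as Lars notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of one row in one store: qualifier -> value, kept sorted.
class SortedRowGet {

    // Returns the values of the wanted qualifiers that exist in the row,
    // in qualifier order, using one forward pass over the sorted row.
    static List<String> get(TreeMap<String, String> row, List<String> wanted) {
        List<String> values = new ArrayList<>();
        Iterator<Map.Entry<String, String>> stored = row.entrySet().iterator();
        Map.Entry<String, String> current = stored.hasNext() ? stored.next() : null;

        // Sorting the wanted qualifiers lets the cursor only ever move forward.
        for (String qualifier : new TreeSet<>(wanted)) {
            // Skip stored columns that sort before the wanted qualifier.
            while (current != null && current.getKey().compareTo(qualifier) < 0) {
                current = stored.hasNext() ? stored.next() : null;
            }
            if (current == null) {
                break; // end of row: no later qualifier can match
            }
            if (current.getKey().equals(qualifier)) {
                values.add(current.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        TreeMap<String, String> row = new TreeMap<>();
        row.put("1", "a");
        row.put("2", "b");
        row.put("3", "c");
        System.out.println(get(row, Arrays.asList("2", "1", "9")));
    }
}
```

Issuing one lookup per qualifier would instead start a fresh search for each of the 300 columns, which is the cost the batched get is meant to avoid.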