So a Get call with multiple columns on a single row should be much faster than independent Gets on each of those columns for that row. I am seeing severely poor performance (~15 seconds) for certain deleteColumn() calls. There is a prepareDeleteTimestamps() function in HRegion.java that first locates each column by doing an individual get on every column you want to delete, and I am doing 300 column deletes. I think this should ideally be one get call with the batch of 300 columns, so that a single scan retrieves the columns and the ones that are found are then deleted.
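To make the proposed change concrete, here is a minimal sketch in plain Java with no HBase dependency. The row is modelled as a hypothetical `TreeMap` from qualifier to timestamp, and `perColumnLookups` and `batchedLookup` are made-up names for illustration. This is not what `prepareDeleteTimestamps()` does internally. It only contrasts one independent lookup per column with a single merge-style walk over the sorted row for the whole batch.

```java
import java.util.*;

/**
 * Hypothetical model, not HBase code: one row is a sorted map from column
 * qualifier to the timestamp a delete would need to look up.
 */
public class BatchedDeleteLookup {

    /** One independent lookup per column, as if each column were its own Get. */
    static Map<String, Long> perColumnLookups(NavigableMap<String, Long> row,
                                              Collection<String> columns) {
        Map<String, Long> found = new TreeMap<>();
        for (String col : columns) {
            Long ts = row.get(col);            // each lookup searches the row from scratch
            if (ts != null) found.put(col, ts);
        }
        return found;
    }

    /** One batched lookup: sort the requested columns and walk the row forward once. */
    static Map<String, Long> batchedLookup(NavigableMap<String, Long> row,
                                           Collection<String> columns) {
        Map<String, Long> found = new TreeMap<>();
        if (columns.isEmpty()) return found;
        Iterator<String> wanted = new TreeSet<>(columns).iterator();
        Iterator<Map.Entry<String, Long>> cells = row.entrySet().iterator();
        String want = wanted.next();
        Map.Entry<String, Long> cell = cells.hasNext() ? cells.next() : null;
        while (cell != null && want != null) {
            int cmp = cell.getKey().compareTo(want);
            if (cmp < 0) {
                // This column sorts before the one we want: advance through the row.
                cell = cells.hasNext() ? cells.next() : null;
            } else if (cmp > 0) {
                // The wanted column is not in the row: nothing to delete, try the next one.
                want = wanted.hasNext() ? wanted.next() : null;
            } else {
                // Found it: record its timestamp and advance both sides.
                found.put(want, cell.getValue());
                cell = cells.hasNext() ? cells.next() : null;
                want = wanted.hasNext() ? wanted.next() : null;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> row = new TreeMap<>();
        row.put("a", 100L);
        row.put("c", 200L);
        row.put("e", 300L);
        List<String> toDelete = Arrays.asList("e", "b", "c");
        System.out.println(perColumnLookups(row, toDelete)); // {c=200, e=300}
        System.out.println(batchedLookup(row, toDelete));    // {c=200, e=300}
    }
}
```

Both methods find the same columns. The batched version visits the row once in sorted order, which is the property a single multi-column Get can exploit; the model says nothing about actual HBase seek costs.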
Before I try this fix, I wanted to get an opinion on whether batching the get() will make a difference, and from your answer it seems it should.

On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <la...@apache.org> wrote:

> Everything is stored as a KeyValue in HBase.
> The Key part of a KeyValue contains the row key, column family, column
> name, and timestamp in that order.
> Each column family has its own store and store files.
>
> So in a nutshell a get is executed by starting a scan at the row key
> (which is a prefix of the key) in each store (CF) and then scanning forward
> in each store until the next row key is reached. (In reality it is a bit
> more complicated due to multiple versions, skipping columns, etc.)
>
>
> -- Lars
> ________________________________
> From: Varun Sharma <va...@pinterest.com>
> To: user@hbase.apache.org
> Sent: Friday, February 8, 2013 9:22 PM
> Subject: Re: Get on a row with multiple columns
>
> Sorry, I was a little unclear with my question.
>
> Let's say you have
>
> Get get = new Get(row);
> get.addColumn("1");
> get.addColumn("2");
> .
> .
> .
>
> When HBase internally executes the batch get, it will seek to column "1".
> Since the data is lexicographically sorted, it does not need to seek from
> the beginning to get to "2"; it can keep seeking forward, since column "2"
> will always come after column "1". I want to know whether this is how a
> multicolumn get on a row works or not.
>
> Thanks
> Varun
>
> On Fri, Feb 8, 2013 at 9:08 PM, Marcos Ortiz <mlor...@uci.cu> wrote:
>
> > Like Ishan said, a get gives you an instance of the Result class.
> > The utility methods you can use are:
> >   byte[] getValue(byte[] family, byte[] qualifier)
> >   byte[] value()
> >   byte[] getRow()
> >   int size()
> >   boolean isEmpty()
> >   KeyValue[] raw()   # Like Ishan said, all data here is sorted
> >   List<KeyValue> list()
> >
> > On 02/08/2013 11:29 PM, Ishan Chhabra wrote:
> >
> >> Based on what I read in Lars' book, a get will return a Result,
> >> which is internally a KeyValue[]. This KeyValue[] is sorted by the key,
> >> and you access this array using the raw or list methods on the Result
> >> object.
> >>
> >> On Fri, Feb 8, 2013 at 5:40 PM, Varun Sharma <va...@pinterest.com> wrote:
> >>
> >>> +user
> >>>
> >>> On Fri, Feb 8, 2013 at 5:38 PM, Varun Sharma <va...@pinterest.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> When I do a Get on a row with multiple column qualifiers, do we sort
> >>>> the column qualifiers and make use of the sorted order when we get the
> >>>> results?
> >>>>
> >>>> Thanks
> >>>> Varun
> >
> > --
> > Marcos Ortiz Valmaseda,
> > Product Manager && Data Scientist at UCI
> > Blog: http://marcosluis2186.posterous.com
> > Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
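The ordering Lars describes can be modelled in a few lines of plain Java, again with no HBase dependency. `Cell`, `compareKeys` and `get` are hypothetical stand-ins for KeyValue and the store scanner. One detail beyond what the thread states, but which is how HBase orders versions, is that timestamps sort newest-first. The multi-column get positions once at the start of the row and only moves forward, collecting requested qualifiers in sorted order and stopping once it passes the last one or reaches the next row.

```java
import java.util.*;

/** Hypothetical model of HBase's key ordering; not the real KeyValue or scanner classes. */
public class RowScanModel {

    /** Stand-in for a KeyValue: the key is (row, family, qualifier, timestamp). */
    static final class Cell {
        final String row, family, qualifier, value;
        final long ts;

        Cell(String row, String family, String qualifier, long ts, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.ts = ts;
            this.value = value;
        }
    }

    /** Row, family, qualifier ascending; timestamp descending so the newest version sorts first. */
    static int compareKeys(Cell a, Cell b) {
        int c = a.row.compareTo(b.row);       // String order here; HBase compares raw bytes
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.ts, a.ts);
    }

    /** A multi-column get: position once at the row start, then only move forward. */
    static Map<String, String> get(NavigableSet<Cell> store, String row, String family,
                                   Collection<String> qualifiers) {
        Map<String, String> result = new TreeMap<>();
        if (qualifiers.isEmpty()) return result;
        SortedSet<String> wanted = new TreeSet<>(qualifiers);
        // Probe key that sorts before every real cell of this row.
        Cell start = new Cell(row, "", "", Long.MAX_VALUE, null);
        for (Cell c : store.tailSet(start, true)) {
            if (!c.row.equals(row)) break;                 // next row reached: the get is done
            int fam = c.family.compareTo(family);
            // In HBase each family is its own store; here they share one sorted set.
            if (fam < 0) continue;
            if (fam > 0 || c.qualifier.compareTo(wanted.last()) > 0) break; // past every requested column
            if (wanted.contains(c.qualifier) && !result.containsKey(c.qualifier)) {
                result.put(c.qualifier, c.value);          // first hit per column is its newest version
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<Cell> store = new TreeSet<>(RowScanModel::compareKeys);
        store.add(new Cell("row1", "cf", "1", 10L, "old"));
        store.add(new Cell("row1", "cf", "1", 20L, "new"));
        store.add(new Cell("row1", "cf", "2", 10L, "two"));
        store.add(new Cell("row1", "cf", "3", 10L, "three"));
        store.add(new Cell("row2", "cf", "1", 10L, "other row"));
        System.out.println(get(store, "row1", "cf", Arrays.asList("2", "1"))); // {1=new, 2=two}
    }
}
```

The requested columns can be passed in any order; because both the request and the stored keys are sorted, the walk never has to go back to the start of the row to find column "2" after column "1".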