Thanks for the info, Ted. Has anyone tackled this problem before 0.94?
Keith

On 7/5/12 2:28 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:

>Take a look at HBASE-3584: Allow atomic put/delete in one call
>It is in 0.94, meaning it is not even in cdh4
>
>Cheers
>
>On Thu, Jul 5, 2012 at 11:19 AM, Keith Wyss <keith.w...@explorys.com>
>wrote:
>
>> Hi,
>>
>> My organization has been doing something zany to simulate atomic row
>> operations in HBase.
>>
>> We have a converter-object model for the writables that are populated in
>> an HBase table, and one of the governing assumptions is that if you are
>> dealing with an Object record, you read all the columns that compose it
>> out of HBase or a different data source.
>>
>> When we read lots of data in from a source system that we are trying to
>> mirror with HBase, if a column is null that means that whatever is
>> in HBase for that column is no longer valid. We have simulated what I
>> believe is now called an AtomicRowMutation by using a single Put
>> and populating it with blanks. The downside is the wasted space accrued
>> by the metadata for the blank columns.
>>
>> Atomicity is not of utmost importance to us, but performance is. My
>> approach has been to create a Put and Delete object for a record and
>> populate the Delete with the null columns. Then we call
>> HTable.batch(List<Row>) on a bunch of these. It is my impression that
>> this shouldn't appreciably increase network traffic as the RPC calls
>> will be bundled.
>>
>> Has anyone else addressed this problem? Does this seem like a reasonable
>> approach?
>> What sort of performance overhead should I expect?
>>
>> Also, I've seen some Jira tickets about making this an atomic operation
>> in its own right. Is that something that I can expect with CDH3U4?
>>
>> Thanks,
>>
>> Keith Wyss
>>
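
To make the Put/Delete split described in the original message concrete, here is a minimal sketch in plain Java. It uses no HBase classes so that it runs on its own; the names NullColumnSplitter, RowMutations, and split are hypothetical and only illustrate the idea. In a real client, the non-null columns would presumably be added to a Put and the null columns to a Delete for the same row key, and both objects would go into the List<Row> passed to HTable.batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullColumnSplitter {

    /** Hypothetical holder: what to write and what to clear for one row. */
    static final class RowMutations {
        final String rowKey;
        // Columns with a value; these would populate the Put.
        final Map<String, String> toPut = new LinkedHashMap<>();
        // Columns that were null in the source; these would populate the Delete.
        final List<String> toDelete = new ArrayList<>();

        RowMutations(String rowKey) {
            this.rowKey = rowKey;
        }
    }

    /** Splits one source record into columns to write and columns to remove. */
    static RowMutations split(String rowKey, Map<String, String> record) {
        RowMutations m = new RowMutations(rowKey);
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (e.getValue() == null) {
                // Null means the stored value is no longer valid.
                m.toDelete.add(e.getKey());
            } else {
                m.toPut.put(e.getKey(), e.getValue());
            }
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("name", "example");
        record.put("email", null); // no longer present in the source system

        RowMutations m = split("row-1", record);
        System.out.println("put columns:    " + m.toPut.keySet());
        System.out.println("delete columns: " + m.toDelete);
    }
}
```

Unlike the blank-filled single Put, this writes nothing for the null columns beyond the delete markers, but the Put and the Delete for a row are two separate operations, so a reader may briefly see one applied without the other.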