That person should have been Lars, I think.

On Tue, Oct 2, 2012 at 7:04 PM, Alex Baranau <alex.barano...@gmail.com> wrote:
> > Currently HRegion.mutateRowsWithLocks actually acquires
> > locks on all rows first (since the contract here is a transaction),
> > so (currently) you would get unnecessarily reduced concurrency
> > using that API for changes that do not need to be atomic.
>
> Right, it's about "unnecessarily reduced concurrency" vs "faster writing
> edits to WAL". In case the changes you write do not intersect (do not
> belong to the same row), which I imagine is the most common case when using
> HBase, then it makes sense to choose faster writing to WAL.
>
> > Also note that a Put(List<Put>) operation already writes multiple
> > updates to a single WALEdit (doing a best effort batching).
>
> Do you mean the HTable.put(List<Put>) operation? Really? Hm.. Oh, you
> probably mean that updates *that belong to the same row* are getting
> written to WAL as a single WALEdit. Yeah, that was a great improvement
> (esp. w.r.t. consistency).
>
> If there are no objections, I'd add this idea of "faster writing edits to
> WAL" by putting more updates of multiple rows into a single WALEdit (which
> essentially is a WAL write transaction) into JIRA.
>
> Would be great to hear J-D's thoughts: if I remember correctly, he
> mentioned that he tried to do FS sync() on each write to WAL (to ensure
> "real durability"). Again, if I remember correctly this brought quite a lot
> of overhead... which can be reduced by bigger writes to WAL. Or maybe it
> wasn't J-D who talked about it at the hackathon after HBaseCon?
>
> Alex
>
> On Tue, Oct 2, 2012 at 8:20 PM, lars hofhansl <lhofha...@yahoo.com> wrote:
>
> > This is an interesting observation. I have not thought about HBASE-5229
> > in terms of a performance improvement.
> > Currently HRegion.mutateRowsWithLocks actually acquires locks on all rows
> > first (since the contract here is a transaction), so (currently) you would
> > get unnecessarily reduced concurrency using that API for changes that do
> > not need to be atomic.
> >
> > Also note that a Put(List<Put>) operation already writes multiple updates
> > to a single WALEdit (doing a best effort batching).
> >
> > -- Lars
> >
> > ________________________________
> > From: Alex Baranau <alex.barano...@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Tuesday, October 2, 2012 4:29 PM
> > Subject: HBase: "small" WAL transactions Q
> >
> > Hello,
> >
> > Maybe a silly question.
> >
> > Data in WAL is written in small transactions. One transaction is a set of
> > KeyValues for a specific (single) row. As we want each written transaction
> > to be durable, we write them into the WAL one-by-one (ideally with FS
> > sync() calls, etc. on each write), which is very costly (doing that for
> > each write).
> >
> > Having bigger WAL transactions (writing changes to several "close"
> > records) should be more efficient (would result in an increase of write
> > throughput). I.e. a WALEdit record would contain updates to multiple
> > different rows. As far as I understand, something like that was
> > implemented in HBASE-5229 [1]. But it is not the default behavior when
> > sending multiple record changes to the RS (e.g. when flushing the
> > client-side buffer). It also cannot be forced. What are the major reasons
> > for not using that? Does locking multiple "close" rows look so dangerous?
> > Or is it simply not efficient (is there more to it besides what I
> > described above)?
> >
> > Thank you,
> > Alex Baranau
> > ------
> > Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
> >
> > [1] https://issues.apache.org/jira/browse/HBASE-5229
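
To make the trade-off discussed in this thread concrete, here is a minimal toy sketch that only counts sync calls under the two write strategies. It is not HBase code: `ToyWal`, `write_per_row`, and `write_batched` are hypothetical names invented for illustration, and the model assumes one sync per appended record. It deliberately ignores the row-locking cost Lars mentions, which is the other side of the trade-off.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in, not an HBase type: an update is (row, column, value).
Update = Tuple[str, str, str]


@dataclass
class ToyWal:
    """Toy append-only log that counts sync calls; not the HBase WAL."""
    records: List[List[Update]] = field(default_factory=list)
    syncs: int = 0

    def append_and_sync(self, edit: List[Update]) -> None:
        # Assumption: every appended record is followed by exactly one sync.
        self.records.append(list(edit))
        self.syncs += 1


def write_per_row(wal: ToyWal, updates: List[Update]) -> None:
    """One record per row: updates to the same row share a record."""
    by_row: Dict[str, List[Update]] = {}
    for update in updates:
        by_row.setdefault(update[0], []).append(update)
    for edit in by_row.values():
        wal.append_and_sync(edit)


def write_batched(wal: ToyWal, updates: List[Update]) -> None:
    """One record holding the updates for every row in the batch."""
    if updates:
        wal.append_and_sync(updates)


if __name__ == "__main__":
    updates = [
        ("row1", "cf:a", "1"),
        ("row1", "cf:b", "2"),
        ("row2", "cf:a", "3"),
        ("row3", "cf:a", "4"),
    ]
    per_row, batched = ToyWal(), ToyWal()
    write_per_row(per_row, updates)
    write_batched(batched, updates)
    print(per_row.syncs, batched.syncs)  # 3 syncs per-row vs 1 batched
```

In this toy model the number of syncs drops from one per distinct row to one per batch, which is the saving Alex describes. Whether that saving outweighs holding locks on all rows in the batch is exactly the open question in the thread.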