One minor correction to Andrey's thoughts: all updates to a given row are atomic. Two operations from two different clients against the same row are always applied serially. Updates to multiple columns are not interleaved; one request is applied in full and then the other, because there is row-level locking. A read/modify/write operation is different: to get atomicity there you would need to use something like checkAndPut.
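
To make the read/modify/write case concrete, here is a minimal sketch of the compare-then-write retry loop that checkAndPut enables. It is an in-memory stand-in built on java.util.concurrent, not the HBase client API: the class CheckAndPutSketch, its methods, and the cell name "doc1/priority" are all made up for illustration, and the real call takes row, family, qualifier, expected value and a Put.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for a store of single cells. Hypothetical; NOT the HBase client API. */
public class CheckAndPutSketch {
    private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

    /**
     * Writes newValue only if the cell currently holds expected.
     * A null expected value means "the cell must not exist yet".
     */
    public boolean checkAndPut(String cell, String expected, String newValue) {
        if (expected == null) {
            return cells.putIfAbsent(cell, newValue) == null;
        }
        return cells.replace(cell, expected, newValue);
    }

    public String get(String cell) {
        return cells.get(cell);
    }

    /** Read/modify/write: retry until no other writer got in between our read and our write. */
    public long increment(String cell) {
        while (true) {
            String current = get(cell);                                        // read
            long next = (current == null ? 0L : Long.parseLong(current)) + 1;  // modify
            if (checkAndPut(cell, current, Long.toString(next))) {             // conditional write
                return next;
            }
            // Someone else changed the cell since we read it; loop and try again.
        }
    }

    /** Runs several concurrent incrementers against one cell and returns the final value. */
    public static long runDemo(int threads, int perThread) {
        CheckAndPutSketch store = new CheckAndPutSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    store.increment("doc1/priority");
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Long.parseLong(store.get("doc1/priority"));
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000)); // prints 4000
    }
}
```

With a plain get followed by an unconditional put, two clients could both read the same value and one increment would be lost; the conditional write is what makes the loser retry instead.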
JG

> > Is there any querying value in separating out values tied to each
> > other vs. keeping them in a serialized object? I am guessing the
> > second option would be much faster considering it is one composite
> > value on the disk, but I would like to know if there are any specific
> > advantages to doing things the other way. Thanks.
> > The values themselves are very small, basic information in String.
> >
> > Eg:
> >
> > DocInfo: <docId><type> = value1
> > DocInfo: <docId><priority> = value2
> > DocInfo: <docId><etcetc> = value3
> >
> >
> > Vs
> >
> > DocInfo: docId = value (JSON(type, priority, etcetc))
> >
> > Thank you.
> >
>
> This mostly depends on the usage pattern.
>
> 1. Each value in storage has a full key (key/family/qualifier/timestamp),
> so the KeyValue size increases (but this negative effect can be offset
> by using compression). So the serialized form will be smaller, take
> less disk I/O, and can be faster.
>
> 2. The second option gives you atomic updates (i.e. all data comes as one
> "piece"), and with the first option you can have concurrent updates of
> the fields (and of course individual history, as opposed to the
> serialized object, which will have history for the whole object).
>
> 3. In serialized form you can't use server-side filters out of the box
> (you would have to patch HBase to support custom filters that
> deserialize the object or use JSONPath on its serialized form), but
> with the first option you can.
