I am not sure how to use filters in my case, since I do not know the column name in advance. E.g.:

DocInfo: 123213+author = "abc"

Here 123213 is the docId. If I want to look for authors named 'abc' in all docs, how would I go about specifying a filter?

Thanks.

On Mon, Jun 21, 2010 at 4:20 PM, Andrey Stepachev <[email protected]> wrote:
> 2010/6/22 N Kapshoo <[email protected]>
>
>> Is there any querying value in separating out values tied to each
>> other vs. keeping them in a serialized object? I am guessing the
>> second option would be much faster considering it is one composite
>> value on the disk, but I would like to know if there are any specific
>> advantages to doing things the other way. Thanks.
>> The values themselves are very small, basic information in String.
>>
>> Eg:
>>
>> DocInfo: <docId><type> = value1
>> DocInfo: <docId><priority> = value2
>> DocInfo: <docId><etcetc> = value3
>>
>>
>> Vs
>>
>> DocInfo: docId = value (JSON(type, priority, etcetc))
>>
>> Thank you.
>>
>
> This mostly depends on the usage pattern.
>
> 1. Each value in storage carries the full key (key/family/qualifier/timestamp),
> so the KeyValue size increases (though this negative effect can be offset by
> using compression). So the serialised form will be smaller, take less disk
> I/O, and can be faster.
>
> 2. The second option gives you atomic updates (i.e. all data comes as one
> "piece"), while with the first option you can have concurrent updates of the
> individual fields (and, of course, individual history, as opposed to the
> serialised object, which will have history only for the whole object).
>
> 3. In serialised form you can't use server-side filters out of the box (you
> would have to patch HBase to support custom filters that deserialise the
> object or use JSONPath on its serialised form), but with the first option
> you can.
>
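
To make the question concrete, here is a minimal sketch in plain Python (not the HBase API) of the matching logic such a filter would need. It assumes that "123213+author" means the qualifier is the numeric docId concatenated with the field name. The `Cell` type, the `find_docs_by_field` helper and the sample rows are all hypothetical and exist only for illustration. On the HBase side, the filter package has classes along these lines (a QualifierFilter with a RegexStringComparator, combined with a ValueFilter in a FilterList), but please check that against the version you run.

```python
import re
from typing import Iterable, List, NamedTuple


class Cell(NamedTuple):
    """One stored cell: row key, column family, qualifier, value."""
    row: str
    family: str
    qualifier: str
    value: str


def find_docs_by_field(cells: Iterable[Cell], family: str,
                       field: str, wanted: str) -> List[str]:
    """Return the docIds whose <docId><field> column holds the wanted value."""
    # Assumption: the qualifier is the numeric docId followed by the field name.
    pattern = re.compile(r"^(\d+)" + re.escape(field) + r"$")
    doc_ids = []
    for cell in cells:
        if cell.family != family:
            continue
        match = pattern.match(cell.qualifier)
        # Keep the cell only if both the qualifier and the value match.
        if match and cell.value == wanted:
            doc_ids.append(match.group(1))
    return doc_ids


# Hypothetical sample data, not taken from a real table.
SAMPLE_CELLS = [
    Cell("row1", "DocInfo", "123213author", "abc"),
    Cell("row1", "DocInfo", "123213type", "report"),
    Cell("row1", "DocInfo", "555author", "xyz"),
    Cell("row2", "DocInfo", "777author", "abc"),
]

if __name__ == "__main__":
    print(find_docs_by_field(SAMPLE_CELLS, "DocInfo", "author", "abc"))
```

The point of the sketch is that the column name does not have to be known exactly: a pattern on the qualifier plus an equality check on the value is enough, and both checks apply to each cell independently.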
