Aha. That makes sense (both atomic writes and Filters).

I am definitely only looking to filter within a given user, so it looks
like what you describe below might work for me.

Thanks so much for all your help, Jonathan. You have saved me (at
least) 2 weeks of tinkering and poking around!
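
For anyone finding this thread later, here is a minimal sketch of the matching logic described below: keep only the cells whose qualifier looks like <docId>author and whose value equals the author being searched for. It is a plain-Java model of one row, not the HBase API. The class name, the method name, and the assumption that the docId is stored as a decimal string directly followed by the field name are all illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java model of the qualifier + value match; this is NOT the HBase API.
class AuthorFilterSketch {
    // Assumed qualifier layout: decimal docId followed by the field name,
    // e.g. "123213author". A binary long docId would need a different pattern.
    private static final Pattern AUTHOR_QUALIFIER = Pattern.compile("^(\\d+)author$");

    // row maps qualifier -> value for a single user's row (one column family).
    static List<String> docIdsByAuthor(Map<String, String> row, String author) {
        List<String> docIds = new ArrayList<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            Matcher m = AUTHOR_QUALIFIER.matcher(cell.getKey());
            // Step 1: the qualifier must be an author column.
            // Step 2: the value must equal the requested author.
            if (m.matches() && author.equals(cell.getValue())) {
                docIds.add(m.group(1));
            }
        }
        return docIds;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("123213author", "abc");
        row.put("123213type", "memo");
        row.put("123214author", "xyz");
        row.put("123215author", "abc");
        System.out.println(docIdsByAuthor(row, "abc"));
    }
}
```

On the server side, the same two conditions would presumably map to a QualifierFilter with a regex comparator and a ValueFilter with an equality comparator, both required to pass; I have not verified the exact constructor signatures for this HBase version.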

On Mon, Jun 21, 2010 at 5:10 PM, Jonathan Gray <[email protected]> wrote:
> It would be inefficient to run that query against this schema, if you're 
> talking about finding all documents with a given author across all users.  In 
> that case you'd want to use an additional table that had authors as row keys.
>
> If you want to search for documents with a specific author within a given 
> user's documents (single row) then you could use filters. As Andrey said, 
> it would be simpler if the value were broken up into individual qualifiers, but 
> it could also be done with a custom filter that reads the serialized value.
>
> To answer your question, you'd want a QualifierFilter that matched against 
> qualifiers of the form <anylong><author> and then a ValueFilter which matched 
> the value against the specific author you're looking for.
>
> JG
>
>> -----Original Message-----
>> From: N Kapshoo [mailto:[email protected]]
>> Sent: Monday, June 21, 2010 2:59 PM
>> To: [email protected]
>> Subject: Re: composite value vs composite qualifier
>>
>> I am not sure how to use filters in my case since I do not know the
>> column name.
>> Eg:
>> DocInfo: 123213+author = "abc"
>>
>> 123213 is the docId. If I want to look for authors named 'abc' in all
>> docs, how would I go about specifying a filter?
>>
>> Thanks.
>>
>> On Mon, Jun 21, 2010 at 4:20 PM, Andrey Stepachev <[email protected]>
>> wrote:
>> > 2010/6/22 N Kapshoo <[email protected]>
>> >
>> >> Is there any querying value in separating out values tied to each
>> >> other vs. keeping them in a serialized object? I am guessing the
>> >> second option would be much faster considering it is one composite
>> >> value on the disk, but I would like to know if there are any specific
>> >> advantages to doing things the other way. Thanks.
>> >> The values themselves are very small, basic information in String.
>> >>
>> >> Eg:
>> >>
>> >> DocInfo: <docId><type> = value1
>> >> DocInfo: <docId><priority> = value2
>> >> DocInfo: <docId><etcetc> = value3
>> >>
>> >>
>> >> Vs
>> >>
>> >> DocInfo: docId = value (JSON(type, priority, etcetc))
>> >>
>> >> Thank you.
>> >>
>> >
>> > This mostly depends on the usage pattern.
>> >
>> > 1. Each value in storage carries its full key
>> > (row key/family/qualifier/timestamp), so the KeyValue size increases
>> > (but this negative effect can be offset by using compression). So the
>> > serialized form will be smaller, take less disk I/O, and can be faster.
>> >
>> > 2. The second option gives you atomic updates (i.e. all data comes as one
>> > "piece"), while with the first option you can have concurrent updates
>> > of individual fields (and of course individual history, as opposed to
>> > a serialized object, which will have history only for the whole
>> > object).
>> >
>> > 3. In serialized form you can't use server-side filters out of the box
>> > (you would have to patch HBase to support custom filters that
>> > deserialize the object or apply JSONPath to its serialized form), but
>> > with the first option you can.
>> >
>
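
To put rough numbers on Andrey's first point (every cell repeats its full key), here is a back-of-the-envelope sketch comparing the two layouts for one document. It only counts row key, family, qualifier, an 8-byte timestamp, and value; it ignores length prefixes, compression, and block encoding. The row key "user42" and the field values are made-up sample data, and the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Rough size model only; real on-disk sizes depend on HBase internals and compression.
class CellSizeSketch {
    static final int TIMESTAMP_BYTES = 8;

    static int len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Every cell stores its own copy of row key, family, qualifier, and timestamp.
    static int cellBytes(String row, String family, String qualifier, String value) {
        return len(row) + len(family) + len(qualifier) + TIMESTAMP_BYTES + len(value);
    }

    // Sum over all cells (qualifier -> value) of one row and family.
    static int totalBytes(String row, String family, Map<String, String> cells) {
        int total = 0;
        for (Map.Entry<String, String> c : cells.entrySet()) {
            total += cellBytes(row, family, c.getKey(), c.getValue());
        }
        return total;
    }

    // Layout 1: one qualifier per field, <docId><field> = value.
    static Map<String, String> perFieldLayout() {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("123213type", "memo");
        cells.put("123213priority", "high");
        cells.put("123213author", "abc");
        return cells;
    }

    // Layout 2: one qualifier per document, docId = JSON(fields).
    static Map<String, String> serializedLayout() {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("123213", "{\"type\":\"memo\",\"priority\":\"high\",\"author\":\"abc\"}");
        return cells;
    }

    public static void main(String[] args) {
        System.out.println("per-field:  " + totalBytes("user42", "DocInfo", perFieldLayout()));
        System.out.println("serialized: " + totalBytes("user42", "DocInfo", serializedLayout()));
    }
}
```

With this sample data the per-field layout comes out larger before compression, which is the size cost traded against server-side filtering and per-field updates.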