It would be inefficient to run that query against this schema if you're
talking about finding all documents with a given author across all users. In
that case you'd want an additional table whose row keys are authors.
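To make the index-table idea concrete, here is a minimal in-memory sketch in Java. It is not HBase client code: sorted maps stand in for the two tables. It assumes the primary table is keyed by user, with `<docId>author` qualifiers as in the example further down the thread. The index layout (row key = author, qualifier = `<userId>/<docId>`) and all class and method names are hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/**
 * In-memory model (not HBase client code) of a primary table keyed by user
 * plus a hypothetical index table keyed by author. TreeMaps stand in for
 * HBase's sorted rows and sorted qualifiers.
 */
public class AuthorIndexSketch {
    // Primary "table": userId -> (qualifier "<docId>author" -> author)
    private static final Map<String, NavigableMap<String, String>> docs = new TreeMap<>();
    // Hypothetical index "table": author -> (qualifier "<userId>/<docId>" -> empty value)
    private static final Map<String, NavigableMap<String, String>> docsByAuthor = new TreeMap<>();

    public static void putDoc(String userId, long docId, String author) {
        docs.computeIfAbsent(userId, k -> new TreeMap<>()).put(docId + "author", author);
        // The application must also write the index row. In HBase these are
        // rows in two different tables, so the pair of writes is not atomic.
        docsByAuthor.computeIfAbsent(author, k -> new TreeMap<>()).put(userId + "/" + docId, "");
    }

    // A single row read on the index replaces a scan over every user row.
    public static Set<String> findDocsByAuthor(String author) {
        return docsByAuthor.getOrDefault(author, new TreeMap<>()).keySet();
    }

    public static void main(String[] args) {
        putDoc("user1", 123213L, "abc");
        putDoc("user2", 42L, "abc");
        putDoc("user2", 7L, "xyz");
        System.out.println(findDocsByAuthor("abc"));
    }
}
```

The trade-off is the usual one for a hand-maintained index: reads by author become cheap, but every document write turns into two writes that the application has to keep consistent.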

If you want to search for documents with a specific author within a given
user's documents (a single row), then you could use filters. As Andrey said,
it would be simpler if the value were broken up into individual qualifiers,
but it could also be done with a custom filter that reads the serialized value.

To answer your question, you'd want a QualifierFilter that matches
qualifiers of the form <anylong><author>, and then a ValueFilter that matches
the value against the specific author you're looking for.
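The per-cell logic of that combination can be modelled in plain Java. This is a sketch, not the HBase filter API. In the HBase client, the two filters would typically be combined in a `FilterList` with `MUST_PASS_ALL`, so that a cell is returned only if both its qualifier and its value match. The sketch assumes qualifiers are the decimal docId followed by the literal `author` (e.g. `123213author`). A real schema might store the docId as an 8-byte long instead. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Plain-Java model of a qualifier match and a value match applied together
 * to every cell of one row. Not HBase client code.
 */
public class QualifierValueFilterSketch {
    // Assumed qualifier format: decimal docId immediately followed by "author".
    private static final Pattern AUTHOR_QUALIFIER = Pattern.compile("^(\\d+)author$");

    /** Returns the docIds in one row whose author cell equals {@code author}. */
    public static List<Long> docsWithAuthor(NavigableMap<String, String> row, String author) {
        List<Long> docIds = new ArrayList<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            Matcher m = AUTHOR_QUALIFIER.matcher(cell.getKey());
            // Qualifier check: keep only <anylong>author columns.
            // Value check: keep only cells whose value is the author sought.
            if (m.matches() && cell.getValue().equals(author)) {
                docIds.add(Long.parseLong(m.group(1)));
            }
        }
        return docIds;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("123213author", "abc");
        row.put("123213priority", "high");
        row.put("555author", "xyz");
        row.put("777author", "abc");
        System.out.println(docsWithAuthor(row, "abc"));
    }
}
```

Because the docId is embedded in the qualifier, it can be recovered from each matching cell without knowing the column names in advance.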

JG

> -----Original Message-----
> From: N Kapshoo
> Sent: Monday, June 21, 2010 2:59 PM
> To: [email protected]
> Subject: Re: composite value vs composite qualifier
> 
> I am not sure how to use filters in my case since I do not know the
> column name.
> Eg:
> DocInfo: 123213+author = "abc"
> 
> 123213 is the docId. If I want to look for authors named 'abc' in all
> docs, how would I go about specifying a filter?
> 
> Thanks.
> 
> On Mon, Jun 21, 2010 at 4:20 PM, Andrey Stepachev wrote:
> > 2010/6/22 N Kapshoo
> >
> >> Is there any querying value in separating out values tied to each
> >> other vs. keeping them in a serialized object? I am guessing the
> >> second option would be much faster considering it is one composite
> >> value on the disk, but I would like to know if there are any specific
> >> advantages to doing things the other way. Thanks.
> >> The values themselves are very small, basic information in String.
> >>
> >> Eg:
> >>
> >> DocInfo: <docId><type> = value1
> >> DocInfo: <docId><priority> = value2
> >> DocInfo: <docId><etcetc> = value3
> >>
> >>
> >> Vs
> >>
> >> DocInfo: docId = value (JSON(type, priority, etcetc))
> >>
> >> Thank you.
> >>
> >
> > This mostly depends on the usage pattern.
> >
> > 1. Each value in storage carries its full key
> > (key/family/qualifier/timestamp), so the KeyValue size increases (though
> > this can be offset by compression). So the serialized form will be
> > smaller, take less disk I/O, and can be faster.
> >
> > 2. The second option gives you atomic updates (i.e. all the data comes as
> > one "piece"), while with the first option you can have concurrent updates
> > of individual fields (and, of course, individual history per field, as
> > opposed to a serialized object, which has history only for the whole
> > object).
> >
> > 3. With the serialized form you can't use server-side filters out of the
> > box (you would have to patch HBase to support custom filters that
> > deserialize the object or apply JSONPath to its serialized form), but with
> > the first option you can.
> >
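Andrey's first point can be checked with rough arithmetic. Below is a sketch that applies HBase's KeyValue layout: keylength (4 bytes), valuelength (4), then the key (rowlength (2), row, familylength (1), family, qualifier, timestamp (8), key type (1)), then the value. It ignores compression, block encoding, tags and HFile index overhead. The row key, qualifiers and sample values are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/**
 * Rough uncompressed size of one HBase KeyValue:
 * keylength(4) valuelength(4) | rowlength(2) row familylength(1) family
 * qualifier timestamp(8) keytype(1) | value.
 */
public class KeyValueSizeSketch {
    private static int len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static int keyValueSize(String row, String family, String qualifier, String value) {
        int key = 2 + len(row) + 1 + len(family) + len(qualifier) + 8 + 1;
        return 4 + 4 + key + len(value);
    }

    public static void main(String[] args) {
        String row = "user1";      // hypothetical row key
        String family = "DocInfo";
        // Option 1: one cell per field (hypothetical sample values).
        int perField = keyValueSize(row, family, "123213type", "pdf")
                + keyValueSize(row, family, "123213priority", "high");
        // Option 2: one cell holding the same fields serialized as JSON.
        int serialized = keyValueSize(row, family, "123213",
                "{\"type\":\"pdf\",\"priority\":\"high\"}");
        System.out.println(perField + " vs " + serialized);
    }
}
```

With these two sample fields the per-field layout comes to 95 bytes against 70 for the single JSON cell. Each extra cell repeats 20 fixed bytes plus the row, family and qualifier. That repeated key material is the overhead Andrey says compression can offset.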
