Re: How does HBase perform a per-column scan?

Anoop John, Sun, 10 Mar 2013 08:54:18 -0700

As said above, you will need a full table scan on that CF.
As Ted said, consider having a look at your schema design.


-Anoop-
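To make the "full table scan on that CF" point concrete, here is a toy sketch in Python. It is not HBase code: the store layout, the family names "c" and "d", and the "cells walked" count are assumptions for illustration. Each column family is modelled as its own sorted run of cells, the way every family gets its own store files. The sketch then compares scanning for column C when C shares a family with other columns against scanning when C sits in its own family, as Ted suggests.

```python
from collections import defaultdict

# Toy model only (not HBase internals): each column family is kept as its own
# list of (row_key, qualifier, value) cells sorted by row key, then qualifier,
# loosely mirroring how every family gets its own store files.


def build_stores(rows, family_of):
    """rows: {row_key: {qualifier: value}}; family_of maps a qualifier to a family."""
    stores = defaultdict(list)
    for row_key, columns in rows.items():
        for qualifier, value in columns.items():
            stores[family_of(qualifier)].append((row_key, qualifier, value))
    for cells in stores.values():
        cells.sort()
    return dict(stores)


def scan_one_column(stores, family, qualifier):
    """Walk every cell of one family, keeping the rows that have `qualifier`.

    Returns (matching row keys, cells walked). The walk count is a rough
    stand-in for how much of the family a full scan has to read.
    """
    hits, walked = [], 0
    for row_key, q, _value in stores.get(family, []):
        walked += 1
        if q == qualifier:
            hits.append(row_key)
    return hits, walked


# Hypothetical data: 1,000 rows with columns a and b; only one row has column c.
rows = {f"row{i:04d}": {"a": "x", "b": "y"} for i in range(1000)}
rows["row0007"]["c"] = "z"

# Layout 1: a, b and c all live in family "d".
shared = build_stores(rows, lambda q: "d")
# Layout 2: c is moved into its own family "c".
split = build_stores(rows, lambda q: "c" if q == "c" else "d")

print(scan_one_column(shared, "d", "c"))  # (['row0007'], 2001)
print(scan_one_column(split, "c", "c"))   # (['row0007'], 1)
```

The walk count is only a rough proxy. Real HBase uses seeks and the block index, but when a sparse column is mixed into a wide family it likely still has to read most of that family's blocks. Moving the column into its own small family shrinks what the full scan has to touch.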


On Sun, Mar 10, 2013 at 8:10 PM, Ted Yu <[email protected]> wrote:

> bq. physically column family should be able to perform efficiently (storage layer
>
> When you scan a row, data for the different column families is brought
> into memory (if you don't utilize HBASE-5416).
> Take a look at:
>
> https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541258&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541258
>
> which was based on the settings described in:
>
>
> https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541191
>
> This boils down to your schema design. If possible, consider extracting
> column C into its own column family.
>
> Cheers
>
> On Sun, Mar 10, 2013 at 7:14 AM, PG <[email protected]> wrote:
>
> > Hi, Ted and Anoop, thanks for your notes.
> > I am talking about a column rather than a column family, since physically
> > a column family should be able to perform efficiently (at the storage layer,
> > CFs are stored separately). But columns of the same column family may be
> > mixed physically, and that makes filtering on a column value hard... So I
> > want to know if there is any mechanism in HBase that works on this...
> > Regards,
> > Yun
> >
> > On Mar 10, 2013, at 10:01 AM, Ted Yu <[email protected]> wrote:
> >
> > > Hi, Yun:
> > > Take a look at HBASE-5416 (Improve performance of scans with some kind of
> > > filters), which is in the 0.94.5 release.
> > >
> > > In your case, you can use a filter which specifies column C as the
> > > essential family.
> > > Here I interpret column C as a column family.
> > >
> > > Cheers
> > >
> > > On Sat, Mar 9, 2013 at 11:11 AM, yun peng <[email protected]> wrote:
> > >
> > >> Hi, All,
> > >> I want to find all existing values for a given column in HBase. Would
> > >> that result in a full-table scan? For example, given a column C, the
> > >> table has a very large number of rows, of which only a few (say only
> > >> 1 row) have non-empty values for column C. Would HBase still use a full
> > >> table scan to find this row? Or does HBase have any optimization for
> > >> this kind of query?
> > >> Thanks...
> > >> Regards
> > >> Yun
> > >>
> >
>
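Ted's HBASE-5416 suggestion can be sketched the same way. In the Java client this is, as far as I know, enabled with `Scan.setLoadColumnFamiliesOnDemand(true)` together with a filter such as `SingleColumnValueFilter` (with `setFilterIfMissing(true)`), whose `isFamilyEssential()` then marks only C's family as essential. The Python below is a toy model of that behaviour, not HBase's implementation. It models a scan that wants whole rows back but filters on column C: the scan reads the essential family first, runs the filter, and fetches the other families only for rows the filter accepts. The families "c" and "d", the `has_c` predicate, and the cell counting are hypothetical.

```python
# Toy model of on-demand column family loading (the idea behind HBASE-5416).
# Not HBase code: the layout {family: {row_key: {qualifier: value}}}, the
# predicate and the cell counting are illustrative assumptions.


def row_cells(stores, family, row_key):
    """One row's cells in one family, keyed as 'family:qualifier'."""
    return {f"{family}:{q}": v for q, v in stores[family].get(row_key, {}).items()}


def other_families(stores, essential, row_key):
    """One row's cells from every non-essential family."""
    cells = {}
    for family in stores:
        if family != essential:
            cells.update(row_cells(stores, family, row_key))
    return cells


def scan(stores, essential, predicate, on_demand=True):
    """Return (accepted rows, cells loaded); predicate sees only essential cells."""
    loaded, results = 0, {}
    if on_demand:
        # Only rows that have cells in the essential family are visited.
        candidates = sorted(stores[essential])
    else:
        candidates = sorted(set().union(*stores.values()))
    for row_key in candidates:
        essential_cells = row_cells(stores, essential, row_key)
        # Eager mode reads the other families before the filter runs.
        others = {} if on_demand else other_families(stores, essential, row_key)
        loaded += len(essential_cells) + len(others)
        if not predicate(essential_cells):
            continue
        if on_demand:
            # Lazy mode fetches the other families only for accepted rows.
            others = other_families(stores, essential, row_key)
            loaded += len(others)
        results[row_key] = {**essential_cells, **others}
    return results, loaded


# Hypothetical table: family "d" holds a and b for 1,000 rows;
# family "c" holds column c for a single row.
stores = {
    "c": {"row0007": {"c": "z"}},
    "d": {f"row{i:04d}": {"a": "x", "b": "y"} for i in range(1000)},
}


def has_c(cells):
    """Accept rows whose c:c value is non-empty."""
    return cells.get("c:c", "") != ""


lazy_rows, lazy_loaded = scan(stores, "c", has_c, on_demand=True)
eager_rows, eager_loaded = scan(stores, "c", has_c, on_demand=False)
print(sorted(lazy_rows), lazy_loaded)    # ['row0007'] 3
print(sorted(eager_rows), eager_loaded)  # ['row0007'] 2001
```

Both scans return the same row, but in this model the lazy scan loads 3 cells against 2,001 for the eager one. The essential family itself is still scanned in full, which matches Anoop's point: on-demand loading saves work on the other families, not on the one holding C.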
