In such a case I think you can use a tall table with parent:child row keys,
and filters or range scans to get the children.

Your queries will be:
-Fetch all children from a single parent

scan [parent:0, parent+1:0)
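
To make the range concrete, here is a minimal Python sketch that simulates
the sorted row keys with a plain list. It is not HBase client code. The
names row_key, parent_range and scan are hypothetical, and the fixed-width
big-endian encoding is my assumption, chosen so that byte order matches
numeric order of the ids.

```python
import struct
from bisect import bisect_left

def row_key(parent_id, child_id):
    # Assumption: both ids are unsigned 64-bit integers. Fixed-width
    # big-endian packing makes lexicographic byte order equal numeric order.
    return struct.pack(">QQ", parent_id, child_id)

def parent_range(parent_id):
    # Half-open range [parent:0, parent+1:0) covering every child of parent_id.
    # Note: parent_id + 1 would overflow for the largest 64-bit id.
    return row_key(parent_id, 0), row_key(parent_id + 1, 0)

def scan(sorted_keys, start, stop):
    # Stand-in for a range scan: start row inclusive, stop row exclusive.
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

# Three parents with three children each, stored in sorted key order.
keys = sorted(row_key(p, c) for p in (1, 2, 3) for c in (0, 5, 9))
start, stop = parent_range(2)
children_of_2 = scan(keys, start, stop)
```

With variable-length or string ids you would need a separator or padding
that keeps one parent's keys contiguous, which this sketch does not cover.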

-Find a few children by their keys or values from a single parent

scan [parent:min_of_child_keys, parent:max_of_child_keys + 1) + a filter list
(e.g. FilterList) or a custom hash filter
If there are too many keys, you can use HTable.getRegionLocation to split your
children into parallel scans on different regions.
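
A rough Python sketch of both ideas, again simulating the table with a
sorted list rather than calling HBase. find_children narrows the range to
the smallest and largest wanted child and then filters on a hash set.
group_by_region buckets the wanted keys by region start key, which is the
information a region lookup would give you. All function names and the key
encoding are assumptions made for illustration.

```python
import struct
from bisect import bisect_left, bisect_right

def row_key(parent_id, child_id):
    # Assumption: fixed-width big-endian ids, so byte order == numeric order.
    return struct.pack(">QQ", parent_id, child_id)

def find_children(sorted_keys, parent_id, wanted_child_ids):
    # Range scan over [parent:min, parent:max + 1), filtered by a hash set.
    if not wanted_child_ids:
        return []
    wanted = {row_key(parent_id, c) for c in wanted_child_ids}
    start = row_key(parent_id, min(wanted_child_ids))
    stop = row_key(parent_id, max(wanted_child_ids) + 1)
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return [k for k in sorted_keys[lo:hi] if k in wanted]

def group_by_region(region_start_keys, wanted_keys):
    # region_start_keys must be sorted and begin with b"" (the first region).
    # A region owns keys from its start key (inclusive) up to the next start.
    groups = {}
    for key in sorted(wanted_keys):
        region_index = bisect_right(region_start_keys, key) - 1
        groups.setdefault(region_index, []).append(key)
    return groups

keys = [row_key(7, c) for c in range(10)]
found = find_children(keys, 7, [2, 5, 8, 11])   # child 11 does not exist
regions = [b"", row_key(7, 4), row_key(7, 8)]
per_region = group_by_region(regions, found)     # one scan per bucket
```

Each bucket in per_region could then be scanned by its own thread, using
the bucket's first and last key as the scan bounds.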

-Update a single child by child key and its parent key

Easy: in all cases it is a simple put, or a get+put if it is a true update
rather than an overwrite.
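
The difference between the two cases, sketched in Python with a dict
standing in for the table (one value per row, for simplicity).
overwrite_child and update_child are hypothetical names, not HBase calls.
Note that a get followed by a put is two separate steps, so on its own it
is not atomic if several writers touch the same child.

```python
import struct

def row_key(parent_id, child_id):
    # Assumption: fixed-width big-endian ids, as in a parent:child key.
    return struct.pack(">QQ", parent_id, child_id)

def overwrite_child(table, parent_id, child_id, value):
    # Plain put: the new value does not depend on what was stored before.
    table[row_key(parent_id, child_id)] = value

def update_child(table, parent_id, child_id, modify):
    # get + put: read the current value, derive the new one, write it back.
    key = row_key(parent_id, child_id)
    current = table.get(key)      # get (None if the child does not exist)
    new_value = modify(current)
    table[key] = new_value        # put
    return new_value

table = {}
overwrite_child(table, 1, 42, 10)
update_child(table, 1, 42, lambda old: (old or 0) + 5)
```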


2011/2/11 Jason <urg...@gmail.com>

> Hi all,
>
> Let's say I have two entities Parent and Child. There could be many
> children in one parent (from hundreds to tens of millions)
> A child can only belong to one Parent.
>
> Typical queries are:
> -Fetch all children from a single parent
> -Find a few children by their keys or values from a single parent
> -Update a single child by child key and its parent key
>
> And there are no cross-parent queries.
>
> I am trying to figure out which schema approach is better from a
> performance/maintenance perspective:
>
> 1. One table with one Parent per row. Row key is a parent id. Children are
> stored in a single family each under separate qualifier (child id). Would it
> even work assuming all children may not fit in memory?
>
> 2. One table. Compound row key parent id/child id. One child per row.
>
> 3. Many tables - one per parent. Row key is a child id.
>
> Thanks!
