In such a case I think you can use a tall table with parent:child row keys, and use filters or range scans to get the children.
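As a rough sketch of what I mean by parent:child keys (plain Python over a sorted list, not HBase client code; the 8-byte fixed-width encoding and the helper names row_key, children_range and scan are my own assumptions for illustration):

```python
import bisect
import struct

def row_key(parent_id, child_id):
    """Composite key: 8-byte big-endian parent id followed by 8-byte child id."""
    return struct.pack(">QQ", parent_id, child_id)

def children_range(parent_id):
    """Half-open range [parent:0, parent+1:0) covering every child of one parent."""
    return row_key(parent_id, 0), row_key(parent_id + 1, 0)

def scan(sorted_keys, start, stop):
    """Stand-in for a range scan over sorted row keys; stop is exclusive."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

# Toy "table": row keys kept in sorted byte order, as a sorted store would hold them.
table = sorted(row_key(p, c) for p, c in [(1, 5), (1, 300), (2, 7), (2, 9), (3, 1)])

start, stop = children_range(2)
children_of_2 = [struct.unpack(">QQ", k)[1] for k in scan(table, start, stop)]
print(children_of_2)
```

The fixed-width encoding matters here: with plain decimal strings, "10:..." would sort before "2:...", and a range scan would no longer line up with a single parent.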
Your queries will be:

-Fetch all children from a single parent
   scan [parent:0, parent+1:0)

-Find a few children by their keys or values from a single parent
   scan [parent:min_of_child_keys, parent:max_of_child_keys + 1) + a filter list (or a custom hash filter)
   If there are too many keys, you can use HTable.getRegionLocation to group the child keys by region and run parallel scans, one per region.

-Update a single child by child key and its parent key
   easy (in all cases a simple put, or get+put if it is a true update rather than an overwrite)

2011/2/11 Jason <urg...@gmail.com>

> Hi all,
>
> Let's say I have two entities Parent and Child. There could be many
> children in one parent (from hundreds to tens of millions)
> A child can only belong to one Parent.
>
> Typical queries are:
> -Fetch all children from a single parent
> -Find a few children by their keys or values from a single parent
> -Update a single child by child key and it's parent key
>
> And there are no cross-parent queries.
>
> I am trying to figure out what is better schema approach from
> performance/maintenance perspective:
>
> 1. One table with one Parent per row. Row key is a parent id. Children are
> stored in a single family each under separate qualifier (child id). Would it
> even work assuming all children may not fit in memory?
>
> 2. One table. Compound row key parent id/child id. One child per row.
>
> 3. Many tables - one per parent. Row key is a child id.
>
> Thanks!