Hi Jean-Marc,

I've validated this and it works perfectly. Very easy to implement and it's
very fast!

Thankfully in this project there aren't many lists in each table, so I
won't have to create too many column families. In other scenarios it could
be a problem.

Many thanks,
Stas


On 16 February 2013 02:29, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> Hi Stas,
>
> A few options come to mind.
>
> Quickly:
> 1) Why not store the products in specific columns instead of in the
> same one? Like:
> table, rowid1, cf:list, c:aa, value:true
> table, rowid1, cf:list, c:bb, value:true
> table, rowid1, cf:list, c:cc, value:true
> table, rowid2, cf:list, c:aabb, value:true
> table, rowid2, cf:list, c:cc, value:true
> That way, when you do a search, you directly query the right column for
> the right row. Using an "exists" call will also reduce the size of the
> data transferred.
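As a rough illustration of option 1 (plain Python, not HBase client code), the layout can be modelled with a dict standing in for the table. The names `table`, `row_has_item` and `rows_with_item` are made up for this sketch. On a real cluster the first function would roughly correspond to an existence check on a Get, and the second to a scan restricted to a single column.

```python
# In-memory stand-in for an HBase table: row key -> {(family, qualifier): value}.
# This only models the data layout; it does not use the HBase client API.
table = {
    "rowid1": {("list", "aa"): True, ("list", "bb"): True, ("list", "cc"): True},
    "rowid2": {("list", "aabb"): True, ("list", "cc"): True},
}


def row_has_item(table, row_key, item, family="list"):
    """Check whether one cell exists, without reading any value back."""
    return (family, item) in table.get(row_key, {})


def rows_with_item(table, item, family="list"):
    """Return the sorted row keys that have a cell for this item."""
    return sorted(row for row, cells in table.items() if (family, item) in cells)


if __name__ == "__main__":
    print(row_has_item(table, "rowid1", "aa"))
    print(rows_with_item(table, "cc"))
```

Because each item is its own column qualifier, "aa" and "aabb" are distinct keys and no substring matching is involved.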
>
> 2) You can store the data the opposite way. Like:
> table, aa, cf:products, c:rowid1, value:true
> table, aabb, cf:products, c:rowid2, value:true
> table, bb, cf:products, c:rowid1, value:true
> table, cc, cf:products, c:rowid1, value:true
> table, cc, cf:products, c:rowid2, value:true
> Here, you query by your product ID, and you search the column based on
> your previous rowid.
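Option 2 is essentially an inverted index keyed by item. A small sketch in plain Python, again only modelling the layout (the helper names `build_inverted` and `rows_for_item` are hypothetical, and the `products` family is implied by the dict rather than stored):

```python
from collections import defaultdict


def build_inverted(lists_by_row):
    """Turn {row key: [items]} into {item: {row key: True}}."""
    index = defaultdict(dict)
    for row_key, items in lists_by_row.items():
        for item in items:
            # One cell per (item, original row key) pair, value is just a marker.
            index[item][row_key] = True
    return dict(index)


def rows_for_item(index, item):
    """Look up one item row and return the original row keys, sorted."""
    return sorted(index.get(item, {}))


inverted = build_inverted({
    "rowid1": ["aa", "bb", "cc"],
    "rowid2": ["aabb", "cc"],
})

if __name__ == "__main__":
    print(rows_for_item(inverted, "cc"))
```

Finding every row for an item becomes a single lookup by key here, at the cost of maintaining a second copy of the data whenever a list changes.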
>
>
> I would say the two solutions are equivalent, but it will really depend
> on your data pattern and your query pattern.
>
> JM
>
> 2013/2/15, Stas Maksimov <maksi...@gmail.com>:
> > Hi all,
> >
> > I have a requirement to store lists in HBase columns like this:
> > "table", "rowid1", "f:list", "aa, bb, cc"
> > "table", "rowid2", "f:list", "aabb, cc"
> >
> > There is a further requirement to be able to find rows where f:list
> > contains a particular item, e.g. when I need to find rows having item
> > "aa" only "rowid1" should match, and for item "cc" both "rowid1" and
> > "rowid2" should match.
> >
> > For now I decided to use SingleColumnValueFilter with substring matching.
> > As using a comma-separated list proved difficult to search through, I'm
> > using pipe symbols to separate items like this: "|aa|bb|cc|", so that I
> > could pass the search item surrounded by pipes into the filter:
> > SingleColumnValueFilter ('f', 'list', =, 'substring:|aa|')
> >
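To show why the surrounding pipes matter, here is a minimal Python sketch of the encoding and the substring test. It only imitates the matching logic, not the filter itself. The names `encode_list` and `contains_item` are invented for the example, and it assumes items never contain a pipe character.

```python
def encode_list(items):
    """Join items with pipes and add a leading and trailing pipe."""
    return "|" + "|".join(items) + "|"


def contains_item(encoded, item):
    """Substring test on the delimited form, so only whole items match."""
    return f"|{item}|" in encoded


row1 = encode_list(["aa", "bb", "cc"])
row2 = encode_list(["aabb", "cc"])

if __name__ == "__main__":
    # A bare substring test would wrongly match "aa" inside "aabb".
    print("aa" in "aabb, cc")
    print(contains_item(row2, "aa"))
```

The leading and trailing pipes ensure the first and last items are delimited on both sides, the same as items in the middle.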
> > This proved to work effectively enough, however I would prefer to use
> > something more standard for my list storage (e.g. serialised JSON), or
> > perhaps something even more optimised for a search - performance really
> > does matter here.
> >
> > Any opinions on this solution and possible enhancements are much
> > appreciated.
> >
> > Many thanks,
> > Stas
> >
>
