Hi Jean-Marc,

I've validated this and it works perfectly. Very easy to implement, and it's very fast!
Thankfully, in this project there aren't many lists in each table, so I won't have to create too many column families. In other scenarios it could be a problem.

Many thanks,
Stas

On 16 February 2013 02:29, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> Hi Stas,
>
> A few options come to mind.
>
> Quickly:
> 1) Why not store the products in specific columns instead of in the
> same one? Like:
> table, rowid1, cf:list, c:aa, value:true
> table, rowid1, cf:list, c:bb, value:true
> table, rowid1, cf:list, c:cc, value:true
> table, rowid2, cf:list, c:aabb, value:true
> table, rowid2, cf:list, c:cc, value:true
> That way, when you do a search, you query the right column for
> the right row directly. Using an "exists" call will also reduce the
> size of the data transferred.
>
> 2) You can store the data the opposite way. Like:
> table, aa, cf:products, c:rowid1, value:true
> table, aabb, cf:products, c:rowid2, value:true
> table, bb, cf:products, c:rowid1, value:true
> table, cc, cf:products, c:rowid1, value:true
> table, cc, cf:products, c:rowid2, value:true
> Here, you query by your product ID, and you search the column based on
> your previous rowid.
>
> I would say the 2 solutions are equivalent, but it will really depend
> on your data pattern and your query pattern.
>
> JM
>
> 2013/2/15, Stas Maksimov <maksi...@gmail.com>:
> > Hi all,
> >
> > I have a requirement to store lists in HBase columns like this:
> > "table", "rowid1", "f:list", "aa, bb, cc"
> > "table", "rowid2", "f:list", "aabb, cc"
> >
> > There is a further requirement to be able to find rows where f:list
> > contains a particular item, e.g. when I need to find rows having item
> > "aa" only "rowid1" should match, and for item "cc" both "rowid1" and
> > "rowid2" should match.
> >
> > For now I decided to use SingleColumnValueFilter with substring matching.
> > As using a comma-separated list proved difficult to search through, I'm
> > using pipe symbols to separate items like this: "|aa|bb|cc|", so that I
> > could pass the search item surrounded by pipes into the filter:
> > SingleColumnValueFilter ('f', 'list', =, 'substring:|aa|')
> >
> > This proved to work effectively enough, however I would prefer to use
> > something more standard for my list storage (e.g. serialised JSON), or
> > perhaps something even more optimised for a search - performance really
> > does matter here.
> >
> > Any opinions on this solution and possible enhancements are much
> > appreciated.
> >
> > Many thanks,
> > Stas
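
For anyone reading this thread later, here is a rough sketch of the three layouts discussed above, using the sample data from the original question. It is only an in-memory model: plain Python dicts stand in for HBase rows and columns, nothing here calls the HBase client API, and all function names are made up for illustration. It is meant to show why the pipe delimiters are needed for substring matching and how the two suggested layouts turn the search into a direct lookup.

```python
# In-memory model only: dicts stand in for HBase rows and columns.
# All names below are hypothetical; none of this is HBase client code.

ROWS = {"rowid1": ["aa", "bb", "cc"], "rowid2": ["aabb", "cc"]}


def pipe_encode(items):
    """Original approach: one cell per row holding e.g. '|aa|bb|cc|'."""
    return "|" + "|".join(items) + "|"


def find_by_substring(cells, item):
    """Emulates the substring filter: every row's cell has to be scanned."""
    needle = "|" + item + "|"
    return sorted(row for row, cell in cells.items() if needle in cell)


def layout_per_item_column(rows):
    """Option 1: each item becomes its own column qualifier in the row."""
    return {row: {item: True for item in items} for row, items in rows.items()}


def layout_inverted(rows):
    """Option 2: the item is the row key, original row ids are qualifiers."""
    index = {}
    for row, items in rows.items():
        for item in items:
            index.setdefault(item, {})[row] = True
    return index


def row_has_item(per_item, row, item):
    """Option 1 lookup: existence check on one row and one column."""
    return item in per_item.get(row, {})


def rows_with_item(inverted, item):
    """Option 2 lookup: read the single row keyed by the item."""
    return sorted(inverted.get(item, {}))


if __name__ == "__main__":
    cells = {row: pipe_encode(items) for row, items in ROWS.items()}
    print(find_by_substring(cells, "aa"))
    print(row_has_item(layout_per_item_column(ROWS), "rowid2", "aa"))
    print(rows_with_item(layout_inverted(ROWS), "cc"))
```

In this model, option 1 suits the question "does this row contain item X?", while option 2 suits "which rows contain item X?". Which one fits better depends on the query pattern, as JM says above.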