Thanks, Wilm, for the wonderful answer. I really like your recommendation number 2: having two tables, with the rowkeys of the jobs stored inside the source row.
I am curious whether a hybrid of approaches 2 and 3 could be used: keep the rowkeys of the jobs inside the source row, as in approach 2, and also store the parent source rowkey as a column in each job row, as in approach 3.

If I then want to access all jobs for a particular source, I have the following options:

1. Get all job rowkeys from the source row, and then use them to fetch the jobs. This would be a direct getRows operation, and since rows in HBase are indexed by rowkey, I expect it to be fast. But in the process I make two calls to two separate tables.

2. Since I also have the source rowkey as a column in each job row, I can use a filter to get all jobs in a single scan of the table. But this column is not indexed, so the whole table will be scanned, if I am correct.

In terms of response time, which of these methods will be faster?

Also, I didn't get the idea behind 'if at some point a reevaluation of a source has to be done, you could simply use a row lock to prevent race conditions'. An elaboration of this would be great!

Thanks!
Jatin

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-entity-relationship-tp4066296p4066374.html
Sent from the HBase User mailing list archive at Nabble.com.
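
To make the two options concrete, here is a minimal sketch of both access paths. It is a toy in-memory model, not the HBase client API: a TreeMap stands in for a table sorted by rowkey, and the class, method, and key names (JobLookupSketch, jobsViaSourceRow, jobsViaFilteredScan, "job-0007", "source-7") are all made up for illustration. In a real client, option 1 would roughly correspond to a batch of Get requests and option 2 to a Scan with a column value filter such as SingleColumnValueFilter. The sketch only counts how many job rows each path touches; it ignores network round trips and region distribution, so it is not a benchmark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model, NOT the HBase client API: a TreeMap stands in for a
// table whose rows are kept sorted by rowkey. All names are hypothetical.
class JobLookupSketch {
    // jobs table: job rowkey -> parent source rowkey (the "source" column)
    final TreeMap<String, String> jobs = new TreeMap<>();
    // sources table: source rowkey -> job rowkeys stored inside the source row
    final TreeMap<String, List<String>> sources = new TreeMap<>();
    // number of job rows read by the most recent query
    int rowsRead;

    // Writes both sides of the hybrid layout: the job row carries its source,
    // and the source row carries the job's rowkey.
    void addJob(String jobKey, String sourceKey) {
        jobs.put(jobKey, sourceKey);
        sources.computeIfAbsent(sourceKey, k -> new ArrayList<>()).add(jobKey);
    }

    // Option 1: read the source row, then fetch each job directly by rowkey.
    List<String> jobsViaSourceRow(String sourceKey) {
        rowsRead = 0;
        List<String> result = new ArrayList<>();
        List<String> jobKeys =
                sources.getOrDefault(sourceKey, Collections.<String>emptyList());
        for (String jobKey : jobKeys) {
            if (jobs.containsKey(jobKey)) { // keyed lookup, no scan
                rowsRead++;
                result.add(jobKey);
            }
        }
        return result;
    }

    // Option 2: scan every job row and keep those whose source column matches.
    List<String> jobsViaFilteredScan(String sourceKey) {
        rowsRead = 0;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> row : jobs.entrySet()) {
            rowsRead++; // every row is read, matching or not
            if (row.getValue().equals(sourceKey)) {
                result.add(row.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        JobLookupSketch db = new JobLookupSketch();
        // 1000 jobs spread evenly over 100 sources
        for (int i = 0; i < 1000; i++) {
            db.addJob(String.format("job-%04d", i), "source-" + (i % 100));
        }
        List<String> viaSource = db.jobsViaSourceRow("source-7");
        int readsOption1 = db.rowsRead;
        List<String> viaScan = db.jobsViaFilteredScan("source-7");
        int readsOption2 = db.rowsRead;
        System.out.println(viaSource.size() + " jobs, rows read: "
                + readsOption1 + " vs " + readsOption2);
    }
}
```

In this model both paths return the same 10 jobs for "source-7", but option 1 reads 10 job rows while option 2 reads all 1000. Whether that difference outweighs the extra call to the source table in a real cluster is exactly the question above, and the sketch does not answer it.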