Thanks, Wilm, for the wonderful answer. I really like your recommendation
number 2: two tables, with the job rowkeys stored inside the source row.

I am curious whether a hybrid of approaches 2 and 3 could be used. That
means keeping the job rowkeys inside the source row, as in approach 2, and
also storing the parent source rowkey as a column in each job row, as in
approach 3.

Now, if I want to access all jobs for a particular source, I have the
following options.

1. Get all job rowkeys from the source row, then use them to fetch the jobs.
This would be a direct getRows operation, and since rows in HBase are
indexed by rowkey, I expect it to be fast. However, it takes two calls to
two separate tables.

2. Since I also have the source rowkey as a column in each job row, I can
use a filter to get all jobs in a single scan of the job table. But this
column is not indexed, so the whole table will be scanned naively, if I am
correct.

Now, in terms of response time, which of these methods will be faster?

Also, I didn't get the idea behind 'if at some point a reevaluation of a
source has to be done, you could simply use a row lock to prevent race
conditions'. An elaboration of this would be great!

Thanks!
Jatin





--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/HBase-entity-relationship-tp4066296p4066374.html
Sent from the HBase User mailing list archive at Nabble.com.
