Unless I am mistaken... get() requires a row key, right? And you can join tables on column data which isn't in the row key, right?

So how do you do a get()? :-)

Sure, there is more than one way to skin a cat. But if you want to be efficient... you will create a set of unique keys based on the columns that you want to join. Note that if you are going to use a temp table in HBase, you will want to store the unique key value A|B, and when you write the row to the temp table, you will append a unique identifier like a uuid so that you don't lose the row.
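Something like this, off the top of my head (completely untested, and the
delimiter and column names are made up for illustration):

import java.io.IOException;
import java.util.UUID;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TempTableWriter {
    // Write one source row into the temp table under the key "a|b|<uuid>".
    // The uuid suffix keeps two source rows with identical join values
    // from overwriting each other in the temp table.
    public static void writeJoinRow(HTable tempTable, String a, String b,
            byte[] family, byte[] qualifier, byte[] value) throws IOException {
        byte[] rowKey = Bytes.toBytes(a + "|" + b + "|" + UUID.randomUUID());
        Put put = new Put(rowKey);
        put.add(family, qualifier, value);
        tempTable.put(put);
    }
}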
Here your input list to the actual join is going to be the list of unique keys, and then you do a scan to get the rows. Again, I could be wrong, but how can you perform a get() when you only know a portion of the row key?

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 8, 2011, at 8:01 AM, Doug Meil <doug.m...@explorysmedical.com> wrote:

>
> Re: "With respect to Doug's posts, you can't do a multi-get off the bat"
>
> That's an assumption, but you're entitled to your opinion.
>
> -----Original Message-----
> From: Michael Segel [mailto:michael_se...@hotmail.com]
> Sent: Monday, June 06, 2011 10:08 PM
> To: user@hbase.apache.org
> Subject: RE: How to efficiently join HBase tables?
>
>
> Well....
>
> David is correct.
>
> Eran wanted to do a join, which is a relational concept that isn't natively
> supported by a NoSQL database. A better model would be a hierarchical model
> like Dick Pick's Revelation. (UniVerse, aka U2, from Ardent/Informix/IBM/now
> Rocket Software?)
> And yes, we're looking back 40-some-odd years into either a merge/sort
> solution or how databases do a relational join. :-)
>
> Eran wants to do this in a single M/R job. The short answer is you can't.
> The longer answer is that if your main class implements Tool and runs via
> ToolRunner, you can launch two jobs in parallel to get your subsets, and
> then when they both complete, you run the join job on them. So I guess it's
> a single 'job', or rather app. :-)
>
> With respect to Doug's posts, you can't do a multi-get off the bat because in
> the general case you're not fetching based on the row key but on a column
> which is not part of the row key. (It could be a foreign key, which would
> mean that at least one of your table fetches will be off the row key, but
> you can't guarantee it.)
>
> So if you don't want to use temp tables, then you have to put your results
> in sorted order, and you still want to get the unique set of join keys,
> which means you have to run a reduce job. Then you can take the unique key
> set and do the scans. (You can't do a multi-get because you're doing a scan
> with start and stop rows.)
>
> The reason I suggest using temp tables if you're going to do a join
> operation is that they make your life easier and are probably faster too.
>
> Bottom line... I guess many data architects are going to need to rethink
> their data models when working on big data. :-)
>
> -Mike
>
> PS. If I get a spare moment, I may code this up...
>
>
>> From: doug.m...@explorysmedical.com
>> To: user@hbase.apache.org
>> Date: Mon, 6 Jun 2011 17:19:44 -0400
>> Subject: RE: How to efficiently join HBase tables?
>>
>> Re: "So, you all realize the joins have been talked about in the database
>> community for 40 years?"
>>
>> Great point. What's old is new! :-)
>>
>> My suggestion from earlier in the thread was a variant of nested loops
>> using multi-get in HTable, which would reduce the number of RPC calls. So
>> it's a "bulk-select nested loops" of sorts (i.e., as opposed to the 1-by-1
>> lookup of regular nested loops).
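>>
>> Roughly, something like this (an untested sketch -- the table and the
>> probe keys are placeholders):
>>
>> import java.io.IOException;
>> import java.util.ArrayList;
>> import java.util.List;
>> import org.apache.hadoop.hbase.client.Get;
>> import org.apache.hadoop.hbase.client.HTable;
>> import org.apache.hadoop.hbase.client.Result;
>>
>> public class BulkSelectNestedLoops {
>>     // Instead of probing the inner table one Get at a time, batch the
>>     // probe keys collected from the outer table into a single multi-get.
>>     public static Result[] bulkFetch(HTable inner, List<byte[]> probeKeys)
>>             throws IOException {
>>         List<Get> gets = new ArrayList<Get>(probeKeys.size());
>>         for (byte[] key : probeKeys) {
>>             gets.add(new Get(key));
>>         }
>>         return inner.get(gets); // one batched round trip instead of N
>>     }
>> }
>>
>> The batched get() is one client call; the client groups the Gets by
>> region server for you, which is where the RPC savings come from.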
>>
>>
>> -----Original Message-----
>> From: Buttler, David [mailto:buttl...@llnl.gov]
>> Sent: Monday, June 06, 2011 4:30 PM
>> To: user@hbase.apache.org
>> Subject: RE: How to efficiently join HBase tables?
>>
>> So, you all realize the joins have been talked about in the database
>> community for 40 years? There are two main types of joins:
>> Nested loops
>> Hash table
>>
>> Mike, in his various emails, seems to be trying to re-imagine how to
>> implement both types of joins in HBase (which seems like a reasonable
>> goal). I am not exactly sure what Eran is going for here, but it seems
>> like Eran is glossing over a piece. If you have two scanners for tables A
>> and B, then table B needs to be rescanned for every unique part of the
>> join condition in table A. There are certain ways of improving the
>> efficiency of that; the two most obvious are pushing the selection
>> criteria down to the scans, and scanning all of the same join values from
>> table B at the same time (which requires that table B's key is the join
>> key, or a secondary structure that stores the join values as the primary
>> order).
>>
>> Dave
>>
>> -----Original Message-----
>> From: e...@gigya-inc.com [mailto:e...@gigya-inc.com] On Behalf Of Eran
>> Kutner
>> Sent: Friday, June 03, 2011 12:24 AM
>> To: user@hbase.apache.org
>> Subject: Re: How to efficiently join HBase tables?
>>
>> Mike, this is more or less what I tried to describe in my initial post,
>> only you explained it much better.
>> The problem is that I want to do all of this in one M/R run, not three,
>> and without explicit temp tables. If there were only a way to feed both
>> table A and table B into the M/R job, it could be done.
>>
>> Let's take your query and assumptions, for example.
>> So we configure scanner A to return rows where c=xxx and d=yyy. We then
>> configure scanner B to return rows where e=zzz. Now we feed all those
>> rows to the mapper.
>> For each row the mapper gets, it outputs a new key, which is "a|b", and
>> the same value it received; if either column doesn't exist in the row,
>> the mapper doesn't output anything for that row.
>> There is an implicit "temp table" created at this stage by Hadoop.
>> Now the reducer is run; for every key "a|b" generated by the mapper it
>> gets one or more value sets, each one representing a row from the
>> original two tables. For simplicity, let's assume we got two rows, one
>> from table A and the other from table B. Now the reducer can combine the
>> two rows and output the combined row. This will work just the same if
>> there were multiple rows from each table with the same "a|b" key; in that
>> case the reducer would have to generate the Cartesian product of all the
>> rows. Outer joins can also be done this way; in an outer join you only
>> get one row in the reducer for a given "a|b" key but still generate an
>> output.
>>
>> -eran
>>
>>
>>
>> On Fri, Jun 3, 2011 at 00:05, Michael Segel <michael_se...@hotmail.com> wrote:
>>
>>>
>>> Not to beat a dead horse, but I thought a bit more about the problem.
>>> If you want to do this all in HBase using a M/R job...
>>>
>>> Let's define the following as the sample query:
>>>
>>> SELECT *
>>> FROM A, B
>>> WHERE A.a = B.a
>>> AND A.b = B.b
>>> AND A.c = xxx
>>> AND A.d = yyy
>>> AND B.e = zzz
>>>
>>> So our join key is "a|b" because we're matching on columns a and b.
>>> (The pipe is to delimit the columns, assuming the columns themselves
>>> don't contain pipes...)
>>>
>>> Our filters on A are c and d, while e is the filter on B.
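>>>
>>> Pushing those filters down into the scans might look roughly like this
>>> (untested; I'm assuming the columns live in a family called "cf"):
>>>
>>> import org.apache.hadoop.hbase.client.Scan;
>>> import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
>>> import org.apache.hadoop.hbase.filter.FilterList;
>>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
>>> import org.apache.hadoop.hbase.util.Bytes;
>>>
>>> public class SubsetScans {
>>>     // Build WHERE A.c = xxx AND A.d = yyy as a server-side filter.
>>>     public static Scan scanForTableA(byte[] xxx, byte[] yyy) {
>>>         SingleColumnValueFilter cEq = new SingleColumnValueFilter(
>>>                 Bytes.toBytes("cf"), Bytes.toBytes("c"), CompareOp.EQUAL, xxx);
>>>         cEq.setFilterIfMissing(true); // drop rows that lack column c
>>>         SingleColumnValueFilter dEq = new SingleColumnValueFilter(
>>>                 Bytes.toBytes("cf"), Bytes.toBytes("d"), CompareOp.EQUAL, yyy);
>>>         dEq.setFilterIfMissing(true);
>>>         FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
>>>         filters.addFilter(cEq);
>>>         filters.addFilter(dEq);
>>>         Scan scan = new Scan();
>>>         scan.setFilter(filters);
>>>         return scan;
>>>     }
>>> }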
>>>
>>> So we want to do the following:
>>>
>>> M/R Map job 1 gets the subset from table A along with a set of unique keys.
>>> M/R Map job 2 gets the subset from table B along with a set of unique keys.
>>> M/R Map job 3 takes either set of unique keys as the input list, and
>>> you split it based on the number of parallel mappers you want to use.
>>>
>>> You have a couple of options on how you want to proceed.
>>> In each Mapper.map() your input is a unique key.
>>> I guess you could create two scanners, one for tempTableA and one
>>> for tempTableB.
>>> It looks like you can get the iterator for each result set, and then
>>> for each row in temp table A, you iterate through the result set
>>> from temp table B, writing out the joined set.
>>>
>>> The only problem is that your result set file isn't in sort order.
>>> So I guess you could take the output from this job and reduce it to
>>> get it into sort order.
>>>
>>> Option B: Using HDFS files for temp 'tables'.
>>> You can do this... but you would still have to track the unique keys
>>> and also sort both the keys and the files, which will require a reduce job.
>>>
>>> Now this is just my opinion, but if I use HBase, I don't have to
>>> worry about using a reducer except to order the final output set,
>>> so I can save the time it takes to do the reduce step. So I have to ask...
>>> how much time is spent by HBase in splitting and compacting the temp tables?
>>> Also, can't you pre-split the temp table before you use it?
>>>
>>> Or am I still missing something?
>>>
>>> Note: In this example, you'd have to write an input format that
>>> takes a Java list object (or something similar) as your input, and
>>> then you can split it to get it to run in parallel.
>>> Or you could just write this on the client, split the list up, and
>>> run the join in parallel threads on the client node. Or a single
>>> thread, which would mean that it would run and output in sort order.
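>>>
>>> The per-key join itself would be roughly this (an untested sketch,
>>> assuming the temp-table row keys are the join key "a|b" plus a unique
>>> uuid suffix):
>>>
>>> import java.io.IOException;
>>> import org.apache.hadoop.hbase.client.HTable;
>>> import org.apache.hadoop.hbase.client.Result;
>>> import org.apache.hadoop.hbase.client.ResultScanner;
>>> import org.apache.hadoop.hbase.client.Scan;
>>> import org.apache.hadoop.hbase.util.Bytes;
>>>
>>> public class TempTableJoin {
>>>     // For one unique join key, scan the matching rows out of both temp
>>>     // tables and emit the cross product. The stop row "a|b|~" works
>>>     // because '~' sorts after every character a uuid can contain.
>>>     public static void joinOnKey(HTable tempA, HTable tempB, String joinKey)
>>>             throws IOException {
>>>         byte[] start = Bytes.toBytes(joinKey + "|");
>>>         byte[] stop = Bytes.toBytes(joinKey + "|~");
>>>         ResultScanner aRows = tempA.getScanner(new Scan(start, stop));
>>>         try {
>>>             for (Result a : aRows) {
>>>                 ResultScanner bRows = tempB.getScanner(new Scan(start, stop));
>>>                 try {
>>>                     for (Result b : bRows) {
>>>                         emit(a, b); // write the joined row to the output
>>>                     }
>>>                 } finally {
>>>                     bRows.close();
>>>                 }
>>>             }
>>>         } finally {
>>>             aRows.close();
>>>         }
>>>     }
>>>
>>>     private static void emit(Result a, Result b) {
>>>         System.out.println(Bytes.toString(a.getRow()) + " x "
>>>                 + Bytes.toString(b.getRow()));
>>>     }
>>> }
>>>
>>> Calling that from map() with the unique key as input gives you the
>>> parallelism; sorting the output is the part you'd still need a
>>> reducer for.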
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>>> Date: Wed, 1 Jun 2011 07:47:30 -0700
>>>> Subject: Re: How to efficiently join HBase tables?
>>>> From: jason.rutherg...@gmail.com
>>>> To: user@hbase.apache.org
>>>>
>>>>> you somehow need to flush all in-memory data *and* perform a
>>>>> major compaction
>>>>
>>>> This makes sense. Without compaction the linear HDFS scan isn't
>>>> possible. I suppose one could compact 'offline' in a different
>>>> MapReduce job. However, that would have its own issues.
>>>>
>>>>> The files do have a flag if they were made by a major
>>>>> compaction, so you scan only those and ignore the newer ones -
>>>>> but then you are trailing
>>>>
>>>> This could be OK in many cases. The key would be to create a
>>>> sync'd cutoff point enabling a frozen point-in-time 'view' of the data.
>>>> I'm not sure how that would be implemented.
>>>>
>>>> On Wed, Jun 1, 2011 at 6:54 AM, Lars George <lars.geo...@gmail.com> wrote:
>>>>> Hi Jason,
>>>>>
>>>>> This was discussed in the past, using the HFileInputFormat. The
>>>>> issue is that you somehow need to flush all in-memory data *and*
>>>>> perform a major compaction - or else you would need all the
>>>>> logic of the ColumnTracker in the HFIF. Since that needs to scan
>>>>> all storage files in parallel to achieve its job, the MR task
>>>>> would not really be able to use the same approach.
>>>>>
>>>>> Running a major compaction creates a lot of churn, so it is
>>>>> questionable what the outcome is. The files do have a flag if
>>>>> they were made by a major compaction, so you scan only those and
>>>>> ignore the newer ones - but then you are trailing, and you still
>>>>> do not handle delete markers/updates in newer files. No easy feat.
>>>>>
>>>>> Lars
>>>>>
>>>>> On Wed, Jun 1, 2011 at 2:41 AM, Jason Rutherglen
>>>>> <jason.rutherg...@gmail.com> wrote:
>>>>>>> I'd imagine that join operations do not require realtime-ness,
>>>>>>> and so faster batch jobs using Hive -> frozen HBase files in
>>>>>>> HDFS could be the optimal way to go?
>>>>>>
>>>>>> In addition to lessening the load on the perhaps live RegionServer.
>>>>>> There's no Jira for this; I'm tempted to open one.
>>>>>>
>>>>>> On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen
>>>>>> <jason.rutherg...@gmail.com> wrote:
>>>>>>>> The Hive-HBase integration allows you to create Hive tables
>>>>>>>> that are backed by HBase
>>>>>>>
>>>>>>> In addition, HBase can be made to go faster for MapReduce
>>>>>>> jobs if the HFiles could be used directly in HDFS, rather
>>>>>>> than proxying through the RegionServer.
>>>>>>>
>>>>>>> I'd imagine that join operations do not require realtime-ness,
>>>>>>> and so faster batch jobs using Hive -> frozen HBase files in
>>>>>>> HDFS could be the optimal way to go?
>>>>>>>
>>>>>>> On Tue, May 31, 2011 at 1:41 PM, Patrick Angeles
>>>>>>> <patr...@cloudera.com> wrote:
>>>>>>>> On Tue, May 31, 2011 at 3:19 PM, Eran Kutner <e...@gigya.com> wrote:
>>>>>>>>
>>>>>>>>> For my need I don't really need the general case, but even
>>>>>>>>> if I did, I think it can probably be done more simply.
>>>>>>>>> The main problem is getting the data from both tables into
>>>>>>>>> the same MR job without resorting to lookups. So without the
>>>>>>>>> theoretical MultiTableInputFormat, I could just copy all the
>>>>>>>>> data from both tables into a temp table, appending the source
>>>>>>>>> table name to the row keys to make sure there are no conflicts.
>>>>>>>>> When all the data from both tables is in the same temp table,
>>>>>>>>> run a MR job. For each row, the mapper should emit a key which
>>>>>>>>> is composed of all the values of the join fields in that row
>>>>>>>>> (the value can be emitted as is). This will cause all the rows
>>>>>>>>> from both tables with the same join field values to arrive at
>>>>>>>>> the reducer together. The reducer could then iterate over them
>>>>>>>>> and produce the Cartesian product as needed.
>>>>>>>>>
>>>>>>>>> I still don't like having to copy all the data into a temp
>>>>>>>>> table just because I can't feed two tables into the MR job.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Loading the smaller table in memory is called a map join,
>>>>>>>> versus a reduce-side join (a.k.a. common join). One reason to
>>>>>>>> prefer a map join is that you avoid the shuffle phase, which
>>>>>>>> potentially involves several trips to disk for the intermediate
>>>>>>>> records due to spills, and one pass through the network to get
>>>>>>>> each intermediate KV pair to the right reducer. With a map join,
>>>>>>>> everything is local, except for the part where you load the
>>>>>>>> small table.
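>>>>>>>>
>>>>>>>> A bare-bones sketch of the map side of that (untested; the table
>>>>>>>> name "smallTable", the family "cf" and the qualifiers "v" and
>>>>>>>> "fk" are all invented for illustration):
>>>>>>>>
>>>>>>>> import java.io.IOException;
>>>>>>>> import java.util.HashMap;
>>>>>>>> import java.util.Map;
>>>>>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>>>>> import org.apache.hadoop.hbase.client.HTable;
>>>>>>>> import org.apache.hadoop.hbase.client.Result;
>>>>>>>> import org.apache.hadoop.hbase.client.ResultScanner;
>>>>>>>> import org.apache.hadoop.hbase.client.Scan;
>>>>>>>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>>>>>>>> import org.apache.hadoop.hbase.mapreduce.TableMapper;
>>>>>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>>>>>> import org.apache.hadoop.io.Text;
>>>>>>>>
>>>>>>>> public class MapJoinMapper extends TableMapper<Text, Text> {
>>>>>>>>     private Map<String, String> small = new HashMap<String, String>();
>>>>>>>>
>>>>>>>>     // Load the small table once per mapper; after this there is
>>>>>>>>     // no shuffle, everything stays local.
>>>>>>>>     protected void setup(Context context) throws IOException {
>>>>>>>>         HTable smallTable = new HTable(HBaseConfiguration.create(), "smallTable");
>>>>>>>>         ResultScanner rs = smallTable.getScanner(new Scan());
>>>>>>>>         for (Result r : rs) {
>>>>>>>>             small.put(Bytes.toString(r.getRow()), Bytes.toString(
>>>>>>>>                     r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("v"))));
>>>>>>>>         }
>>>>>>>>         rs.close();
>>>>>>>>         smallTable.close();
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     // The big table streams through map(); probe the in-memory copy.
>>>>>>>>     protected void map(ImmutableBytesWritable row, Result value, Context context)
>>>>>>>>             throws IOException, InterruptedException {
>>>>>>>>         String fk = Bytes.toString(
>>>>>>>>                 value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("fk")));
>>>>>>>>         String match = small.get(fk);
>>>>>>>>         if (match != null) {
>>>>>>>>             context.write(new Text(Bytes.toString(value.getRow())), new Text(match));
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> You do need enough mapper heap to hold the small table, which is
>>>>>>>> the usual constraint on map joins.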
>>>>>>>>
>>>>>>>>
>>>>>>>>> As Jason Rutherglen mentioned above, Hive can do joins. I
>>>>>>>>> don't know if it can do them for HBase, and it will not suit
>>>>>>>>> my needs, but it would be interesting to know how it does
>>>>>>>>> them, if anyone knows.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The Hive-HBase integration allows you to create Hive tables
>>>>>>>> that are backed by HBase. You can do joins on those tables
>>>>>>>> (and also with standard Hive tables). It might be worth trying
>>>>>>>> out in your case, as it lets you easily see the load
>>>>>>>> characteristics and the job runtime without much coding
>>>>>>>> investment.
>>>>>>>>
>>>>>>>> There are probably some specific optimizations that can be
>>>>>>>> applied to your situation, but it's hard to say without knowing
>>>>>>>> your use case.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> - Patrick
>>>>>>>>
>>>>>>>>
>>>>>>>>> -eran
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, May 31, 2011 at 22:02, Ted Dunning <tdunn...@maprtech.com> wrote:
>>>>>>>>>
>>>>>>>>>> The Cartesian product often makes an honest-to-god join
>>>>>>>>>> not such a good idea on large data. The common alternative
>>>>>>>>>> is co-group, which is basically like doing the hard work of
>>>>>>>>>> the join but stopping just before emitting the Cartesian
>>>>>>>>>> product. This allows you to inject whatever cleverness you
>>>>>>>>>> need at this point.
>>>>>>>>>>
>>>>>>>>>> Common kinds of cleverness include down-sampling of
>>>>>>>>>> problematically large sets of candidates.
>>>>>>>>>>
>>>>>>>>>> On Tue, May 31, 2011 at 11:56 AM, Michael Segel
>>>>>>>>>> <michael_se...@hotmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> So the underlying problem that the OP was trying to
>>>>>>>>>>> solve was how to join two tables from HBase.
>>>>>>>>>>> Unfortunately I goofed.
>>>>>>>>>>> I gave a quick and dirty solution that is a bit incomplete.
>>>>>>>>>>> The row key in the temp table has to be unique, and I forgot
>>>>>>>>>>> about the Cartesian product. So my solution wouldn't work in
>>>>>>>>>>> the general case.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>>
>
>