RE: How to efficiently join HBase tables?

2011-06-16 Thread Buttler, David
[mailto:florinp...@yahoo.com] Sent: Thursday, June 16, 2011 5:44 AM To: user@hbase.apache.org Subject: Re: How to efficiently join HBase tables? Hello! Regarding the same subject of joining, I have the following scenario: 1. I have a big table DOCS that contains the columns UUID DOCID sdsd

Re: How to efficiently join HBase tables?

2011-06-16 Thread Florin P
Hello! Regarding the same subject of joining, I have the following scenario:
1. I have a big table DOCS that contains the columns
   UUID  DOCID
   sdsd  1
   hdhs  3
   gdhg  7
   shdg  9
   and so on (hope you got the idea)
2. an external list of docID (LIST)
   3
   1
   7
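The message is cut off before the actual question, so the following is an assumption: if the goal is to pick out the DOCS rows whose DOCID appears in LIST, the small list can be loaded into an in-memory set and checked against each scanned row (a map-side semi-join). This sketch models DOCS as a plain list of (UUID, DOCID) tuples; no HBase API is involved.

```python
# Hypothetical in-memory stand-ins for the DOCS table and the external LIST.
DOCS = [("sdsd", 1), ("hdhs", 3), ("gdhg", 7), ("shdg", 9)]
LIST = [3, 1, 7]

def semi_join(docs, wanted_ids):
    """Return the (UUID, DOCID) rows whose DOCID is in wanted_ids.

    The small side is loaded into a set once, so each scanned row
    costs one hash lookup instead of one query per row.
    """
    wanted = set(wanted_ids)
    return [(uuid, doc_id) for uuid, doc_id in docs if doc_id in wanted]

print(semi_join(DOCS, LIST))  # [('sdsd', 1), ('hdhs', 3), ('gdhg', 7)]
```

If DOCID were the row key instead of a column, a batch of gets on the listed IDs would avoid scanning DOCS at all.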

Re: How to efficiently join HBase tables?

2011-06-09 Thread Michel Segel
s, one row from A,
>>> and one row from B. Are you suggesting that we get the following map
>> calls:
>>> Key1 & key4
>>> Key2 & key5
>>> Key3 & key6
>>>
>>> Or are you suggesting we get the following:
>>> Key1 & ke

Re: How to efficiently join HBase tables?

2011-06-09 Thread Michel Segel
cking out > of this thread (ka-ching). > > > -----Original Message----- > From: im_gu...@hotmail.com [mailto:im_gu...@hotmail.com] On Behalf Of Michel > Segel > Sent: Wednesday, June 08, 2011 10:14 AM > To: user@hbase.apache.org > Subject: Re: How to efficiently join

Re: How to efficiently join HBase tables?

2011-06-09 Thread Eran Kutner
we get the following:
> > Key1 & key4
> > Key1 & key5
> > Key1 & key6
> > Key2 & key4
> > Key2 & key5
> > Key2 & key6
> > Key3 & key4
> > Key3 & key5
> > Key3 & key6
> >
> > Or are you suggesting some
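The second listing is the Cartesian product of the rows on each side that share a join value. A minimal sketch, assuming (as in the thread's example) that Key1..Key3 in table A and key4..key6 in table B all carry the same join value; the tuple layout is hypothetical:

```python
from collections import defaultdict
from itertools import product

def group_by_value(rows):
    """Map join value -> list of row keys, from (row_key, join_value) tuples."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[value].append(key)
    return groups

def join_pairs(table_a, table_b):
    """Emit every (A row, B row) pair that shares a join value."""
    a_groups = group_by_value(table_a)
    b_groups = group_by_value(table_b)
    pairs = []
    for value in sorted(a_groups.keys() & b_groups.keys()):
        pairs.extend(product(a_groups[value], b_groups[value]))
    return pairs

table_a = [("Key1", "joinVal_1"), ("Key2", "joinVal_1"), ("Key3", "joinVal_1")]
table_b = [("key4", "joinVal_1"), ("key5", "joinVal_1"), ("key6", "joinVal_1")]
pairs = join_pairs(table_a, table_b)
print(len(pairs))  # 9
```

Pairing rows one-to-one (Key1 & key4, Key2 & key5, Key3 & key6) would return only three of these nine matches.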

RE: How to efficiently join HBase tables?

2011-06-08 Thread Doug Meil
2011 10:14 AM To: user@hbase.apache.org Subject: Re: How to efficiently join HBase tables? Unless I am mistaken... get() requires a row key, right? And you can join tables on column data which isn't in the row key, right? So how do you do a get()? :-) Sure there is more than one way to skin a
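The point about get() can be illustrated with a toy model in which a table is a dict from row key to columns (an assumption, not the HBase client API). A join value that is the row key takes one direct lookup; a join value stored in a column forces a scan over every row, unless something else maps that value back to row keys.

```python
# Toy model: row key -> {column: value}. Not the HBase client API.
TABLE_B = {
    "user1": {"info:email": "a@example.com"},
    "user2": {"info:email": "b@example.com"},
    "user3": {"info:email": "a@example.com"},
}

def get_by_row_key(table, row_key):
    """Like a get(): one direct lookup, possible only when the row key is known."""
    return table.get(row_key)

def find_by_column(table, column, value):
    """Joining on column data: every row has to be examined (a full scan)."""
    return sorted(k for k, cols in table.items() if cols.get(column) == value)

print(get_by_row_key(TABLE_B, "user2"))                        # {'info:email': 'b@example.com'}
print(find_by_column(TABLE_B, "info:email", "a@example.com"))  # ['user1', 'user3']
```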

RE: How to efficiently join HBase tables?

2011-06-08 Thread Buttler, David
-----Original Message----- From: ddlat...@gmail.com [mailto:ddlat...@gmail.com] On Behalf Of Dave Latham Sent: Wednesday, June 08, 2011 2:36 PM To: user@hbase.apache.org Subject: Re: How to efficiently join HBase tables? I believe this is what Eran is suggesting: Table A --- Row1 (has joinVal_1) Row2

Re: How to efficiently join HBase tables?

2011-06-08 Thread Dave Latham
> Key3 & key6 > > Or are you suggesting something different? > > Dave > > -----Original Message----- > From: e...@gigya-inc.com [mailto:e...@gigya-inc.com] On Behalf Of Eran > Kutner > Sent: Wednesday, June 08, 2011 11:47 AM > To: user@hbase.apache.org > Subje

Re: How to efficiently join HBase tables?

2011-06-08 Thread Michel Segel
otmail.com] > Sent: Monday, June 06, 2011 10:08 PM > To: user@hbase.apache.org > Subject: RE: How to efficiently join HBase tables? > > > Well, > > David is correct. > > Eran wanted to do a join which is a relational concept that isn't natively > supported by a No

RE: How to efficiently join HBase tables?

2011-06-08 Thread Buttler, David
ey6 Or are you suggesting something different? Dave -----Original Message----- From: e...@gigya-inc.com [mailto:e...@gigya-inc.com] On Behalf Of Eran Kutner Sent: Wednesday, June 08, 2011 11:47 AM To: user@hbase.apache.org Subject: Re: How to efficiently join HBase tables? I'd like to clarify,

Re: How to efficiently join HBase tables?

2011-06-08 Thread Eran Kutner
't do a multi-get off the bat" > > That's an assumption, but you're entitled to your opinion. > > -----Original Message----- > From: Michael Segel [mailto:michael_se...@hotmail.com] > Sent: Monday, June 06, 2011 10:08 PM > To: user@hbase.apache.org > Subject: RE:

RE: How to efficiently join HBase tables?

2011-06-08 Thread Doug Meil
S. If I get a spare moment, I may code this up... > From: doug.m...@explorysmedical.com > To: user@hbase.apache.org > Date: Mon, 6 Jun 2011 17:19:44 -0400 > Subject: RE: How to efficiently join HBase tables? > > Re: " So, you all realize the joins have been talked about i

RE: How to efficiently join HBase tables?

2011-06-06 Thread Michael Segel
g.m...@explorysmedical.com > To: user@hbase.apache.org > Date: Mon, 6 Jun 2011 17:19:44 -0400 > Subject: RE: How to efficiently join HBase tables? > > Re: " So, you all realize the joins have been talked about in the database > community for 40 years?" > > G

RE: How to efficiently join HBase tables?

2011-06-06 Thread Doug Meil
calls. So it's a "bulk-select nested loops" of sorts (i.e., as opposed to the 1-by-1 lookup of regular nested loops). -----Original Message----- From: Buttler, David [mailto:buttl...@llnl.gov] Sent: Monday, June 06, 2011 4:30 PM To: user@hbase.apache.org Subject: RE: How to effi
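Much of the difference between 1-by-1 nested loops and the "bulk-select" variant is in the number of client calls. A back-of-the-envelope sketch, assuming one call per get() or per batched lookup; this is idealized, since a real batch may fan out to several RegionServers.

```python
import math

def round_trips(num_lookups, batch_size=1):
    """Idealized count of client calls: one per batch of lookups."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return math.ceil(num_lookups / batch_size)

print(round_trips(1_000_000))        # 1000000 (one get() per row)
print(round_trips(1_000_000, 1000))  # 1000 (batches of 1,000 keys)
```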

RE: How to efficiently join HBase tables?

2011-06-06 Thread Buttler, David
-----Original Message----- From: e...@gigya-inc.com [mailto:e...@gigya-inc.com] On Behalf Of Eran Kutner Sent: Friday, June 03, 2011 12:24 AM To: user@hbase.apache.org Subject: Re: How to efficiently join HBase tables? Mike, this is more or less what I tried to describe in my initial post, only you explain

Re: How to efficiently join HBase tables?

2011-06-03 Thread Eran Kutner
) as your input and then you can split > it to get it to run in parallel. > Or you could just write this on the client and split the list up and run > the join in parallel threads on the client node. Or a single thread which > would mean that it would run and output in sort order. > >
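Splitting the key list and running the join in parallel client threads could look like the sketch below. `lookup_row` is a hypothetical stand-in for a get() against the second table. One design choice worth noting: `Executor.map` returns results in submission order, so this version keeps the input order the way a single thread would, at the cost of waiting for the slowest chunk before later chunks are returned.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical second table, standing in for HBase gets.
TABLE_B = {"k1": "b1", "k2": "b2", "k4": "b4"}

def lookup_row(key):
    """Stand-in for a get() on table B; None when the row is missing."""
    return TABLE_B.get(key)

def join_chunk(keys):
    """Inner-join one slice of the key list against table B."""
    results = []
    for key in keys:
        value = lookup_row(key)
        if value is not None:
            results.append((key, value))
    return results

def parallel_join(keys, num_threads=2):
    """Split keys into contiguous chunks and join them on worker threads."""
    chunk_size = max(1, -(-len(keys) // num_threads))  # ceiling division
    chunks = [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() yields chunk results in submission order, so input order is kept.
        return [row for part in pool.map(join_chunk, chunks) for row in part]

print(parallel_join(["k1", "k2", "k3", "k4"]))  # [('k1', 'b1'), ('k2', 'b2'), ('k4', 'b4')]
```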

RE: How to efficiently join HBase tables?

2011-06-02 Thread Michael Segel
> Date: Wed, 1 Jun 2011 07:47:30 -0700 > Subject: Re: How to efficiently join HBase tables? > From: jason.rutherg...@gmail.com > To: user@hbase.apache.org > > > you somehow need to flush all in-memory data *and* perform a > > major compaction > > This makes sen

Re: How to efficiently join HBase tables?

2011-06-01 Thread Jason Rutherglen
> you somehow need to flush all in-memory data *and* perform a > major compaction This makes sense. Without compaction the linear HDFS scan isn't possible. I suppose one could compact 'offline' in a different Map Reduce job. However that would have its own issues. > The files do have a flag i

Re: How to efficiently join HBase tables?

2011-06-01 Thread Lars George
Hi Jason, This was discussed in the past, using the HFileInputFormat. The issue is that you somehow need to flush all in-memory data *and* perform a major compaction - or else you would need all the logic of the ColumnTracker in the HFIF. Since that needs to scan all storage files in parallel to a
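A simplified model shows why reading HFiles directly needs either a major compaction or merge logic. Each store file is treated as a sorted run of (row, timestamp, value) cells, and a reader must merge all runs and keep only the newest version per row. This sketch ignores deletes, column families and version limits, all of which HBase's real scanners must also handle; it is not HBase code.

```python
import heapq

def merged_latest(*store_files):
    """Merge sorted store files and keep the newest cell per row.

    Each store file is a list of (row, timestamp, value) sorted by row
    ascending and, within a row, by timestamp descending (newest first).
    """
    # Merge key: row ascending, then timestamp descending.
    merged = heapq.merge(*store_files, key=lambda cell: (cell[0], -cell[1]))
    latest = {}
    for row, _ts, value in merged:
        if row not in latest:  # the first cell seen for a row is the newest
            latest[row] = value
    return latest

older = [("r1", 1, "a"), ("r2", 1, "b")]
newer = [("r1", 5, "A"), ("r3", 5, "c")]
print(merged_latest(older, newer))  # {'r1': 'A', 'r2': 'b', 'r3': 'c'}
```

After a major compaction each store holds a single file (until new data is flushed), so this merge step goes away, which is why flushing and compacting first is mentioned.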

Re: How to efficiently join HBase tables?

2011-06-01 Thread Eran Kutner
Thanks everyone for all the helpful insights! -eran On Wed, Jun 1, 2011 at 03:41, Jason Rutherglen wrote: > > I'd imagine that join operations do not require realtime-ness, and so > > faster batch jobs using Hive -> frozen HBase files in HDFS could be > > the optimal way to go? > > In addition

Re: How to efficiently join HBase tables?

2011-05-31 Thread Jason Rutherglen
> I'd imagine that join operations do not require realtime-ness, and so > faster batch jobs using Hive -> frozen HBase files in HDFS could be > the optimal way to go? In addition to lessening the load on the perhaps live RegionServer. There's no Jira for this, I'm tempted to open one. On Tue, May

Re: How to efficiently join HBase tables?

2011-05-31 Thread Bill Graham
We use Pig to join HBase tables using HBaseStorage which has worked well. If you're using HBase >= 0.89 you'll need to build from the trunk or the Pig 0.8 branch. On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > > The Hive-HBase integration allows you to c

Re: How to efficiently join HBase tables?

2011-05-31 Thread Jason Rutherglen
> The Hive-HBase integration allows you to create Hive tables that are backed > by HBase In addition, HBase can be made to go faster for MapReduce jobs, if the HFiles could be used directly in HDFS, rather than proxying through the RegionServer. I'd imagine that join operations do not require re

Re: How to efficiently join HBase tables?

2011-05-31 Thread Patrick Angeles
On Tue, May 31, 2011 at 3:19 PM, Eran Kutner wrote: > For my need I don't really need the general case, but even if I did I think > it can probably be done simpler. > The main problem is getting the data from both tables into the same MR job, > without resorting to lookups. So without the theoret

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
> From: doug.m...@explorysmedical.com > To: user@hbase.apache.org > Date: Tue, 31 May 2011 15:39:14 -0400 > Subject: RE: How to efficiently join HBase tables? > > Re: " Didn't see a multi-get... " > > This is what I'm talking about... > ht

Re: How to efficiently join HBase tables?

2011-05-31 Thread Ted Dunning
Your mapper can tell which file is being read and add source tags to the data records. The reducer can do the cartesian product (if you really need that). On Tue, May 31, 2011 at 12:19 PM, Eran Kutner wrote: > For my need I don't really need the general case, but even if I did I think > it can
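This is the classic reduce-side join. The sketch simulates the map, shuffle and reduce phases in memory; the record layout (join key, payload) and the source tags "A" and "B" are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product

def map_phase(source_tag, records):
    """Mapper: emit (join_key, (tag, payload)) so the reducer knows each record's origin."""
    for join_key, payload in records:
        yield join_key, (source_tag, payload)

def shuffle(mapped):
    """Group mapper output by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, tagged in mapped:
        groups[key].append(tagged)
    return groups

def reduce_phase(key, tagged_values):
    """Reducer: split values by source tag, then emit the Cartesian product."""
    left = [p for tag, p in tagged_values if tag == "A"]
    right = [p for tag, p in tagged_values if tag == "B"]
    for a, b in product(left, right):
        yield key, a, b

table_a = [("tim", "a1"), ("ann", "a2")]
table_b = [("tim", "b1"), ("tim", "b2"), ("bob", "b3")]
mapped = list(map_phase("A", table_a)) + list(map_phase("B", table_b))
joined = [row for key, vals in sorted(shuffle(mapped).items())
          for row in reduce_phase(key, vals)]
print(joined)  # [('tim', 'a1', 'b1'), ('tim', 'a1', 'b2')]
```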

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
oing a limited scan of the second table. It's pretty generic. HTH -Mike > From: e...@gigya.com > Date: Tue, 31 May 2011 21:42:58 +0300 > Subject: Re: How to efficiently join HBase tables? > To: user@hbase.apache.org > > Thanks everyone for the great feedback. I'll try to ad
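One reading of a "limited scan" (an interpretation, since the snippet starts mid-sentence) is a scan bounded by start and stop rows instead of a full-table scan. Modelling the second table's sorted row keys as a Python list (an assumption; in HBase the bounds would be set on the scan), a prefix-bounded scan is two binary searches. Python compares ASCII strings the same way HBase compares row keys as bytes.

```python
from bisect import bisect_left

def prefix_stop_row(prefix):
    """Exclusive stop row covering every key that starts with prefix.

    Assumes a non-empty prefix whose last character is not the maximum code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

def limited_scan(sorted_rows, start_row, stop_row):
    """Return the rows in [start_row, stop_row), like a bounded scan."""
    lo = bisect_left(sorted_rows, start_row)
    hi = bisect_left(sorted_rows, stop_row)
    return sorted_rows[lo:hi]

rows_b = sorted(["tim|001", "tim|002", "tina|001", "tom|001"])
print(limited_scan(rows_b, "tim|", prefix_stop_row("tim|")))  # ['tim|001', 'tim|002']
```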

RE: How to efficiently join HBase tables?

2011-05-31 Thread Doug Meil
?" -----Original Message----- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Tuesday, May 31, 2011 2:56 PM To: user@hbase.apache.org Subject: RE: How to efficiently join HBase tables? Doug, I read the OP's post as the following: "> Hi, > I need to join two

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
For my need I don't really need the general case, but even if I did I think it can probably be done simpler. The main problem is getting the data from both tables into the same MR job, without resorting to lookups. So without the theoretical MultiTableInputFormat, I could just copy all the data fro

Re: How to efficiently join HBase tables?

2011-05-31 Thread Ted Dunning
The Cartesian product often makes an honest-to-god join not such a good idea on large data. The common alternative is co-group which is basically like doing the hard work of the join, but involves stopping just before emitting the cartesian product. This allows you to inject whatever cleverness y
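A co-group stops after it collects, for each key, the values from both sides, and leaves the decision about what to emit to the caller. In this sketch in-memory lists stand in for the two inputs; the join-size function is one example (chosen for illustration) of work that can be done without materializing the product.

```python
from collections import defaultdict

def cogroup(left, right):
    """Return {key: (left_values, right_values)} without pairing them up."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return dict(groups)

def join_size(grouped):
    """Number of rows a full inner join would emit, computed without building them."""
    return sum(len(l) * len(r) for l, r in grouped.values())

left = [("tim", 1), ("tim", 2), ("ann", 3)]
right = [("tim", "x"), ("tim", "y"), ("tim", "z")]
grouped = cogroup(left, right)
print(grouped["tim"])      # ([1, 2], ['x', 'y', 'z'])
print(join_size(grouped))  # 6
```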

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
edical.com > To: user@hbase.apache.org > Date: Tue, 31 May 2011 11:42:27 -0400 > Subject: RE: How to efficiently join HBase tables? > > Eran's observation was that a join is solvable in a Mapper via lookups on a > 2nd HBase table, but it might not be that efficient if the l

Re: How to efficiently join HBase tables?

2011-05-31 Thread Jason Rutherglen
Doesn't Hive for HBase enable joins? On Tue, May 31, 2011 at 5:06 AM, Eran Kutner wrote: > Hi, > I need to join two HBase tables. The obvious way is to use a M/R job for > that. The problem is that the few references to that question I found > recommend pulling one table to the mapper and then do

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
...@hotmail.com] > Sent: Tuesday, May 31, 2011 10:56 AM > To: user@hbase.apache.org > Subject: RE: How to efficiently join HBase tables? > > > Maybe I'm missing something... but this isn't a hard problem to solve. > > Eran wants to join two tables. > If we look at an SQL Sta

RE: How to efficiently join HBase tables?

2011-05-31 Thread Doug Meil
Mapper and then the batch size is filled, then you do the lookups (and then any required emitting, etc.). -----Original Message----- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Tuesday, May 31, 2011 10:56 AM To: user@hbase.apache.org Subject: RE: How to efficiently join HB
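The batching pattern inside a mapper works like this: buffer incoming rows, and once the buffer reaches the batch size, resolve all of their foreign keys with one batched lookup and then emit. The class, the `multi_get` callable and the batch size are hypothetical; in a real job the final partial batch would be flushed from the mapper's cleanup step, modelled here by `cleanup()`.

```python
class BatchingJoinMapper:
    """Buffers rows and looks up their join keys in batches (hypothetical sketch)."""

    def __init__(self, multi_get, batch_size, emit):
        self.multi_get = multi_get    # callable: list of keys -> {key: row}
        self.batch_size = batch_size
        self.emit = emit              # callable receiving joined tuples
        self.buffer = []
        self.calls = 0                # number of batched lookups issued

    def map(self, row_key, foreign_key):
        self.buffer.append((row_key, foreign_key))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        found = self.multi_get([fk for _, fk in self.buffer])
        self.calls += 1
        for row_key, fk in self.buffer:
            if fk in found:           # inner join: drop rows with no match
                self.emit((row_key, fk, found[fk]))
        self.buffer = []

    def cleanup(self):
        self.flush()                  # don't lose the last partial batch


table_b = {"f1": "B-one", "f2": "B-two"}
out = []
mapper = BatchingJoinMapper(
    multi_get=lambda keys: {k: table_b[k] for k in keys if k in table_b},
    batch_size=2,
    emit=out.append,
)
for rk, fk in [("a1", "f1"), ("a2", "f9"), ("a3", "f2")]:
    mapper.map(rk, fk)
mapper.cleanup()
print(out)           # [('a1', 'f1', 'B-one'), ('a3', 'f2', 'B-two')]
print(mapper.calls)  # 2
```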

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
temp table, you will get the end result automatically. Then you can output your hbase temp table and then truncate the table. So what am I missing? -Mike > From: doug.m...@explorysmedical.com > To: user@hbase.apache.org > Date: Tue, 31 May 2011 10:22:34 -0400 > Subject: RE: H

RE: How to efficiently join HBase tables?

2011-05-31 Thread Doug Meil
31, 2011 8:06 AM To: user@hbase.apache.org Subject: How to efficiently join HBase tables? Hi, I need to join two HBase tables. The obvious way is to use a M/R job for that. The problem is that the few references to that question I found recommend pulling one table to the mapper and then do a looku

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
to create a column whose name is based on ${tablename}+'|'+${column name} so it would be TableA|Tim and TableB|Tim. HTH -Mike > From: e...@gigya.com > Date: Tue, 31 May 2011 15:43:43 +0300 > Subject: Re: How to efficiently join HBase tables? > To: ferdy.gal...@kaloog
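The temp-table idea can be modelled with a dict standing in for the HBase temp table (an assumption). Both source tables write into the row keyed by the join value, with each column name prefixed by the source table (TableA|Tim, TableB|Tim), so the two sides never overwrite each other. The rows that end up with columns from both tables are the join result.

```python
from collections import defaultdict

def load_into_temp(temp, table_name, rows):
    """Write (join_value, column, cell) rows into the temp table.

    Column names become "<table>|<column>", so the two sources never collide.
    """
    for join_value, column, cell in rows:
        temp[join_value][f"{table_name}|{column}"] = cell

def joined_rows(temp, left="TableA", right="TableB"):
    """Keep only the temp-table rows that received columns from both tables."""
    result = {}
    for row_key, columns in sorted(temp.items()):
        sources = {name.split("|", 1)[0] for name in columns}
        if {left, right} <= sources:
            result[row_key] = columns
    return result

temp = defaultdict(dict)
load_into_temp(temp, "TableA", [("42", "Tim", "a-val"), ("7", "Tim", "only-a")])
load_into_temp(temp, "TableB", [("42", "Tim", "b-val")])
print(joined_rows(temp))  # {'42': {'TableA|Tim': 'a-val', 'TableB|Tim': 'b-val'}}
```

This models one row per join value on each side. If either side has duplicates, they overwrite each other unless the column name also carries the source row key.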

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
MultipleInputs would be ideal, but that seems pretty complicated. MultiTableInputFormat seems like a simple change in the getSplits() method of TableInputFormat + support for a collection of tables and their matching scanners instead of a single table and scanner, doesn't sound too complicated. Any o
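The proposed getSplits() change amounts to building the usual one-split-per-region list for each table, concatenating the lists, and having every split remember which table it came from. The region boundaries below are hypothetical, per-table scan ranges are left out for brevity, and this is not TableInputFormat's actual code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Split:
    table: str
    start_row: str   # inclusive; "" means start of table
    end_row: str     # exclusive; "" means end of table

def get_splits(region_boundaries):
    """One split per region, across several tables.

    region_boundaries: {table_name: [start key of each region]}, where the
    first region starts at "" (a hypothetical stand-in for region metadata).
    """
    splits = []
    for table, starts in region_boundaries.items():
        ends = starts[1:] + [""]   # each region ends where the next one begins
        for start, end in zip(starts, ends):
            splits.append(Split(table, start, end))
    return splits

splits = get_splits({"TableA": ["", "m"], "TableB": [""]})
print(len(splits))  # 3
```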

Re: How to efficiently join HBase tables?

2011-05-31 Thread Ferdy Galema
As far as I can tell there is not yet a built-in mechanism you can use for this. You could implement your own InputFormat, something like MultiTableInputFormat. If you need different map functions for the two tables, perhaps something similar to Hadoop's MultipleInputs should do the trick. On
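"Different map functions for the two tables" can be modelled as a dispatch from table name to mapper, so each input is processed by the function registered for its source. The table names, row layouts and mapper functions are hypothetical; Hadoop's real MultipleInputs associates input paths with Mapper classes, not Python callables. Both mappers emit the same key space (user id), so a reducer could join them.

```python
def map_users(row_key, columns):
    """Mapper for a hypothetical users table: key by user id."""
    yield row_key, ("user", columns["name"])

def map_orders(row_key, columns):
    """Mapper for a hypothetical orders table: key by the referenced user id."""
    yield columns["user_id"], ("order", row_key)

MAPPERS = {"users": map_users, "orders": map_orders}

def run_map(table_name, rows):
    """Apply the mapper registered for table_name to every (row_key, columns) pair."""
    mapper = MAPPERS[table_name]
    for row_key, columns in rows:
        yield from mapper(row_key, columns)

out = list(run_map("users", [("u1", {"name": "Tim"})]))
out += list(run_map("orders", [("o9", {"user_id": "u1"})]))
print(out)  # [('u1', ('user', 'Tim')), ('u1', ('order', 'o9'))]
```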

How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
Hi, I need to join two HBase tables. The obvious way is to use a M/R job for that. The problem is that the few references to that question I found recommend pulling one table to the mapper and then do a lookup for the referred row in the second table. This sounds like a very inefficient way to do
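The approach described as inefficient (scan one table in the mapper, then issue one lookup per row into the other) looks like this in miniature. The tables are plain dicts here (an assumption), so the cost shows up only as the lookup counter; against HBase, each lookup would be a separate random read, possibly on a different RegionServer.

```python
def lookup_join(table_a, table_b):
    """Nested-loop join: one point lookup into table_b per row of table_a.

    table_a: {row_key: foreign_key}; table_b: {row_key: value}.
    Returns the joined rows and the number of lookups issued.
    """
    joined, lookups = [], 0
    for a_key, fk in table_a.items():
        lookups += 1                  # each get() would be its own round trip
        b_value = table_b.get(fk)
        if b_value is not None:
            joined.append((a_key, fk, b_value))
    return joined, lookups

rows, n = lookup_join({"a1": "f1", "a2": "f2", "a3": "f9"}, {"f1": "x", "f2": "y"})
print(rows)  # [('a1', 'f1', 'x'), ('a2', 'f2', 'y')]
print(n)     # 3
```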