[mailto:florinp...@yahoo.com]
Sent: Thursday, June 16, 2011 5:44 AM
To: user@hbase.apache.org
Subject: Re: How to efficiently join HBase tables?
Hello!
Regarding the same subject of joining, I have the following scenario:
1. I have a big table DOCS that contains the columns
UUID DOCID
sdsd 1
hdhs 3
gdhg 7
shdg 9
and so on (hope you got the idea)
2. an external list of docID
(LIST)
3
1
7
s, one row from A,
>>> and one row from B. Are you suggesting that we get the following map
>> calls:
>>> Key1 & key4
>>> Key2 & key5
>>> Key3 & key6
>>>
>>> Or are you suggesting we get the following:
>>> Key1 & ke
cking out
> of this thread (ka-ching).
>
>
> -Original Message-
> From: im_gu...@hotmail.com [mailto:im_gu...@hotmail.com] On Behalf Of Michel
> Segel
> Sent: Wednesday, June 08, 2011 10:14 AM
> To: user@hbase.apache.org
> Subject: Re: How to efficiently join
we get the following:
> > Key1 & key4
> > Key1 & key5
> > Key1 & key6
> > Key2 & key4
> > Key2 & key5
> > Key2 & key6
> > Key3 & key4
> > Key3 & key5
> > Key3 & key6
> >
> > Or are you suggesting some
2011 10:14 AM
To: user@hbase.apache.org
Subject: Re: How to efficiently join HBase tables?
Unless I am mistaken... get() requires a row key, right?
And you can join tables on column data which isn't in the row key, right?
So how do you do a get()? :-)
Sure there is more than one way to skin a
ginal Message-
From: ddlat...@gmail.com [mailto:ddlat...@gmail.com] On Behalf Of Dave Latham
Sent: Wednesday, June 08, 2011 2:36 PM
To: user@hbase.apache.org
Subject: Re: How to efficiently join HBase tables?
I believe this is what Eran is suggesting:
Table A
---
Row1 (has joinVal_1)
Row2
> Key3 & key6
>
> Or are you suggesting something different?
>
> Dave
>
> -----Original Message-
> From: e...@gigya-inc.com [mailto:e...@gigya-inc.com] On Behalf Of Eran
> Kutner
> Sent: Wednesday, June 08, 2011 11:47 AM
> To: user@hbase.apache.org
> Subje
otmail.com]
> Sent: Monday, June 06, 2011 10:08 PM
> To: user@hbase.apache.org
> Subject: RE: How to efficiently join HBase tables?
>
>
> Well
>
> David is correct.
>
> Eran wanted to do a join which is a relational concept that isn't natively
> supported by a No
ey6
Or are you suggesting something different?
Dave
-Original Message-
From: e...@gigya-inc.com [mailto:e...@gigya-inc.com] On Behalf Of Eran Kutner
Sent: Wednesday, June 08, 2011 11:47 AM
To: user@hbase.apache.org
Subject: Re: How to efficiently join HBase tables?
I'd like to clarify,
't do a multi-get off the bat"
>
> That's an assumption, but you're entitled to your opinion.
>
> -Original Message-
> From: Michael Segel [mailto:michael_se...@hotmail.com]
> Sent: Monday, June 06, 2011 10:08 PM
> To: user@hbase.apache.org
> Subject: RE:
S. If I get a spare moment, I may code this up...
> From: doug.m...@explorysmedical.com
> To: user@hbase.apache.org
> Date: Mon, 6 Jun 2011 17:19:44 -0400
> Subject: RE: How to efficiently join HBase tables?
>
> Re: " So, you all realize the joins have been talked about i
g.m...@explorysmedical.com
> To: user@hbase.apache.org
> Date: Mon, 6 Jun 2011 17:19:44 -0400
> Subject: RE: How to efficiently join HBase tables?
>
> Re: " So, you all realize the joins have been talked about in the database
> community for 40 years?"
>
> G
calls. So it's a
"bulk-select nested loops" of sorts (i.e., as opposed to the 1-by-1 lookup of
regular nested loops).
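
A minimal sketch of the "bulk-select nested loops" idea described above, written as plain Python rather than against the HBase client. The names `bulk_nested_loop_join`, `batches`, and `multi_get` are hypothetical; `multi_get` stands in for whatever batched lookup the client offers and is assumed to take a list of row keys and return a dict of the rows it found.

```python
def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_nested_loop_join(outer_rows, multi_get, join_field, batch_size=100):
    """Join outer rows against a keyed table using one lookup call per batch.

    `multi_get` is an assumed callable: list of keys -> {key: row}.
    """
    joined = []
    for chunk in batches(outer_rows, batch_size):
        # De-duplicate keys so each one is requested once per batch.
        keys = sorted({row[join_field] for row in chunk})
        found = multi_get(keys)  # one round trip for the whole batch
        for row in chunk:
            match = found.get(row[join_field])
            if match is not None:
                joined.append((row, match))
    return joined
```

The point of the batching is only to cut the number of round trips; the join result is the same as with 1-by-1 lookups.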
-Original Message-
From: Buttler, David [mailto:buttl...@llnl.gov]
Sent: Monday, June 06, 2011 4:30 PM
To: user@hbase.apache.org
Subject: RE: How to effi
ginal Message-
From: e...@gigya-inc.com [mailto:e...@gigya-inc.com] On Behalf Of Eran Kutner
Sent: Friday, June 03, 2011 12:24 AM
To: user@hbase.apache.org
Subject: Re: How to efficiently join HBase tables?
Mike, this is more or less what I tried to describe in my initial post, only
you explain
) as your input and then you can split
> it to get it to run in parallel.
> Or you could just write this on the client and split the list up and run
> the join in parallel threads on the client node. Or a single thread which
> would mean that it would run and output in sort order.
>
>
> Date: Wed, 1 Jun 2011 07:47:30 -0700
> Subject: Re: How to efficiently join HBase tables?
> From: jason.rutherg...@gmail.com
> To: user@hbase.apache.org
>
> > you somehow need to flush all in-memory data *and* perform a
> > major compaction
>
> This makes sen
> you somehow need to flush all in-memory data *and* perform a
> major compaction
This makes sense. Without compaction the linear HDFS scan isn't
possible. I suppose one could compact 'offline' in a different Map
Reduce job. However, that would have its own issues.
> The files do have a flag i
Hi Jason,
This was discussed in the past, using the HFileInputFormat. The issue
is that you somehow need to flush all in-memory data *and* perform a
major compaction - or else you would need all the logic of the
ColumnTracker in the HFIF. Since that needs to scan all storage files
in parallel to a
Thanks everyone for all the helpful insights!
-eran
On Wed, Jun 1, 2011 at 03:41, Jason Rutherglen
wrote:
> > I'd imagine that join operations do not require realtime-ness, and so
> > faster batch jobs using Hive -> frozen HBase files in HDFS could be
> > the optimal way to go?
>
> In addition
> I'd imagine that join operations do not require realtime-ness, and so
> faster batch jobs using Hive -> frozen HBase files in HDFS could be
> the optimal way to go?
In addition to lessening the load on the perhaps live RegionServer.
There's no Jira for this, I'm tempted to open one.
On Tue, May
We use Pig to join HBase tables using HBaseStorage which has worked well. If
you're using HBase >= 0.89 you'll need to build from the trunk or the Pig
0.8 branch.
On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:
> > The Hive-HBase integration allows you to c
> The Hive-HBase integration allows you to create Hive tables that are backed
> by HBase
In addition, HBase can be made to go faster for MapReduce jobs if the
HFiles could be used directly in HDFS, rather than proxying through
the RegionServer.
I'd imagine that join operations do not require re
On Tue, May 31, 2011 at 3:19 PM, Eran Kutner wrote:
> For my need I don't really need the general case, but even if I did I think
> it can probably be done simpler.
> The main problem is getting the data from both tables into the same MR job,
> without resorting to lookups. So without the theoret
> From: doug.m...@explorysmedical.com
> To: user@hbase.apache.org
> Date: Tue, 31 May 2011 15:39:14 -0400
> Subject: RE: How to efficiently join HBase tables?
>
> Re: " Didn't see a multi-get... "
>
> This is what I'm talking about...
> ht
Your mapper can tell which file is being read and add source tags to the
data records.
The reducer can do the cartesian product (if you really need that).
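
A rough sketch of the tagging approach described above, simulated in memory with plain Python instead of a real MapReduce job. The function names and the "A"/"B" source tags are made up for illustration, and the shuffle is just a dict grouping by join key.

```python
from collections import defaultdict
from itertools import product


def map_tagged(source, rows, join_field):
    """Emit (join_key, (source_tag, row)) for every row of one input."""
    for row in rows:
        yield row[join_field], (source, row)


def reduce_join(tagged_values):
    """Split one key's values by source tag, then pair every A row with every B row."""
    by_source = defaultdict(list)
    for source, row in tagged_values:
        by_source[source].append(row)
    # Cartesian product of the two sides; empty if either side has no rows.
    return list(product(by_source["A"], by_source["B"]))


def join(table_a, table_b, join_field):
    # Simulated shuffle: group mapper output from both inputs by join key.
    groups = defaultdict(list)
    for source, rows in (("A", table_a), ("B", table_b)):
        for key, tagged in map_tagged(source, rows, join_field):
            groups[key].append(tagged)
    return {key: reduce_join(values) for key, values in groups.items()}
```

A key that appears in only one input produces an empty list here, which corresponds to an inner join; keeping the unmatched rows instead would be the place to change `reduce_join`.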
On Tue, May 31, 2011 at 12:19 PM, Eran Kutner wrote:
> For my need I don't really need the general case, but even if I did I think
> it can
oing a limited
scan of the second table.
It's pretty generic.
HTH
-Mike
> From: e...@gigya.com
> Date: Tue, 31 May 2011 21:42:58 +0300
> Subject: Re: How to efficiently join HBase tables?
> To: user@hbase.apache.org
>
> Thanks everyone for the great feedback. I'll try to ad
?"
-Original Message-
From: Michael Segel [mailto:michael_se...@hotmail.com]
Sent: Tuesday, May 31, 2011 2:56 PM
To: user@hbase.apache.org
Subject: RE: How to efficiently join HBase tables?
Doug,
I read the OP's post as the following:
"> Hi,
> I need to join two
For my need I don't really need the general case, but even if I did I think
it can probably be done simpler.
The main problem is getting the data from both tables into the same MR job,
without resorting to lookups. So without the theoretical
MultiTableInputFormat, I could just copy all the data fro
The Cartesian product often makes an honest-to-god join not such a good idea
on large data. The common alternative is co-group
which is basically like doing the hard work of the join, but involves
stopping just before emitting the cartesian product. This allows
you to inject whatever cleverness y
edical.com
> To: user@hbase.apache.org
> Date: Tue, 31 May 2011 11:42:27 -0400
> Subject: RE: How to efficiently join HBase tables?
>
> Eran's observation was that a join is solvable in a Mapper via lookups on a
> 2nd HBase table, but it might not be that efficient if the l
Doesn't Hive for HBase enable joins?
On Tue, May 31, 2011 at 5:06 AM, Eran Kutner wrote:
> Hi,
> I need to join two HBase tables. The obvious way is to use a M/R job for
> that. The problem is that the few references to that question I found
> recommend pulling one table to the mapper and then do
...@hotmail.com]
> Sent: Tuesday, May 31, 2011 10:56 AM
> To: user@hbase.apache.org
> Subject: RE: How to efficiently join HBase tables?
>
>
> Maybe I'm missing something... but this isn't a hard problem to solve.
>
> Eran wants to join two tables.
> If we look at an SQL Sta
Mapper and then the batch size is filled, then you do
the lookups (and then any required emitting, etc.).
-Original Message-
From: Michael Segel [mailto:michael_se...@hotmail.com]
Sent: Tuesday, May 31, 2011 10:56 AM
To: user@hbase.apache.org
Subject: RE: How to efficiently join HB
temp table, you
will get the end result automatically.
Then you can output your hbase temp table and then truncate the table.
So what am I missing?
-Mike
> From: doug.m...@explorysmedical.com
> To: user@hbase.apache.org
> Date: Tue, 31 May 2011 10:22:34 -0400
> Subject: RE: H
31, 2011 8:06 AM
To: user@hbase.apache.org
Subject: How to efficiently join HBase tables?
Hi,
I need to join two HBase tables. The obvious way is to use a M/R job for that.
The problem is that the few references to that question I found recommend
pulling one table to the mapper and then do a looku
to create a
column whose name is based on ${tablename}+'|'+${column name} so it would be
TableA|Tim and TableB|Tim.
HTH
-Mike
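
A small sketch of the column-naming scheme mentioned above, using a plain dict in place of an HBase temp table. It assumes the temp table is keyed by the join value, which the truncated message does not confirm; `write_joined` and the sample field names are hypothetical.

```python
def write_joined(temp_table, table_name, rows, join_field):
    """Upsert each row under its join value, prefixing columns with the table name.

    `temp_table` is a dict standing in for the temp table: {join_value: {column: value}}.
    """
    for row in rows:
        target = temp_table.setdefault(row[join_field], {})
        for column, value in row.items():
            # Qualified name, e.g. "TableA|Tim", so same-named columns do not collide.
            target[f"{table_name}|{column}"] = value


temp = {}
write_joined(temp, "TableA", [{"key": "x", "Tim": 1}], "key")
write_joined(temp, "TableB", [{"key": "x", "Tim": 2}], "key")
```

Note that two rows from the same table with the same join value would overwrite each other in this sketch, so it only fits the case where the join value is unique per table.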
> From: e...@gigya.com
> Date: Tue, 31 May 2011 15:43:43 +0300
> Subject: Re: How to efficiently join HBase tables?
> To: ferdy.gal...@kaloog
MultipleInputs would be ideal, but that seems pretty complicated.
MultiTableInputFormat seems like a simple change in the getSplits() method
of TableInputFormat, plus support for a collection of tables and their
matching scanners instead of a single table and scanner. That doesn't sound
too complicated.
Any o
As far as I can tell there is not yet a built-in mechanism you can use
for this. You could implement your own InputFormat, something like
MultiTableInputFormat. If you need different map functions for the two
tables, perhaps something similar to Hadoop's MultipleInputs should do
the trick.
On
Hi,
I need to join two HBase tables. The obvious way is to use a M/R job for
that. The problem is that the few references to that question I found
recommend pulling one table to the mapper and then do a lookup for the
referred row in the second table.
This sounds like a very inefficient way to do