Re: non-equality joins

Keith Wiley Tue, 13 Mar 2012 10:29:00 -0700

Sounds like Matt has the right combination of expertise in both 
databases and MapReduce to help you.  I'm bowing out, as I honestly don't know 
advanced database concepts at all.  In addition, Hive offers Hive-specific 
tools, such as the map-side joins Matt suggested, which I'm too new to 
speculate on.  In fact, I only started using Hive this week.


The short answer on MapReduce algorithms is that the individual computational 
units can't communicate with each other: no mapper (in fact, no individual 
map() call) can communicate with any other, and likewise for reducers.  Data 
moves between the two phases only through the framework's shuffle, which groups 
the mappers' output by key before handing it to the reducers.  That's one of the 
major distinctions between MapReduce and more general parallel processing 
frameworks like MPI.  This is the wrong mailing list to go much deeper than 
that, however.
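To make that constraint concrete, here is a minimal in-memory Python sketch of the MapReduce data flow. It is not Hadoop's API; run_mapreduce, word_count_mapper and word_count_reducer are hypothetical names chosen for illustration. Each map call sees only its own input record, each reduce call sees only one key and its grouped values, and the only exchange of data happens in the shuffle step the framework performs between them. Keys are assumed to be sortable, since the sketch sorts them before reducing, much as Hadoop sorts keys on the way into a reducer.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical type aliases for this sketch (not part of any Hadoop API).
KeyValue = Tuple[Any, Any]
Mapper = Callable[[Any], Iterable[KeyValue]]
Reducer = Callable[[Any, List[Any]], Iterable[Any]]


def run_mapreduce(records: Iterable[Any], mapper: Mapper, reducer: Reducer) -> List[Any]:
    """Simulate one MapReduce job in memory, sequentially."""
    # Map phase: each mapper call receives exactly one record and returns
    # key/value pairs; it has no access to other records or other calls.
    intermediate: List[KeyValue] = []
    for record in records:
        intermediate.extend(mapper(record))

    # Shuffle: the framework, not the mappers, groups values by key.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: each reducer call sees one key and its values only.
    output: List[Any] = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def word_count_mapper(line: str) -> Iterator[KeyValue]:
    # Emit (word, 1) for every word in this single line.
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word: str, counts: List[int]) -> Iterator[Tuple[str, int]]:
    # Sum the partial counts that the shuffle collected for this word.
    yield word, sum(counts)


if __name__ == "__main__":
    lines = ["the quick fox", "The lazy dog", "the end"]
    print(run_mapreduce(lines, word_count_mapper, word_count_reducer))
```

For the three sample lines this prints [('dog', 1), ('end', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 3)]. Neither the mapper nor the reducer can see anything beyond its own arguments; any algorithm that needs more has to be restructured around the key-based grouping.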

Thanks Matt.

Best of luck Mahsa.

On Mar 13, 2012, at 10:13, Tucker, Matt wrote:

> For theta joins, you’ll have to convert the query to an equi-join and then 
> filter for non-equality in the WHERE clause.  Depending on the size of each 
> table, you might consider map-side joins, which let you apply non-equality 
> filters during the join, before the data is passed to the reducers.
>  
> Matt Tucker
>  
> From: mahsa mofidpoor [mailto:mofidp...@gmail.com] 
> Sent: Tuesday, March 13, 2012 1:02 PM
> To: user@hive.apache.org
> Subject: Re: non-equality joins
>  
>  
> Hi Keith,
>  
> Do you know exactly how an algorithm should be structured in order to fit the 
> MapReduce framework? Could you point me to some references?
>  
> Thanks and Regards,
> Mahsa
>  

________________________________________________________________________________
Keith Wiley     kwi...@keithwiley.com     keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
                                           --  Yoda
________________________________________________________________________________
