Re: MapSide Join and left outer or right outer joins?

2008-07-03 Thread Jason Venner
Ahh yes, it is absolutely critical that the partitioning in all of the sets be the same :) We are currently assuming that: 1) Default Partitioner 2) Same Reduce Count 3) Text keys will guarantee that, but as I mentioned above, we are assuming ;) Testing is just starting Chris Douglas wrote: Sor
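
To illustrate why the reduce count is on that list, here is a minimal standalone sketch. It assumes the formula used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, and uses `String.hashCode()` as a stand-in for `Text.hashCode()`, which hashes the underlying bytes differently. The class name `PartitionCheck` is hypothetical.

```java
public class PartitionCheck {
    // Assumed to mirror the default HashPartitioner: mask off the sign bit,
    // then take the remainder by the number of reduce tasks.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String key = "k"; // "k".hashCode() is 107
        // Same key, same reduce count: always the same partition.
        System.out.println(partition(key, 8));  // 107 % 8  = 3
        // Same key, different reduce count: a different partition.
        System.out.println(partition(key, 12)); // 107 % 12 = 11
    }
}
```

If two datasets were written with different reduce counts, the same key can therefore land in differently numbered output files, which is exactly the situation the three assumptions are meant to rule out.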

Re: MapSide Join and left outer or right outer joins?

2008-07-03 Thread Chris Douglas
Sorry, I meant splits (partitions of input data). If you have n datasets and m splits per dataset, m_i must contain the same keys for all n. So if you're joining two datasets A and B sharing a key k, if split i from A contains any instances of k, (a) split i from A must contain all instance
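
One way to read this requirement is that every key has a single "home" split index shared by all datasets. The sketch below checks that property on in-memory lists of keys. It is only an illustration under that reading, not part of the Hadoop join package; the class `SplitAlignment` and its method are hypothetical, and it does not check the sort order within a split.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitAlignment {
    // datasets.get(d).get(i) holds the keys found in split i of dataset d.
    // Returns true only if each key appears under one split index everywhere.
    static boolean aligned(List<List<List<String>>> datasets) {
        Map<String, Integer> home = new HashMap<>();
        for (List<List<String>> splits : datasets) {
            for (int i = 0; i < splits.size(); i++) {
                for (String key : splits.get(i)) {
                    Integer prev = home.putIfAbsent(key, i);
                    if (prev != null && prev != i) {
                        return false; // key already seen in another split index
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> a = Arrays.asList(Arrays.asList("k1", "k1"), Arrays.asList("k2"));
        List<List<String>> b = Arrays.asList(Arrays.asList("k1"), Arrays.asList("k2", "k2"));
        System.out.println(aligned(Arrays.asList(a, b)));
    }
}
```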

Re: MapSide Join and left outer or right outer joins?

2008-07-03 Thread Jason Venner
We are using the default partitioner. I am just about to start verifying my result as it took quite a while to work my way through the non-obvious issues of hand-writing MapFiles, things like the key and value classes being extracted from the jobconf's output key/value. Question: I looked at the Has

Re: MapSide Join and left outer or right outer joins?

2008-07-03 Thread Chris Douglas
Forgive me if you already know this, but the correctness of the map-side join is very sensitive to partitioning; if your input is sorted but equal keys go to different partitions, your results may be incorrect. Is your input such that the default partitioning is sufficient? Have you verifie

Re: MapSide Join and left outer or right outer joins?

2008-07-02 Thread Jason Venner
For the data joins, I let the framework do it - which means one partition per split - so I have to choose my partition count carefully to fill the machines. I had an error in my initial outer join mapper; the join map code now runs about 40x faster than the old brute-force read it all shuffle &

Re: MapSide Join and left outer or right outer joins?

2008-07-02 Thread Chris Douglas
Hi Jason- It only seems like full outer or full inner joins are supported. I was hoping to just do a left outer join. Is this supported or planned? The full inner/outer joins are examples, really. You can define your own operations by extending o.a.h.mapred.join.JoinRecordReader or o.a
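
For reference, the left outer semantics being asked about can be sketched as a merge over two key-sorted inputs. This is a standalone illustration only: it does not use or extend the o.a.h.mapred.join classes, and the class `LeftOuterMergeJoin` and its row layout (`{key, value}` string pairs) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeftOuterMergeJoin {
    // Both inputs are lists of {key, value} pairs sorted by key.
    // Every left row is emitted: once per matching right row,
    // or once with a null right value when nothing matches.
    static List<String[]> join(List<String[]> left, List<String[]> right) {
        List<String[]> out = new ArrayList<>();
        int r = 0;
        for (String[] l : left) {
            // Skip right rows whose key sorts before the current left key.
            while (r < right.size() && right.get(r)[0].compareTo(l[0]) < 0) {
                r++;
            }
            // Scan the run of right rows with an equal key, leaving r at the
            // start of the run so a repeated left key can match it again.
            boolean matched = false;
            for (int m = r; m < right.size() && right.get(m)[0].equals(l[0]); m++) {
                out.add(new String[] {l[0], l[1], right.get(m)[1]});
                matched = true;
            }
            if (!matched) {
                out.add(new String[] {l[0], l[1], null});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(
            new String[] {"a", "1"}, new String[] {"b", "2"}, new String[] {"c", "3"});
        List<String[]> right = Arrays.asList(
            new String[] {"a", "x"}, new String[] {"c", "y"},
            new String[] {"c", "z"}, new String[] {"d", "w"});
        for (String[] row : join(left, right)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Right-only keys (such as "d" above) are dropped, which is what separates this from the full outer join.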

MapSide Join and left outer or right outer joins?

2008-06-30 Thread Jason Venner
It only seems like full outer or full inner joins are supported. I was hoping to just do a left outer join. Is this supported or planned? On the flip side, doing the Outer Join is about 8x faster than doing a map/reduce over our dataset. Thanks -- Jason Venner Attributor - Program the Web