Re: CompositeInputFormat - why in mapred but not mapreduce?

2012-01-15 Thread Harsh J
Mike, The mapred.* API has been undeprecated and continues to be the stable API. In 1.0.0, the new API is/was unfinished and lacks a lot of ports from the mapred.lib.* components. This is being addressed by https://issues.apache.org/jira/browse/MAPREDUCE-3607 if you are interested in backporting a

Re: Starting Map Reduce Job on EC2

2012-01-15 Thread Harsh J
Best to use a secured Hadoop cluster [0], and/or setup appropriate firewall rules that block traffic from other than your trusted IPs. [0] - https://ccp.cloudera.com/display/CDHDOC/CDH3+Security+Guide On Mon, Jan 16, 2012 at 4:33 AM, Something Something wrote: > Good point.  Those ports may not
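To make "block traffic from other than your trusted IPs" concrete, here is a minimal Java sketch of the matching rule an IPv4 CIDR allowlist applies. It only illustrates the idea. On EC2 the actual filtering is configured in security groups or host firewall rules, not in application code. The class name `TrustedIpFilter` and the example ranges (RFC 5737 documentation addresses) are hypothetical.

```java
import java.util.List;

/** Hypothetical helper: does an IPv4 address fall inside any trusted CIDR block? */
public class TrustedIpFilter {

    // Parses a dotted-quad IPv4 literal into an unsigned 32-bit value held in a long.
    static long toLong(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) throw new IllegalArgumentException("not IPv4: " + ipv4);
        long value = 0;
        for (String p : parts) {
            int octet = Integer.parseInt(p);
            if (octet < 0 || octet > 255) throw new IllegalArgumentException("bad octet: " + p);
            value = (value << 8) | octet;
        }
        return value;
    }

    // True if ipv4 lies inside a block such as "203.0.113.0/24".
    static boolean inCidr(String ipv4, String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        // Keep only the top `prefix` bits; a /0 block matches everything.
        long mask = prefix == 0 ? 0 : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (toLong(ipv4) & mask) == (toLong(parts[0]) & mask);
    }

    // Allow-list semantics: accept only if some trusted block matches; deny otherwise.
    static boolean isTrusted(String ipv4, List<String> trustedCidrs) {
        for (String cidr : trustedCidrs) {
            if (inCidr(ipv4, cidr)) return true;
        }
        return false;
    }
}
```

The default is deny: anything not matched by a trusted block is rejected, which is the posture the advice above describes.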

Re: Starting Map Reduce Job on EC2

2012-01-15 Thread Something Something
All monitoring browser ports.. such as On Sun, Jan 15, 2012 at 5:00 PM, Lance Norskog wrote: > Can you open all of the monitoring browser ports? > > On Sun, Jan 15, 2012 at 3:03 PM, Something Something > wrote: > > Good point. Those ports may not be open. So next question - is it safe > t

Re: Starting Map Reduce Job on EC2

2012-01-15 Thread Lance Norskog
Can you open all of the monitoring browser ports? On Sun, Jan 15, 2012 at 3:03 PM, Something Something wrote: > Good point.  Those ports may not be open.  So next question - is it safe to > open these ports?  How do we securely open these ports to avoid malicious > attacks under EC2? > > (Sorry,

Re: Starting Map Reduce Job on EC2

2012-01-15 Thread Something Something
Good point. Those ports may not be open. So next question - is it safe to open these ports? How do we securely open these ports to avoid malicious attacks under EC2? (Sorry, I know some of these questions are dumb - but we are a startup and don't have a big sysadmin group - I guess that's why w

Re: What is the right way to do map-side joins in Hadoop 1.0?

2012-01-15 Thread Mike Spreitzer
Yes, I did look at CompositeInputFormat. That is why I remarked that I suppose that I should be looking under org.apache.hadoop.mapreduce.* and sent the earlier question about why CompositeInputFormat is not under org.apache.hadoop.mapreduce.* in Hadoop 1.0.0. But I have gotten no answers yet

Re: Starting Map Reduce Job on EC2

2012-01-15 Thread Ronald Petty
Something Something, Have you confirmed you can connect to the port from your remote machine? telnet ec2-xx 9000 Kindest regards. Ron On Sun, Jan 15, 2012 at 12:16 AM, Something Something < mailinglist...@gmail.com> wrote: > Hello, > > Our Hadoop cluster is setup on EC2, but our clien
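Ron's `telnet ec2-xx 9000` test can also be done from Java, which helps when the client machine has no telnet installed. Below is a minimal sketch. `PortCheck` is a hypothetical name, and the host is left for you to supply because the original elides it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Rough Java equivalent of "telnet <host> <port>": can a TCP connection be opened? */
public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        boolean ok = canConnect(host, port, 3000);
        System.out.println(host + ":" + port + (ok ? " reachable" : " NOT reachable"));
    }
}
```

For example, run `java PortCheck <your-ec2-hostname> 9000`. A negative result only says that no TCP connection was established within the timeout. It does not tell you whether the security group, a host firewall, or the daemon itself is the cause.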

Re: What is the right way to do map-side joins in Hadoop 1.0?

2012-01-15 Thread Bejoy Ks
Hi Mark Have a look at CompositeInputFormat. I guess it is what you are looking for to achieve map side joins. If you are fine with a Reduce side join go in with MultipleInputs. I have tried the same sort of joins using MultipleInputs and have scribbled something on the same.
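A reduce-side join works like this. Each input's mapper tags its records with their source, the shuffle groups records by join key, and the reducer pairs the tagged values. In Hadoop 1.x, `org.apache.hadoop.mapred.lib.MultipleInputs` is the usual way to give each input its own mapper. The sketch below is not Hadoop code. It is an in-memory Java simulation of that data flow, assuming one value per key per input, and all of its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory simulation of map (tag) -> shuffle (group by key) -> reduce (pair up). */
public class ReduceSideJoinSketch {

    // A "map output" record: join key, source tag, payload.
    static final class Tagged {
        final String key, source, value;
        Tagged(String key, String source, String value) {
            this.key = key; this.source = source; this.value = value;
        }
    }

    // "Map" phase for one input: tag every record with the input it came from.
    static List<Tagged> tag(String source, Map<String, String> input) {
        List<Tagged> out = new ArrayList<>();
        input.forEach((k, v) -> out.add(new Tagged(k, source, v)));
        return out;
    }

    // "Shuffle + reduce": group by key, then pair left-tagged with right-tagged values.
    static List<String> join(List<Tagged> mapOutput, String leftTag, String rightTag) {
        Map<String, List<Tagged>> groups = new TreeMap<>(); // key-sorted, like the shuffle
        for (Tagged t : mapOutput) groups.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t);

        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : groups.entrySet()) {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (Tagged t : e.getValue()) {
                if (t.source.equals(leftTag)) left.add(t.value);
                else if (t.source.equals(rightTag)) right.add(t.value);
            }
            // Inner join: keys present on only one side produce nothing.
            for (String l : left) for (String r : right) joined.add(e.getKey() + "\t" + l + "\t" + r);
        }
        return joined;
    }
}
```

The cost of this approach is that both datasets go through the shuffle on every job. A map-side join avoids that cost, but only when the inputs are already partitioned and sorted compatibly.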

Re: What is the right way to do map-side joins in Hadoop 1.0?

2012-01-15 Thread Mike Spreitzer
BTW, each key appears exactly once in the large constant dataset, and exactly once in each MR job's output. I am thinking the right approach is to consistently partition the job output and the large constant dataset, with the number of partitions being the number of reduce tasks; each part goes
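Mike's plan relies on both datasets being split by the same function into the same number of parts, so that part i of one lines up with part i of the other. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below applies that formula to plain `String` keys to show the alignment property. One assumption to note: in a real job the keys are Writables such as `Text`, whose `hashCode()` is not `String.hashCode()`. The constant dataset and every iteration's output must therefore be written with the same key type, partitioner and reducer count, with each part sorted by key, for the parts to align. `ConsistentPartitioning` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConsistentPartitioning {

    // Same formula as Hadoop's default HashPartitioner.getPartition().
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Splits a key->value dataset into numPartitions parts, each kept sorted by key
    // (mirroring the sorted part files a reduce phase writes).
    static List<TreeMap<String, String>> partition(Map<String, String> data, int numPartitions) {
        List<TreeMap<String, String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) parts.add(new TreeMap<>());
        data.forEach((k, v) -> parts.get(partitionFor(k, numPartitions)).put(k, v));
        return parts;
    }
}
```

Because the partition index depends only on the key and the partition count, any key present in both datasets ends up at the same index on both sides.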

What is the right way to do map-side joins in Hadoop 1.0?

2012-01-15 Thread Mike Spreitzer
I have a problem that needs to be solved by an iteration of MapReduce jobs, and in each iteration I need to start by doing an equijoin between a large constant dataset and the output of the previous iteration; the remainder of my map function works on a joined tuple in a way whose details are n
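Once both sides are identically partitioned and each part is sorted by key, the equijoin for one partition is a single linear merge pass. A map-side join such as `CompositeInputFormat`'s inner join builds on this idea, and it requires inputs sorted and partitioned the same way. The Java below sketches that merge; it is not CompositeInputFormat's implementation. It assumes unique keys on each side, as Mike states for his data, and the same key ordering on both sides. `MergeJoinSketch` and `Joined` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

public class MergeJoinSketch {

    // Joined tuple that the rest of the map logic would consume.
    static final class Joined {
        final String key, constantValue, previousValue;
        Joined(String key, String constantValue, String previousValue) {
            this.key = key; this.constantValue = constantValue; this.previousValue = previousValue;
        }
        @Override public String toString() { return key + "\t" + constantValue + "\t" + previousValue; }
    }

    // Inner equijoin of two key-sorted partitions with unique keys, in one pass.
    static List<Joined> mergeJoin(SortedMap<String, String> constant, SortedMap<String, String> previous) {
        List<Joined> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> ci = constant.entrySet().iterator();
        Iterator<Map.Entry<String, String>> pi = previous.entrySet().iterator();
        Map.Entry<String, String> c = ci.hasNext() ? ci.next() : null;
        Map.Entry<String, String> p = pi.hasNext() ? pi.next() : null;
        while (c != null && p != null) {
            int cmp = c.getKey().compareTo(p.getKey());
            if (cmp < 0) {
                c = ci.hasNext() ? ci.next() : null;      // key only in constant side: skip
            } else if (cmp > 0) {
                p = pi.hasNext() ? pi.next() : null;      // key only in previous output: skip
            } else {
                out.add(new Joined(c.getKey(), c.getValue(), p.getValue()));
                c = ci.hasNext() ? ci.next() : null;      // unique keys: advance both sides
                p = pi.hasNext() ? pi.next() : null;
            }
        }
        return out;
    }
}
```

In the plan discussed in this thread, each map task would run such a merge over one partition of the constant dataset and the matching partition of the previous iteration's output, and then apply the rest of the map function to each joined tuple.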