Re: Setting the number of mappers to 0

2010-07-09 Thread Eric Sammer
-- Eric Sammer twitter: esammer data: www.cloudera.com

Re: How to get context in Close() method in hadoop Pipes

2010-06-26 Thread Eric Sammer
10 at 9:25 PM, Mohamed Riadh Trad wrote:
> Dear All;
>
> I had to emit final Key/Values in the Mapper Close Method but I can't get the
> context.
>
> Any suggestion?
>
> Regard.

Re: Need help with exception when mapper emits different key class from reducer

2010-06-18 Thread Eric Sammer
>             FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
>         }
>         String athString = otherArgs[otherArgs.length - 1];
>         File out = new File(athString);
>         if (out.exists()) {
>             FileUtilities.expungeDirectory(out);
>             out.delete();
>         }
>         Path outputDir = new Path(athString);
>
>         FileOutputFormat.setOutputPath(job, outputDir);
>
>         boolean ans = job.waitForCompletion(true);
>         int ret = ans ? 0 : 1;
>         System.exit(ret);
>     }
> }
> --
> Steven M. Lewis PhD
> Institute for Systems Biology
> Seattle WA

Re: number of reducers

2010-06-06 Thread Eric Sammer
>     conf.set("mapred.reduce.tasks.speculative.execution", "false");
>
> What am I missing here?
>
> cheers
> --
> Torsten
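For context, the number of reduce tasks a job is configured with (mapred.reduce.tasks in 0.20-era configs) is also the number of partitions its map output is split into. A minimal stdlib-only Java sketch of the formula used by Hadoop's default HashPartitioner; PartitionSketch and partitionFor are illustrative names, not Hadoop classes:

```java
import java.util.List;

/**
 * Stdlib-only mirror of the default hash partitioning formula: each map
 * output key goes to exactly one of numReduceTasks reducers.
 */
public class PartitionSketch {

    // Mask off the sign bit so negative hash codes still yield a valid index,
    // then take the remainder modulo the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // stands in for the job's configured reducer count
        for (String key : List.of("apple", "banana", "cherry", "date")) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because every key with the same hash lands on the same reducer, changing the reducer count changes which reducer (and output file) each key ends up in.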

Re: how to set max map tasks individually for each job?

2010-06-04 Thread Eric Sammer
x tasks, but that's cluster-wide, not per host, so I don't think that will be helpful. A better option is to pack more work into each task in the "lighter" of your two jobs so they have similar performance characteristics, if possible. Of course, that's easier said than done, I know.
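One way to pack more work into each task, assuming file-based input, is to raise the minimum input split size so the lighter job runs fewer, larger map tasks. A rough stdlib-only sketch of the arithmetic: the split-size formula follows the new-API FileInputFormat.computeSplitSize, while the file and block sizes are made-up example values and the task count ignores Hadoop's small slop allowance for the final split:

```java
/**
 * Sketch: fewer, larger map tasks by raising the minimum split size.
 * splitSize mirrors max(minSize, min(maxSize, blockSize)); mapTasks is a
 * simplified ceil(fileSize / splitSize) for a single input file.
 */
public class SplitSizing {

    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of map tasks for one file (no slop allowance).
    static long mapTasks(long fileSize, long blockSize, long minSize, long maxSize) {
        long split = splitSize(blockSize, minSize, maxSize);
        return (fileSize + split - 1) / split;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long file = 1024 * mib;  // hypothetical 1 GiB input file
        long block = 64 * mib;   // hypothetical 64 MiB block size
        System.out.println("default min split: "
                + mapTasks(file, block, 1, Long.MAX_VALUE) + " map tasks");
        System.out.println("256 MiB min split: "
                + mapTasks(file, block, 256 * mib, Long.MAX_VALUE) + " map tasks");
    }
}
```

With these example numbers the job drops from 16 map tasks to 4, each doing four times as much work.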

Re: Running Mapreduce program apart from command prompt

2010-05-27 Thread Eric Sammer
> o run the mapreduce program from another
> java program. I need some mechanism for submitting the job not from the
> command line but some other java program should launch the job.
>
> Nishant Sonar

Re: How to debug reducer thread?

2010-04-27 Thread Eric Sammer
> .hadoop.mapred.ReduceTask.run(ReduceTask.java:395)
> at org.apache.hadoop.mapred.Child.main(Child.java:194)
>
> I would like to debug this thread in a IDE but I don't know how to do it.
> Should I define properties to do this? Is there a way to do it?
>
> Thanks
>
> --
> PSC

Re: Hadoop over the internet

2010-04-20 Thread Eric Sammer
nnection, the failure semantics are very different. Without making Hadoop aware of the multi-datacenter case, a failure of a router could easily lose all replicas of a large number of blocks, creating a huge hole in the data. Again, it's about more than just performance here.
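A back-of-the-envelope sketch of why the number of affected blocks can be large. It assumes (none of this is from the thread) that nodes are split evenly between two datacenters and that each block's replicas land on distinct nodes chosen uniformly at random, with no datacenter awareness. The chance that all r replicas of a block sit behind the failed link is then C(n/2, r) / C(n, r):

```java
/**
 * Probability that every replica of a block ends up in the datacenter that
 * was cut off, under uniform, datacenter-unaware placement across two
 * equally sized datacenters (an illustrative assumption).
 */
public class ReplicaLoss {

    // n choose k as a double; exact for the small values used here.
    static double choose(int n, int k) {
        if (k < 0 || k > n) return 0.0;
        double result = 1.0;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    static double pAllReplicasInFailedDc(int totalNodes, int replication) {
        int perDc = totalNodes / 2;
        return choose(perDc, replication) / choose(totalNodes, replication);
    }

    public static void main(String[] args) {
        int nodes = 20;
        int replication = 3;
        long blocks = 1_000_000L; // hypothetical block count
        double p = pAllReplicasInFailedDc(nodes, replication);
        System.out.printf("P(all %d replicas unreachable) = %.4f%n", replication, p);
        System.out.printf("~%d of %d blocks unreachable%n", Math.round(p * blocks), blocks);
    }
}
```

For 20 nodes and a replication factor of 3 this is 120 / 1140, roughly 10.5% of all blocks, and it approaches (1/2)^3 = 12.5% as the cluster grows.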

Re: Hadoop over the internet

2010-04-17 Thread Eric Sammer
> 't mean in private computers, all of them in different
> places, rather a collection of datacenters, connected to each other over
> the Internet.
>
> Would that fail? If yes, how and why? What issues would arise?

Re: Partitioning Reducer Output

2010-04-05 Thread Eric Sammer
t back to the "old" APIs and use MTOF (MultipleTextOutputFormat) or MO (MultipleOutputs) as you've mentioned. I believe CDH3 has (or will have) updated versions of MTOF and MO for the new APIs, but don't quote me on that.
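The idea behind the old-API MultipleTextOutputFormat is that a subclass overrides generateFileNameForKeyValue(key, value, name) to pick an output file per record. A stdlib-only simulation of that routing (not Hadoop code); the key-as-directory naming scheme and the "part-00000" leaf name are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simulates splitting reducer output into one file per key by grouping
 * records under the file name each would be written to.
 */
public class PerKeyOutput {

    // Hypothetical naming scheme: <key>/<leaf name>. The value is unused here
    // but kept to mirror the (key, value, name) shape of the real hook.
    static String fileNameFor(String key, String value, String leafName) {
        return key + "/" + leafName;
    }

    // Groups tab-separated "key\tvalue" records by destination file.
    static Map<String, List<String>> route(List<String> records, String leafName) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String record : records) {
            String[] kv = record.split("\t", 2);
            String value = kv.length > 1 ? kv[1] : "";
            String file = fileNameFor(kv[0], value, leafName);
            files.computeIfAbsent(file, f -> new ArrayList<>()).add(value);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a\t1", "b\t2", "a\t3");
        System.out.println(route(records, "part-00000"));
        // {a/part-00000=[1, 3], b/part-00000=[2]}
    }
}
```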

Re: MapRed ports

2010-02-09 Thread Eric Sammer
s into the specific class names, etc.). Hope this helps. If I've said anything wrong, I'm very happy to have people correct me. Regards.

Task tracker reported machine name / IP

2010-01-20 Thread Eric Sammer
's configuration? If not, does anyone else feel like there should be? I completely understand that the correct answer is to fix the hosts file or not depend on it at all, deferring to DNS. But it does seem like this bit of the code is overly complicated and brittle. Thoughts? Thanks.

Re: Should mapreduce.ReduceContext reuse same object in nextKeyValue?

2010-01-13 Thread Eric Sammer
ably correct myself and say that it depends on the application. In general, the framework assumes that the reduce values for a given key may not all fit in memory. In specific implementations, it may be fine (or even necessary) for the user to do buffering like this. Thanks and sorry
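When a reducer can compute its result incrementally, it never needs to hold a key's values in memory, which fits the framework's assumption described above. A minimal stdlib-only sketch of such a streaming reduction (a mean); StreamingMean is a hypothetical name and an Iterable<Long> stands in for the reducer's values:

```java
import java.util.List;

/**
 * Streaming reduction: each value is consumed once and only two running
 * totals are kept, so memory use does not grow with the number of values.
 */
public class StreamingMean {

    static double mean(Iterable<Long> values) {
        long count = 0;
        long sum = 0; // assumes the sum fits in a long
        for (long v : values) {
            sum += v;
            count++;
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        System.out.println(mean(List.of(3L, 5L, 10L))); // 6.0
    }
}
```

A median, by contrast, needs every value for the key at once, which is the kind of case where buffering in the reducer becomes necessary.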

Re: Should mapreduce.ReduceContext reuse same object in nextKeyValue?

2010-01-12 Thread Eric Sammer
impact performance and add the requirement that all values for a given key fit in memory. Hope this helps.
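To make the reuse concrete: a stdlib-only simulation (not Hadoop code) in which the values iterator hands back the same mutable object on every call, much as the reduce-side iterator reuses one Writable. MutableText is a hypothetical stand-in for org.apache.hadoop.io.Text. Keeping references buffers the same object over and over; copying the contents first is what makes buffering work, at the memory and performance cost mentioned above:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {

    // Hypothetical stand-in for a mutable Writable such as Text.
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Every iterator overwrites and returns the same shared holder.
    static Iterable<MutableText> reusingValues(List<String> raw) {
        MutableText shared = new MutableText();
        return () -> new Iterator<MutableText>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < raw.size(); }
            @Override public MutableText next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw.get(i++));
                return shared;
            }
        };
    }

    // Wrong: stores references to the shared holder, so every entry ends up identical.
    static List<String> bufferByReference(Iterable<MutableText> values) {
        List<MutableText> kept = new ArrayList<>();
        for (MutableText v : values) kept.add(v);
        List<String> out = new ArrayList<>();
        for (MutableText v : kept) out.add(v.toString());
        return out;
    }

    // Right: copies each value's contents before the holder is overwritten.
    static List<String> bufferByCopy(Iterable<MutableText> values) {
        List<String> out = new ArrayList<>();
        for (MutableText v : values) out.add(v.toString());
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("a", "b", "c");
        System.out.println(bufferByReference(reusingValues(raw))); // [c, c, c]
        System.out.println(bufferByCopy(reusingValues(raw)));      // [a, b, c]
    }
}
```

In real Hadoop code the copy step corresponds to creating a fresh object per value, e.g. new Text(value), before storing it.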

Re: How to use an alternative connector to SSH ?

2010-01-12 Thread Eric Sammer
ou can roll your own startup scripts and invoke the underlying hadoop-daemon.sh scripts on each node over whatever communication channel you'd like. You may have to do a little environment setup first if you choose to go this route. Take a look at the source of start-*.sh; they're pre

Re: Questions about JobTracker and TaskTracker

2010-01-11 Thread Eric Sammer
rt-mapred.sh]
> "$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
> "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker
[1] - http://wiki.apache.org/hadoop/
[2] - http://www.cloudera.com/hadoop-training-mapreduce-hdfs
Hope this helps.

Re: passing dependencies to my Mapper

2009-09-08 Thread Eric Sammer
ency on Spring, which for me isn't a problem. You can replace Spring with your DI framework of choice, of course, but this pattern works well for me. Hope this helps! Best regards.
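The reply describes a Spring-based pattern. As an alternative with no DI container, a stdlib-only sketch: the driver stores an implementation class name in the job configuration and the task instantiates it during setup. A Map stands in for Hadoop's Configuration, and the property name, Tokenizer interface, and class names are all hypothetical; in real Hadoop code, Configuration.getClass(...) together with ReflectionUtils.newInstance(...) fills this role:

```java
import java.util.Map;

public class DependencySketch {

    // Hypothetical dependency the mapper needs.
    public interface Tokenizer {
        String[] tokenize(String line);
    }

    // Default implementation; public with a no-arg constructor so reflection can create it.
    public static class WhitespaceTokenizer implements Tokenizer {
        @Override
        public String[] tokenize(String line) {
            return line.trim().split("\\s+");
        }
    }

    // Hypothetical property name the driver would set in the job configuration.
    static final String TOKENIZER_KEY = "example.mapper.tokenizer.class";

    // What a Mapper's setup()/configure() could do: look up the class name and instantiate it.
    static Tokenizer loadTokenizer(Map<String, String> conf) {
        String className = conf.getOrDefault(TOKENIZER_KEY, WhitespaceTokenizer.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(Tokenizer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(TOKENIZER_KEY, WhitespaceTokenizer.class.getName());
        Tokenizer tokenizer = loadTokenizer(conf);
        System.out.println(tokenizer.tokenize("  hello   hadoop world ").length); // 3
    }
}
```

Swapping implementations then only requires a different class name in the configuration; the trade-off versus a DI framework is that any wiring beyond a no-arg constructor has to be done by hand.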