Re: Chaining Map Jobs

2011-07-29 Thread Arun C Murthy
Moving to mapreduce-user@, bcc common-user@.

Use JobControl: http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Control

Arun

On Jul 29, 2011, at 4:24 PM, Roger Chen wrote:
> Has anyone had experience with chaining map jobs in the Hadoop 0.20.2 framework?
> Thanks.
>
> --
> Roger Chen
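
A minimal sketch of the JobControl pattern referenced above, against the 0.20 mapred API; the class name and JobConf setup are illustrative and would need real mapper classes and input/output paths:

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.jobcontrol.Job;
  import org.apache.hadoop.mapred.jobcontrol.JobControl;

  public class ChainDriver {
    public static void main(String[] args) throws Exception {
      // Hypothetical confs: in real code, set mapper classes and
      // input/output paths on each JobConf before wrapping it.
      JobConf conf1 = new JobConf();
      JobConf conf2 = new JobConf();

      Job step1 = new Job(conf1);
      Job step2 = new Job(conf2);
      step2.addDependingJob(step1); // step2 starts only after step1 succeeds

      JobControl control = new JobControl("map-chain");
      control.addJob(step1);
      control.addJob(step2);

      // JobControl is a Runnable: drive it from a thread and poll until done.
      Thread runner = new Thread(control);
      runner.start();
      while (!control.allFinished()) {
        Thread.sleep(500);
      }
      control.stop();
    }
  }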

Re: sample usage of custom counters with new map Reduce API

2011-07-29 Thread Mapred Learn
Hi Shrijeet,

Is there a way to do it in the main class instead of in the mappers and reducers?

On Fri, Jun 24, 2011 at 2:27 PM, Shrijeet Paliwal wrote:
> public class MyReducer
>     extends Reducer {
>
>   private enum MyCounters {
>     INPUT_UNIQUE_USERS
>   }
>
>   @Override
>   protected void setup(Context ...
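
A sketch of the full round trip, assuming the new org.apache.hadoop.mapreduce API; the class and counter names are illustrative. Counters can only be incremented inside tasks, but the main class can read the aggregated values once the job finishes, which is the closest answer to the question above:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Reducer;

  public class CounterDemo {

    // Counters are declared as an enum; Hadoop groups them by the enum class name.
    public enum MyCounters { INPUT_UNIQUE_USERS }

    public static class MyReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        // Treat each distinct reduce key as one unique user (illustrative).
        context.getCounter(MyCounters.INPUT_UNIQUE_USERS).increment(1);
      }
    }

    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "counter-demo");
      // ... set jar, mapper, reducer, and input/output paths here ...
      job.waitForCompletion(true);

      // The main class cannot increment task-side counters, but it can read
      // the aggregated values once the job has finished:
      long uniqueUsers = job.getCounters()
          .findCounter(MyCounters.INPUT_UNIQUE_USERS).getValue();
      System.out.println("Unique users: " + uniqueUsers);
    }
  }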

Re: Task JVM Reuse for MapReduce Jobs in 0.20.2

2011-07-29 Thread Harsh J
Brandon,

New JVMs for each slot will be spawned across different jobs. For tasks of the same job, this shouldn't happen. Are you seeing this happen for tasks of the same job itself?

Also, since your question may be specific to CDH use, I've moved the discussion to cdh-u...@cloudera.org (mapreduce...
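
For reference, the stock Apache 0.20 knob that controls this; a sketch, with the rest of the job setup elided:

  import org.apache.hadoop.mapred.JobConf;

  public class JvmReuseConf {
    public static JobConf withJvmReuse() {
      JobConf conf = new JobConf();
      // -1 reuses JVMs for an unlimited number of this job's tasks;
      // a positive N caps reuse at N tasks per JVM; 1 (the default) disables reuse.
      conf.setNumTasksToExecutePerJvm(-1);
      // Equivalent property form: conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
      return conf;
    }
  }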

Re: Can you access Distributed cache in custom output format ?

2011-07-29 Thread Alejandro Abdelnur
So, -files uses the '#' symlink feature, correct? If so, from the MR task JVM, doing a new File(FILENAME) would work, where FILENAME does not include the path of the file, correct?

Thxs.

Alejandro

On Fri, Jul 29, 2011 at 11:20 AM, Brock Noland wrote:
> With -files the file will be placed in th...
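
A sketch of the bare-name open being described, assuming the job was submitted with something like hadoop jar my.jar MyTool -files /local/dir/lookup.txt (file name illustrative); GenericOptionsParser ships the file and symlinks it into each task's working directory:

  import java.io.BufferedReader;
  import java.io.File;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;

  public class CacheFileReader {
    // Called from inside a task (e.g. Mapper.setup); "lookup.txt" is the bare
    // name because the -files machinery symlinks it into the task's cwd.
    public static List<String> readLookup() throws IOException {
      List<String> lines = new ArrayList<String>();
      BufferedReader reader = new BufferedReader(new FileReader(new File("lookup.txt")));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          lines.add(line);
        }
      } finally {
        reader.close();
      }
      return lines;
    }
  }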

Task JVM Reuse for MapReduce Jobs in 0.20.2

2011-07-29 Thread Brandon Vargo
Hello,

I am trying to set up a MapReduce job so that the task JVMs are reused on each cluster node. Libraries used by my MapReduce job have a significant initialization time, mainly creating singletons, and it would be nice if I could make it so that these singletons are only created once per slot...
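
For what it's worth, a sketch of the per-slot singleton pattern described above, assuming JVM reuse is enabled for the job (via the JobConf knob shown earlier in this digest); ExpensiveService is a hypothetical stand-in for the slow-to-initialize library:

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class SingletonMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Static state lives for the life of the task JVM; with JVM reuse on,
    // that spans many tasks of the same job running in the same slot.
    private static ExpensiveService service;

    // Hypothetical placeholder for a library with costly initialization.
    static class ExpensiveService {
      ExpensiveService() { /* slow singleton construction happens here */ }
      String process(String input) { return input; }
    }

    @Override
    protected void setup(Context context) {
      synchronized (SingletonMapper.class) {
        if (service == null) {
          service = new ExpensiveService(); // built once per reused JVM / slot
        }
      }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(service.process(value.toString())), value);
    }
  }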

Re: Can you access Distributed cache in custom output format ?

2011-07-29 Thread Mapred Learn
When you use the -files option, it copies the file into a .staging directory and all mappers can access it, but I see that the output format is not able to access it.

-files copies the cache file under: /user//.staging//files/

On Fri, Jul 29, 2011 at 11:14 AM, Alejandro Abdelnur wrote:
> Mmmh, I've never used the...

Re: Can you access Distributed cache in custom output format ?

2011-07-29 Thread Alejandro Abdelnur
Mmmh, I've never used the -files option (I don't know if it will copy the files to HDFS for you or whether you have to put them there first). My usage pattern with the DC is copying the files to HDFS, then using the DC API to add those files to the jobconf.

Alejandro

On Fri, Jul 29, 2011 at 10:56 AM, Mapred Learn wrote: ...
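
A minimal sketch of that usage pattern; the HDFS path and fragment name are illustrative, and the file must already be in HDFS since the DistributedCache only distributes, it does not upload:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;

  public class CacheSetup {
    public static void addLookupFile(Configuration conf) throws Exception {
      // File must already exist in HDFS; the '#lookup.txt' fragment names
      // the symlink that tasks will see in their working directory.
      DistributedCache.addCacheFile(new URI("/user/alejandro/lookup.txt#lookup.txt"), conf);
      // Ask the framework to actually create those symlinks in the task cwd.
      DistributedCache.createSymlink(conf);
    }
  }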

Re: Can you access Distributed cache in custom output format ?

2011-07-29 Thread Mapred Learn
I'm trying to access a file that I sent via the -files option on my hadoop jar command. In my output format, I am doing something like:

Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
String file1 = "";
String file2 = "";
Path pt = null;
for (Path p : cacheFiles...
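
A completed sketch of that fragment, with the truncated loop filled in by assumption; the usual approach is to match each local cache path by its bare file name, since the directory part is framework-internal:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.Path;

  public class CacheLookup {
    // e.g. String file1 = findCacheFile(conf, "schema1.txt");  (name illustrative)
    public static String findCacheFile(Configuration conf, String name) throws IOException {
      Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
      if (cacheFiles != null) {
        for (Path p : cacheFiles) {
          // Match on the bare file name only.
          if (p.getName().equals(name)) {
            return p.toString();
          }
        }
      }
      return null;
    }
  }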

Can you access Distributed cache in custom output format ?

2011-07-29 Thread Mapred Learn
Hi,

I am trying to access the distributed cache in my custom output format, but it does not work: opening the file in the custom output format fails with "file does not exist" even though it physically does. It looks like the distributed cache only works for Mappers and Reducers? Is there a way I can read the Distributed...
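
For context on where an output format can even get at the cache: in the new API the hook is getRecordWriter(TaskAttemptContext), whose context carries the live job Configuration. A sketch that just lists what the cache exposes at that point, delegating the actual writing to TextOutputFormat; whether the files actually show up there is exactly what this thread is probing:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.RecordWriter;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

  public class CacheAwareOutputFormat extends TextOutputFormat<Text, Text> {
    @Override
    public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext context)
        throws IOException, InterruptedException {
      // The TaskAttemptContext carries the job Configuration, so the same
      // DistributedCache lookup used in mappers/reducers is available here.
      Configuration conf = context.getConfiguration();
      Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
      if (cacheFiles != null) {
        for (Path p : cacheFiles) {
          System.err.println("cache file visible to output format: " + p);
        }
      }
      return super.getRecordWriter(context);
    }
  }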

Re: Re: how to move to new version of hadoop?

2011-07-29 Thread 周俊清
Thank you
--
周俊清
2ho...@163.com

On 2011-07-29 10:35:03, "陈加俊" wrote:

bin/hadoop namenode -upgrade
bin/hadoop datanode -upgrade

2011/7/28 周俊清 <2ho...@163.com>

Hello everyone, I am now running hadoop-0.20.2; my question is, how can I move to version hadoop-0.20...