Re: Shared files in Hadoop MR cluster

2011-02-08 Thread Harsh J
AFAIK, a job's JobInfo representation is saved inside a job directory under mapred.system.dir (which is really the JobTracker's system directory more than anyone else's), along with a job.jar, for job recovery purposes. On Tue, Feb 8, 2011 at 5:34 PM, Pedro Costa wrote: > Hi, > > In hadoop MR exists the p

Re: Best practice for batch file conversions

2011-02-08 Thread Sonal Goyal
You can check out MultipleOutputFormat for this. Thanks and Regards, Sonal, Nube Technologies

Re: Best practice for batch file conversions

2011-02-08 Thread felix gao
I am stuck again. The binary files are stored in HDFS under some pre-defined structure like

root/
|-- dir1
|   |-- file1
|   |-- file2
|   `-- file3
|-- dir2
|   |-- file1
|   `-- file3
`-- dir3
    |-- file2
    `-- file3

after I processed them somehow using Non-splittable InputFormat in my mappe

Re: location awareness on RT tasks?

2011-02-08 Thread Mahadev Konar
Hi Pedro, You can read about the HDFS placement policy at: http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html Thanks, Mahadev On Fri, Feb 4, 2011 at 7:06 AM, Pedro Costa wrote: > Hi, > > When hadoop is running in cluster, the output of the Reducers are > saved in HDFS. The MapReduce h

Re: Best practice for batch file conversions

2011-02-08 Thread felix gao
Thanks a lot for the pointer. I will play around with it. On Mon, Feb 7, 2011 at 10:55 PM, Sonal Goyal wrote: > Hi, > > You can use FileStreamInputFormat which returns the file stream as the > value. > > > https://github.com/sonalgoyal/hiho/tree/hihoApache0.20/src/co/nubetech/hiho/mapreduce/lib/

What are index files in the hadoop MR

2011-02-08 Thread Pedro Costa
Hi, Each map task saves two things as output: the map output and an index file. What's the purpose of the index file? Thanks, -- Pedro

Shared files in Hadoop MR cluster

2011-02-08 Thread Pedro Costa
Hi, Hadoop MR has the property "mapred.system.dir", which sets a relative directory where shared files are stored during a job run. What are these shared files? -- Pedro
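
For reference, the reply to this question listed above (Harsh J) says that a job directory under mapred.system.dir holds the job's JobInfo representation and a job.jar, kept for job recovery. A minimal sketch of that layout follows, using only Python's standard library. Only mapred.system.dir and job.jar come from the thread; the function names, the example system directory, the example job ID, and the JobInfo file name are assumptions made for illustration.

```python
import posixpath


def job_directory(system_dir, job_id):
    """Return the per-job directory under the mapred.system.dir value."""
    # HDFS paths use forward slashes, so posixpath is used on every platform.
    return posixpath.join(system_dir, job_id)


def recovery_files(system_dir, job_id, job_info_name="job-info"):
    """Return the two per-job files the reply mentions.

    job.jar is named in the thread. The JobInfo file name is NOT given
    there, so job_info_name is a hypothetical placeholder.
    """
    job_dir = job_directory(system_dir, job_id)
    return {
        "job_info": posixpath.join(job_dir, job_info_name),
        "job_jar": posixpath.join(job_dir, "job.jar"),
    }


if __name__ == "__main__":
    # Both values below are made-up examples, not taken from a real cluster.
    files = recovery_files("/tmp/hadoop/mapred/system", "job_201102081200_0001")
    for name, path in files.items():
        print(name, path)
```

This only composes path strings; it does not talk to a cluster, and the actual on-disk layout may differ between Hadoop versions.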