Re: Tracking Job completion times

2012-03-05 Thread Bharath Ravi
Thanks all! Trying out the history API, I ran into a spot of trouble: I'd like to log history to a custom location (by default it is the job output dir in DFS). However, setting hadoop.job.history.location (I tried adding it to the core-site and mapred-site XMLs) doesn't seem to help. The path it poi…
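
For reference, a minimal sketch of the property entry being attempted in mapred-site.xml; the value shown is a hypothetical path, not one from the thread:

    <property>
      <name>hadoop.job.history.location</name>
      <!-- hypothetical target; point this at the desired history directory -->
      <value>hdfs://namenode:8020/user/bharath/job-history</value>
    </property>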

Re: map-reduce on non-closed files

2012-03-05 Thread Niv Mizrahi
hi harsh, yes, thank you, we are using the sync() API, and still unable to read unclosed files in mapreduce. we are able to cat non-closed files; would that be possible if we hadn't used the sync() API call? has anybody tried running an M/R on non-closed files? are we missing something? 10x Niv On Mon, Mar…
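
For reference, a minimal sketch of the sync() pattern under discussion, assuming the CDH3u2-era FSDataOutputStream.sync(); the path and record are hypothetical:

    // Minimal sketch (hypothetical path): write to an HDFS file and call
    // sync() so the data becomes visible to readers before the file is closed.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SyncExample {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(new Path("/tmp/open-file.log"));
        out.writeBytes("some record\n");
        out.sync(); // flush to datanodes; readers can now see up to here
        // file intentionally left open; a reader can cat what was synced
      }
    }

One possible explanation, not confirmed in the thread: FileInputFormat sizes its splits from the file length recorded at the NameNode, which for a still-open file can lag behind the synced data (often reading as 0), so hadoop fs -cat sees the data while an M/R job may see an empty input.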

Re: Tracking Job completion times

2012-03-05 Thread Charles Earl
In terms of accessing metrics2 programmatically, do people generally extend FileSink for collecting data from small (1-15 node) installations, as opposed to using Chukwa, etc.? C On Mar 4, 2012, at 7:08 PM, George Datskos wrote: > Bharath, > > Try the hadoop job -history API > > > > On 20…
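
For reference, a rough sketch of a custom metrics2 sink along the lines Charles describes; the interface names follow the metrics2 API of that era, but treat the exact signatures as assumptions since they vary across Hadoop versions:

    // Rough sketch of a custom metrics2 sink; signatures are assumptions.
    import org.apache.commons.configuration.SubsetConfiguration;
    import org.apache.hadoop.metrics2.Metric;
    import org.apache.hadoop.metrics2.MetricsRecord;
    import org.apache.hadoop.metrics2.MetricsSink;

    public class LoggingSink implements MetricsSink {
      @Override
      public void init(SubsetConfiguration conf) {
        // read sink-specific options from hadoop-metrics2.properties here
      }

      @Override
      public void putMetrics(MetricsRecord record) {
        // print each metric in the record; a real sink would persist these
        for (Metric metric : record.metrics()) {
          System.out.println(record.name() + "." + metric.name()
              + " = " + metric.value());
        }
      }

      @Override
      public void flush() {}
    }

Such a sink would then be wired up through hadoop-metrics2.properties.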

Re: Query regarding Hadoop version 0.20.203

2012-03-05 Thread Harsh J
What Joey said. What you'll want is:

    FileStatus[] fileStatuses = fs.listStatus(somePath);
    for (FileStatus fstat : fileStatuses) {
      Path file = fstat.getPath();
      // Do other read/etc. logic here with Path and FileSystem as you want.
    }

Also read the FileStatus API at http://hadoop.apache.org/com…
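
Assembled into a self-contained form for reference (the Configuration setup and directory path are assumptions, not from the thread):

    // Self-contained sketch: list a directory's files and open each one.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListReducerOutput {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path outputDir = new Path("/user/piyush/job-output"); // hypothetical
        for (FileStatus fstat : fs.listStatus(outputDir)) {
          FSDataInputStream in = fs.open(fstat.getPath());
          // read records from 'in' here
          in.close();
        }
      }
    }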

Re: Query regarding Hadoop version 0.20.203

2012-03-05 Thread Joey Echeverria
You don't need to call readFields(); the FileStatus objects are already initialized. You should just be able to call the various getters to get the fields that you're interested in. -Joey On Mon, Mar 5, 2012 at 9:03 AM, Piyush Kansal wrote: > Harsh, > > When I tried readFields as follows: >…
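
For reference, an illustrative loop over the getters Joey mentions, reusing the ipFs and ip variables from Piyush's snippet:

    // The objects from listStatus() are already populated; just read them.
    for (FileStatus fstat : ipFs.listStatus(ip)) {
      System.out.println(fstat.getPath()
          + " len=" + fstat.getLen()
          + " modified=" + fstat.getModificationTime()
          + " isDir=" + fstat.isDir());
    }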

Re: Query regarding Hadoop version 0.20.203

2012-03-05 Thread Piyush Kansal
Harsh, When I tried readFields as follows:

    FileStatus origFStatus[] = ipFs.listStatus( ip );
    DataInput dataIp;
    origFStatus[ 0 ].readFields( dataIp );

I am getting a compilation error: "variable dataIp might not have been initialized". How do we initialize it? Is there a direct method by whic…
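
For context: readFields() deserializes a FileStatus from a byte stream, so the compiler insists that the DataInput actually wrap one. A hypothetical initialization would look like the sketch below, though, as Joey's reply above notes, it is unnecessary here because listStatus() already returns populated objects:

    // Hypothetical only: readFields() would deserialize from a real stream.
    DataInput dataIp = new DataInputStream(
        new FileInputStream("serialized-status.bin")); // hypothetical file
    origFStatus[0].readFields(dataIp); // overwrites the object from the stream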

Re: map-reduce on non-closed files

2012-03-05 Thread Harsh J
Niv, Did you also try the sync() approach I mentioned? Did that not work? CDH3u2 does have the sync() API in it, so you can use it right away. On Sun, Mar 4, 2012 at 11:26 PM, Niv Mizrahi wrote: > hi harsh, > > thank you for the quick response. > we are currently running with cdh3u2. > > i have…

Re: Query regarding Hadoop version 0.20.203

2012-03-05 Thread Piyush Kansal
Thanks Harsh. It worked. On Mon, Mar 5, 2012 at 5:58 AM, Harsh J wrote: > Piyush, > > On Mon, Mar 5, 2012 at 3:16 PM, Piyush Kansal wrote: > > Ques 1: > > == > > I have an HDFS directory which contains the o/p files of the reducer. I want to > > read all the part-r-* files present in this di…

Re: Query regarding Hadoop version 0.20.203

2012-03-05 Thread Harsh J
Piyush, On Mon, Mar 5, 2012 at 3:16 PM, Piyush Kansal wrote: > Ques 1: > == > I have an HDFS directory which contains the o/p files of the reducer. I want to > read all the part-r-* files present in this directory. > > I have already tried the following options but no luck: > - FileSystem.l…
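
Since the goal is specifically the part-r-* files, a glob is another option; a minimal sketch, reusing the fs variable from the earlier snippet and a hypothetical directory:

    // Match only reducer output files with a glob pattern.
    FileStatus[] parts =
        fs.globStatus(new Path("/user/piyush/job-output/part-r-*"));
    for (FileStatus part : parts) {
      // open part.getPath() with fs.open(...) and read as needed
    }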

Re: how can i know the number of tasks of map, reduce, shuffle?

2012-03-05 Thread Piyush Kansal
Hi, you can use two options:
- From an operational point of view, use the JobTracker link (you can open it in a browser); here you can see the currently running job's progress, which shows the number of tasks and the related counters as well
- From a coding point of view, you can use Job.getNumReduceTasks() (name…
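
A small sketch of the programmatic route, assuming the new-API org.apache.hadoop.mapreduce.Job; the job setup here is hypothetical:

    // Query the configured reduce-task count on a Job object.
    Configuration conf = new Configuration();
    Job job = new Job(conf, "example-job"); // hypothetical setup
    job.setNumReduceTasks(4);
    System.out.println("reduce tasks: " + job.getNumReduceTasks());

The number of map tasks, by contrast, is derived from the input splits at submission time, so it is usually read from the running job's UI or counters rather than set up front.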

Query regarding Hadoop version 0.20.203

2012-03-05 Thread Piyush Kansal
Hi, I am quite new to Hadoop and Java as well and have two questions: *Ques 1:* == I have an HDFS directory which contains the o/p files of the reducer. I want to read all the part-r-* files present in this directory. I have already tried the following options but no luck: - FileSystem.lis…