Re: How to count rows of output files?

2011-03-08 JunYoung Kim
Actually, the structure of the output directories is quite complex. Directory A has 1, 2, 3 as output files; directory B has 1, 2, 3, 4; and directory C has 1, 2, 3, 5. Simply put, the directory structure is:

2011
 |- A
 |   |- 1
 |   |- 2
 |   |- 3
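For a nested layout like this one, the per-directory totals can also be computed after the job with plain Java. This is a minimal sketch, assuming the output tree has been copied to the local filesystem (for example with hadoop fs -get) and that every file is UTF-8 text with one row per line. DirectoryRowCounter and its methods are hypothetical names, not part of Hadoop; names starting with "_" or "." are skipped, following Hadoop's convention for hidden files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper: sums line counts per top-level output directory (A, B, C, ...).
public class DirectoryRowCounter {

    static Map<String, Long> countRowsPerDirectory(Path root) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                Path rel = root.relativize(p);
                // Only count files that sit inside a subdirectory, e.g. A/1.
                if (rel.getNameCount() < 2 || isHidden(rel)) {
                    continue;
                }
                totals.merge(rel.getName(0).toString(), countLines(p), Long::sum);
            }
        }
        return totals;
    }

    // True if any path component starts with '_' or '.' (e.g. _SUCCESS).
    private static boolean isHidden(Path rel) {
        for (Path part : rel) {
            String s = part.toString();
            if (s.startsWith("_") || s.startsWith(".")) {
                return true;
            }
        }
        return false;
    }

    // Assumes UTF-8 text with one row per line.
    private static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : "2011");
        countRowsPerDirectory(root).forEach((dir, n) -> System.out.println(dir + "\t" + n));
    }
}
```

Run against a local copy of 2011, it prints one tab-separated line per directory with that directory's total row count.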

Re: How to count rows of output files?

2011-03-08 James Seigel
In the simplest case, if you need the sum of the lines for A, B, and C, look at the output the job normally generates, which reports "Reduce output records". As the others are telling you, this is available as a counter, which you could access and explicitly print out, or simply read off by eye as the sum

Re: How to count rows of output files?

2011-03-08 Harsh J
I think the previous reply wasn't very accurate. So you need a count per file? One way I can think of doing that via the job itself is to use a Counter named after "the name of the output + the task's ID". But it would not be a good solution if there are several hundred tasks. A distributed count
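The per-file naming idea can be modeled in plain Java, with the job's counters simulated as a map from counter name to value. The "#" separator, the helper names, and the simplified task IDs are assumptions for illustration; inside a real task the increment would go through the framework (for example context.getCounter(group, name).increment(1) in the new API) rather than a map.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of "one counter per output name + task ID".
public class PerFileCounters {

    // Assumed separator between output name and task ID in a counter name.
    static final String SEP = "#";

    // Builds a counter name such as "A#r_000001" (naming scheme is an assumption).
    static String counterName(String outputName, String taskId) {
        return outputName + SEP + taskId;
    }

    // Post-job step: regroups flat counters into output name -> (task ID -> row count),
    // i.e. one entry per output file. Counters without the separator are ignored.
    static Map<String, Map<String, Long>> groupByOutput(Map<String, Long> counters) {
        Map<String, Map<String, Long>> perOutput = new TreeMap<>();
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            int i = e.getKey().lastIndexOf(SEP);
            if (i < 0) {
                continue;
            }
            String output = e.getKey().substring(0, i);
            String task = e.getKey().substring(i + SEP.length());
            perOutput.computeIfAbsent(output, k -> new TreeMap<>()).put(task, e.getValue());
        }
        return perOutput;
    }

    public static void main(String[] args) {
        // Simulated counters; a real job would report these after it finishes.
        Map<String, Long> counters = new TreeMap<>();
        counters.put(counterName("A", "r_000000"), 10L);
        counters.put(counterName("A", "r_000001"), 7L);
        counters.put(counterName("B", "r_000000"), 4L);
        System.out.println(groupByOutput(counters)); // {A={r_000000=10, r_000001=7}, B={r_000000=4}}
    }
}
```

Because there is one counter per (output, task) pair, the number of counters grows with the number of tasks, which is why this approach becomes unwieldy with hundreds of tasks.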

Re: How to count rows of output files?

2011-03-08 Harsh J
Count them as you sink (write) them, using the Counters functionality of Hadoop Map/Reduce (if you're using MultipleOutputs, it has a way to enable counters for each name used). You can then aggregate related counters post-job, if needed.

On Tue, Mar 8, 2011 at 3:11 PM, Jun Young Kim wrote:
> Hi.
>
> my hadoop
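The "count as you sink" idea can be modeled in plain Java: each simulated task increments a counter for an output name every time it writes a record there, and the per-task counters are then summed by name, which is how Hadoop reports job-level counters. CountingSink and mergeTasks are hypothetical names. In a real job, MultipleOutputs.setCountersEnabled(...) turns on the per-name counters, and the totals can then be read from the finished job's counters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a sink that counts records per output name as it writes them.
public class CountingSink {

    // Records written per output name (stands in for the real per-name record writers).
    private final Map<String, List<String>> outputs = new TreeMap<>();
    // This task's counters: one per output name, incremented on every write.
    private final Map<String, Long> counters = new TreeMap<>();

    void write(String name, String record) {
        outputs.computeIfAbsent(name, k -> new ArrayList<>()).add(record);
        counters.merge(name, 1L, Long::sum);
    }

    Map<String, Long> counters() {
        return counters;
    }

    // Sums same-named counters across tasks to get job-level totals per output name.
    static Map<String, Long> mergeTasks(List<Map<String, Long>> perTask) {
        Map<String, Long> job = new TreeMap<>();
        for (Map<String, Long> task : perTask) {
            task.forEach((name, n) -> job.merge(name, n, Long::sum));
        }
        return job;
    }

    public static void main(String[] args) {
        CountingSink task0 = new CountingSink();
        task0.write("A", "a1");
        task0.write("B", "b1");
        CountingSink task1 = new CountingSink();
        task1.write("A", "a2");
        System.out.println(mergeTasks(List.of(task0.counters(), task1.counters()))); // {A=2, B=1}
    }
}
```

Counting at write time avoids a second pass over the output; the trade-off is that the counts are only as accurate as the code that increments them.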

How to count rows of output files?

2011-03-08 Jun Young Kim
Hi. My Hadoop application generates several output files from a single job (for example, A, B, and C are generated as a result). After the job finishes, I want to count the rows in each file. Is there any way to count the rows of each file? Thanks.

--
Junyoung Kim (juneng...@gmail.com)
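A direct post-job answer to this question, sketched in plain Java under the assumption that the output directory has been copied locally (for example with hadoop fs -get) and that each file is UTF-8 text with one row per line. OutputRowCounter is a hypothetical name, not a Hadoop class; names starting with "_" or "." (such as _SUCCESS) are skipped, following Hadoop's hidden-file convention.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper: counts the rows of each output file in one directory.
public class OutputRowCounter {

    static Map<String, Long> countRowsPerFile(Path dir) throws IOException {
        Map<String, Long> counts = new TreeMap<>();
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path file : (Iterable<Path>) entries::iterator) {
                String name = file.getFileName().toString();
                // Skip hidden/marker files and subdirectories.
                if (name.startsWith("_") || name.startsWith(".") || !Files.isRegularFile(file)) {
                    continue;
                }
                try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
                    counts.put(name, lines.count());
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        countRowsPerFile(dir).forEach((name, n) -> System.out.println(name + "\t" + n));
    }
}
```

This re-reads every output file after the job, so for large outputs the counter-based approaches suggested in the replies avoid that second pass.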