You can use MultipleOutputs and build a custom output file name from each
record's timestamp (for example, its hour), so that each hour's results go to
their own file.

http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
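A minimal sketch of the file-name part, assuming each parsed record already
carries an epoch-millisecond timestamp and that hours are bucketed in UTC (your
logs may use local time instead). The class name HourlyOutputPath and the
"hourly-" prefix are hypothetical, and java.time needs Java 8 or later:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical helper: maps a record timestamp (epoch millis) to a per-hour
 * base output name for MultipleOutputs. Assumes UTC hour buckets.
 */
final class HourlyOutputPath {

    // DateTimeFormatter is immutable and thread-safe, so one shared instance is fine.
    // 'hourly-' is a quoted literal prefix; the zone decides which hour an instant falls into.
    private static final DateTimeFormatter HOUR_BUCKET =
            DateTimeFormatter.ofPattern("'hourly-'yyyyMMddHH").withZone(ZoneOffset.UTC);

    private HourlyOutputPath() {}

    /** Every instant from 15:00:00.000 to 15:59:59.999 UTC on 2014-02-28 maps to "hourly-2014022815". */
    static String baseOutputPath(long epochMillis) {
        return HOUR_BUCKET.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

In the reducer, create a MultipleOutputs instance in setup(), pass the returned
string as the baseOutputPath argument of write(key, value, baseOutputPath), and
call close() in cleanup(). MultipleOutputs appends a task suffix such as
"-r-00000" to the base name, so if records for one hour reach several reducers
you still get several files for that hour. Emitting the hour as the map output
key (or partitioning on it) sends each hour to exactly one reducer, which gives
one file per hour no matter how many reducers the job runs with.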


On Fri, Feb 28, 2014 at 11:44 PM, Fengyun RAO <raofeng...@gmail.com> wrote:

> It's a common web log analysis situation. The original web log is saved
> every hour on multiple servers.
> Now we would like the parsed log results to be saved as one file per hour.
> How can we do that?
>
> In our MR job, the input is a directory with many files covering many hours,
> let's say 4X files in X hours.
> If there are, e.g., 10 reducers, then all of the results are
> partitioned into 10 files, each of which contains results from every hour.
> We would like the results to be saved in X files, each of which contains
> only one hour's results.
> Since the input files can change, I can't even set the number of reducers to
> exactly X in the program.
>
