This is a common web log analysis situation. The original web logs are saved
every hour on multiple servers.
Now we would like the parsed log results to be saved as one file per hour.
How can we do that?

In our MR job, the input is a directory containing many files that span many
hours, let's say 4X files covering X hours.
If there are, e.g., 10 reducers, then all of the results are partitioned
into 10 files, each of which contains results from every hour.
We would like the results to be saved in X files, each of which contains
the results for exactly one hour.
Since the set of input files can change, I can't even hard-code the number
of reducers to be exactly X in the program.
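
To make the problem concrete, here is a small plain-Python sketch (not Hadoop
code) of the two behaviours described above. Everything in it is an
assumption made for illustration: the record layout (hour, url), the hour
strings, the file names, and the use of crc32 as a stand-in for the default
hash partitioning. It only shows that choosing the output file by hour gives
one file per hour no matter how many hours the input covers.

```python
from collections import defaultdict
import zlib

def default_partition(records, num_reducers):
    """Mimic hash partitioning on the record key (here the URL, an assumption)."""
    parts = defaultdict(list)
    for hour, url in records:
        idx = zlib.crc32(url.encode("utf-8")) % num_reducers
        parts["part-%05d" % idx].append((hour, url))
    return dict(parts)

def partition_by_hour(records):
    """Choose the output file from the hour field instead of a reducer index."""
    files = defaultdict(list)
    for hour, url in records:
        files["result-" + hour].append((hour, url))
    return dict(files)

# Hypothetical parsed log records: 3 hours, 40 URLs seen in each hour.
hours = ("2013010100", "2013010101", "2013010102")
records = [(hour, "/page/%d" % i) for hour in hours for i in range(40)]

by_hash = default_partition(records, 10)  # at most 10 files, hours mixed
by_hour = partition_by_hour(records)      # one file per distinct hour

for name in sorted(by_hour):
    print(name, len(by_hour[name]))
```

With the hash-based version the number of files follows the reducer count
and each file mixes hours; with the hour-based version the number of files
follows the data. In Hadoop, writing to a file chosen per record is what the
MultipleOutputs class is meant for, possibly combined with a custom
Partitioner keyed on the hour, but whether that fits this particular job is
an assumption on my part.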