Dan,
Can you share your error? Plain .gz files (not .tar.gz) are natively
supported by Hadoop via its GzipCodec, and if you are facing an error, I
believe it is caused by something other than compression.
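To illustrate: GzipCodec reads the same gzip stream format as the JDK's own java.util.zip classes, so a plain-Java round trip (no Hadoop required) is a quick sanity check that your .gz payloads themselves are well-formed. This is only a sketch of the format, not of Hadoop's codec machinery:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress then decompress a string through the gzip stream format,
    // the same framing Hadoop's GzipCodec handles for .gz inputs.
    static String roundTrip(String text) throws Exception {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                restored.write(buf, 0, n);
            }
        }
        return new String(restored.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("line1\nline2\n").equals("line1\nline2\n"));
    }
}
```

One caveat worth remembering: plain .gz files are not splittable, so each gzipped file is consumed by a single mapper.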
On Fri, Jul 20, 2012 at 6:14 AM, Dan Yi wrote:
I have an MR job that reads files on Amazon S3 and processes the data on local
HDFS. The files are gzipped text files (.gz). I tried to set up the job as below,
but it won't work; does anyone know what might be wrong? Do I need to add an
extra step to unzip the files first? Thanks.
String S3_LOCATION = "s3n://ac
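For comparison, here is a minimal driver sketch under the old `mapred` API; the bucket name, output path, and identity mapper/reducer choices below are placeholders, not the original poster's code:

```java
// Sketch: read gzipped text from S3, write to HDFS (Hadoop 1.x "mapred" API).
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class S3GzipJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(S3GzipJob.class);
        conf.setJobName("s3-gzip-read");

        // TextInputFormat selects GzipCodec automatically from the .gz
        // suffix, so no manual unzip step is needed.
        conf.setInputFormat(TextInputFormat.class);
        FileInputFormat.addInputPath(conf,
                new Path("s3n://my-bucket/logs/"));   // hypothetical bucket
        FileOutputFormat.setOutputPath(conf,
                new Path("hdfs:///user/dan/output")); // hypothetical path

        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        JobClient.runJob(conf);
    }
}
```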
I have a slightly modified TextOutputFormat that essentially writes each key
into its own file. It operates on the premise that my reducer is an identity
function and emits each record one by one, in the order they come from the
collection. Because the records are emitted in order from the
You need to ask your job not to discard failed task files. Otherwise they
get cleared away (except for logs), and that's why you do not see them
anymore afterwards.
If you're using 1.x/0.20.x, set "keep.failed.task.files" to true in
your JobConf/Job.getConfiguration objects before submitting your job.
Aft
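In code, assuming the 1.x JobConf API mentioned above, that setting is a one-line configuration change (this fragment only shows the setting, not a full job):

```java
// Sketch: keep a failed task attempt's working files for debugging (1.x API).
import org.apache.hadoop.mapred.JobConf;

public class KeepFailedFiles {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Prevent the framework from deleting a failed attempt's working
        // directory, so it can be inspected after the job finishes.
        conf.setBoolean("keep.failed.task.files", true);
        // ... set input/output/mapper/reducer, then submit as usual.
    }
}
```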
Matt,
The reducer's reduce(Key, Values) calls do proceed in sorted Key order.
You can safely assume that when the next reduce call begins, you will
no longer get the previous Key again, and can hence close your file.
This is guaranteed by the sorter framework, and several tests in MR
land cover this.
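That guarantee is what makes a one-file-per-key output safe: the previous key's file can be closed the moment a new key arrives. A plain-Java sketch of the pattern, using in-memory writers instead of real HDFS streams (the `KeyFileWriter` class and its method names are hypothetical):

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simulates reduce calls arriving in sorted key order: once a new key
// appears, the previous key's "file" can be closed for good, because the
// sorter guarantees that key will never recur.
public class KeyFileWriter {
    private final Map<String, StringWriter> files = new LinkedHashMap<>();
    private String currentKey;
    private StringWriter currentFile;

    // One call per (key, values) group, keys strictly ascending.
    void reduce(String key, List<String> values) {
        if (!key.equals(currentKey)) {
            closeCurrent();                 // previous key will not recur
            currentKey = key;
            currentFile = new StringWriter();
            files.put(key, currentFile);
        }
        for (String v : values) {
            currentFile.write(v + "\n");
        }
    }

    void closeCurrent() {
        // With a real FileSystem stream this would be currentFile.close().
        currentFile = null;
    }

    Map<String, String> contents() {
        Map<String, String> out = new LinkedHashMap<>();
        files.forEach((k, w) -> out.put(k, w.toString()));
        return out;
    }

    public static void main(String[] args) {
        KeyFileWriter w = new KeyFileWriter();
        w.reduce("a", Arrays.asList("1", "2"));
        w.reduce("b", Arrays.asList("3"));   // key "a" is now safely closed
        w.closeCurrent();
        System.out.println(w.contents().keySet());
    }
}
```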
On Th
> From what I gather about how MapReduce operates, there isn't really any
> functional difference between whether a single OutputFormat object is
> initialized on a central node or if each reducer task initializes its own
> OutputFormat object. What I would like to know, however, is the relationshi
Thanks Markus,
But as I said, I have only read access on the nodes and I can't make that
change. So the question remains open.
Marek M.
From: Markus Jelsma [markus.jel...@openindex.io]
Sent: Wednesday, July 18, 2012 9:06 PM
To: mapreduce-user@hadoop.apache.org
S