You can write your own InputFormat (IF) which extends FileInputFormat.
In your IF you get the InputSplit, which carries the filename, during the
call to getRecordReader. That is the hook you are looking for.
More details here:
http://hadoop.apache.org/common/docs/r1.0.2/mapred_tutorial.html#Job+Input
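A minimal sketch of that hook, using simplified stand-in types so it is self-contained (the real classes are org.apache.hadoop.mapred.FileInputFormat, FileSplit, and RecordReader; the names and one-method shapes here are deliberate simplifications, not the actual Hadoop signatures):

```java
// Stand-ins for Hadoop's FileSplit/RecordReader, to show where the
// filename becomes available. In real code you would extend
// org.apache.hadoop.mapred.FileInputFormat and cast the InputSplit
// passed to getRecordReader to FileSplit, then call getPath().

class FileSplit {                        // stand-in for o.a.h.mapred.FileSplit
    private final String path;
    FileSplit(String path) { this.path = path; }
    String getPath() { return path; }    // the real class returns a Path
}

interface RecordReader { String nextValue(); }

class FilenameAwareInputFormat {         // stand-in for a FileInputFormat subclass
    // getRecordReader is the hook: the split carries the file name,
    // so a per-file codec (or key) can be chosen right here.
    RecordReader getRecordReader(FileSplit split) {
        final String fileName = split.getPath();
        // A real reader would open the file and wrap it in a decrypting stream.
        return () -> "read from " + fileName;
    }
}

public class Sketch {
    public static void main(String[] args) {
        RecordReader r = new FilenameAwareInputFormat()
                .getRecordReader(new FileSplit("/data/part-0000.enc"));
        System.out.println(r.nextValue());
    }
}
```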
It is possible, but a little tricky.
As I mentioned before, write a custom InputFormat and the associated
RecordReader.
On Wed, Apr 11, 2012 at 5:23 PM, Grzegorz Gunia
wrote:
I think we misunderstood here.
I'll base my question on an example:
Let's say I want each of the files stored on my HDFS to be encrypted
prior to being physically stored on the cluster.
For that I'll write a custom CompressionCodec that performs the
encryption, and use it during any edits/creations of the files.
If you are:
1. using TextInputFormat,
2. all input files end with a certain suffix like ".gz",
3. the custom CompressionCodec is already registered in the configuration
and its getDefaultExtension() returns that same suffix, as described in 2,
then there is nothing else you need to do.
Hadoop will deal with it automatically.
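The suffix lookup Hadoop performs can be sketched like this, with simplified stand-in types (the real machinery is org.apache.hadoop.io.compress.CompressionCodecFactory; the factory class and ".enc" extension below are illustrative assumptions, not Hadoop's actual classes):

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of the suffix matching CompressionCodecFactory
// performs: each registered codec reports a default extension, and a file
// is matched to a codec when its name ends with that extension.

interface CompressionCodec {
    String getDefaultExtension();        // e.g. ".gz", or a custom ".enc"
}

class CodecFactory {
    private final List<CompressionCodec> codecs = new ArrayList<>();
    void register(CompressionCodec c) { codecs.add(c); }

    // Returns the codec whose extension matches the file name, or null.
    CompressionCodec getCodec(String fileName) {
        for (CompressionCodec c : codecs)
            if (fileName.endsWith(c.getDefaultExtension())) return c;
        return null;
    }
}

public class SuffixDemo {
    public static void main(String[] args) {
        CodecFactory f = new CodecFactory();
        f.register(() -> ".enc");        // hypothetical custom encryption codec
        System.out.println(f.getCodec("part-0000.enc") != null);  // matched
        System.out.println(f.getCodec("part-0000.txt") != null);  // no match
    }
}
```

This is why condition 3 above matters: if getDefaultExtension() does not match the file suffix, the lookup returns nothing and the file is read as plain input.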
...@student.agh.edu.pl]
Sent: Wednesday, April 11, 2012 1:46 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: CompressionCodec in MapReduce
Thanks for your reply! That clears some things up.
There is but one problem... My CompressionCodec has to be instantiated
on a per-file basis, meaning it needs to know the name of the file it is
to compress/decompress. I'm guessing that would not be possible with the
current implementation?
Or is it?
Append your custom codec's full class name to "io.compression.codecs",
either in mapred-site.xml or in the Configuration object passed to the
Job constructor. The MapReduce framework will try to guess the compression
algorithm from the input files' suffix: if any registered codec's
CompressionCodec.getDefaultExtension() matches it, that codec is used.
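A sketch of the append step, using a plain Map so it runs without Hadoop on the classpath (on a real Configuration object the equivalent calls are conf.get/conf.set; "com.example.MyEncryptionCodec" is a hypothetical class name, and the default codec list shown is an assumption about what is already registered):

```java
import java.util.HashMap;
import java.util.Map;

// Appending a codec class name to the comma-separated
// "io.compression.codecs" list before handing the configuration to a Job.
public class CodecConfig {
    static String appendCodec(Map<String, String> conf, String codecClass) {
        String key = "io.compression.codecs";
        // Preserve whatever codecs are already registered; appending rather
        // than overwriting keeps built-ins like GzipCodec working.
        String existing = conf.getOrDefault(key,
                "org.apache.hadoop.io.compress.DefaultCodec,"
              + "org.apache.hadoop.io.compress.GzipCodec");
        String updated = existing + "," + codecClass;
        conf.put(key, updated);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(appendCodec(conf, "com.example.MyEncryptionCodec"));
    }
}
```

The same value can instead be set once for the whole cluster as a property named io.compression.codecs in mapred-site.xml.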
Hello,
I am trying to apply a custom CompressionCodec to work with MapReduce
jobs, but I haven't found a way to inject it during the reading of input
data, or during the write of the job results.
Am I missing something, or is there no support for compressed files in
the filesystem?
I am well