Re: CompressionCodec in MapReduce

2012-04-11 Thread Arun C Murthy
You can write your own InputFormat (IF) which extends FileInputFormat. In your IF you get the InputSplit which has the filename during the call to getRecordReader. That is the hook you are looking for. More details here: http://hadoop.apache.org/common/docs/r1.0.2/mapred_tutorial.html#Job+Input
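
To illustrate the hook described above, here is a minimal, self-contained sketch in plain Java. It does not use the Hadoop classes: SplitModel, PerFileCodec and PerFileReaderHook are hypothetical names standing in for a FileSplit (whose getPath() exposes the file), a codec that needs the file name, and the body of a custom getRecordReader. It only shows the idea that the file name is available once per split, so a per-file codec can be built there.

```java
// Hypothetical stand-in for a codec that must know which file it handles.
final class PerFileCodec {
    private final String fileName;

    PerFileCodec(String fileName) {
        this.fileName = fileName;
    }

    String fileName() {
        return fileName;
    }
}

// Hypothetical stand-in for the split passed to getRecordReader.
// In Hadoop, a FileSplit exposes the file it covers via getPath().
final class SplitModel {
    private final String path;

    SplitModel(String path) {
        this.path = path;
    }

    String getPath() {
        return path;
    }
}

// Hypothetical model of the work a custom getRecordReader could do
// before it starts reading: derive the codec from the split's file name.
final class PerFileReaderHook {
    static PerFileCodec codecFor(SplitModel split) {
        String path = split.getPath();
        // Keep only the last path component; if there is no '/',
        // lastIndexOf returns -1 and the whole string is used.
        String name = path.substring(path.lastIndexOf('/') + 1);
        return new PerFileCodec(name);
    }
}
```

In a real InputFormat the same step would sit inside getRecordReader, with the resulting codec handed to the RecordReader that wraps the input stream.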

Re: CompressionCodec in MapReduce

2012-04-11 Thread Zizon Qiu
It is possible but a little tricky. As I mentioned before, write a custom InputFormat and the associated RecordReader. On Wed, Apr 11, 2012 at 5:23 PM, Grzegorz Gunia wrote: > I think we misunderstood each other here. > > I'll base my question upon an example: > Let's say I want each of the files stored on my

Re: CompressionCodec in MapReduce

2012-04-11 Thread Grzegorz Gunia
I think we misunderstood each other here. I'll base my question upon an example: Let's say I want each of the files stored on my HDFS to be encrypted prior to being physically stored on the cluster. For that I'll write a custom CompressionCodec that performs the encryption, and use it during any edits/cre

Re: CompressionCodec in MapReduce

2012-04-11 Thread Zizon Qiu
If: 1. you are using TextInputFormat, 2. all input files end with a certain suffix such as ".gz", and 3. the custom CompressionCodec is already registered in the configuration and its getDefaultExtension returns the same suffix as described in 2, then there is nothing else you need to do. Hadoop will deal with it automati

RE: CompressionCodec in MapReduce

2012-04-11 Thread Devaraj k
...@student.agh.edu.pl] Sent: Wednesday, April 11, 2012 1:46 PM To: mapreduce-user@hadoop.apache.org Subject: Re: CompressionCodec in MapReduce Thanks for your reply! That clears some things up. There is but one problem... My CompressionCodec has to be instantiated on a per-file basis, meaning it needs

Re: CompressionCodec in MapReduce

2012-04-11 Thread Grzegorz Gunia
Thanks for your reply! That clears some things up. There is but one problem... My CompressionCodec has to be instantiated on a per-file basis, meaning it needs to know the name of the file it is to compress/decompress. I'm guessing that would not be possible with the current implementation? Or i

Re: CompressionCodec in MapReduce

2012-04-11 Thread Zizon Qiu
Append your custom codec's full class name to "io.compression.codecs", either in mapred-site.xml or in the Configuration object passed to the Job constructor. The MapReduce framework will try to guess the compression algorithm from the input file's suffix. If any CompressionCodec.getDefaultExtension() regis
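
The suffix-based guess described above can be pictured with a small model in plain Java. This is a simplified sketch, not Hadoop's implementation: SuffixCodecLookup is a hypothetical class and com.example.EncryptingCodec is a hypothetical codec name. Each registered codec contributes the extension its getDefaultExtension() would return, and a file is matched to the codec whose extension its name ends with.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of suffix-based codec selection.
final class SuffixCodecLookup {
    // Maps a default extension (e.g. ".gz") to a codec class name.
    private final Map<String, String> codecClassBySuffix = new LinkedHashMap<>();

    // Models adding a codec to the configured list: the key is the
    // value the codec would return from getDefaultExtension().
    void register(String defaultExtension, String codecClassName) {
        codecClassBySuffix.put(defaultExtension, codecClassName);
    }

    // Returns the codec class name for the longest registered suffix
    // that the file name ends with, or null if none matches.
    String codecFor(String fileName) {
        String bestSuffix = null;
        for (String suffix : codecClassBySuffix.keySet()) {
            boolean longer = bestSuffix == null || suffix.length() > bestSuffix.length();
            if (fileName.endsWith(suffix) && longer) {
                bestSuffix = suffix;
            }
        }
        return bestSuffix == null ? null : codecClassBySuffix.get(bestSuffix);
    }
}
```

For example, after registering ".gz" for org.apache.hadoop.io.compress.GzipCodec and ".enc" for the hypothetical com.example.EncryptingCodec, a file named part-0001.enc resolves to the custom codec, while plain.txt resolves to no codec and would be read uncompressed.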

CompressionCodec in MapReduce

2012-04-11 Thread Grzegorz Gunia
Hello, I am trying to apply a custom CompressionCodec to work with MapReduce jobs, but I haven't found a way to inject it during the reading of input data, or during the write of the job results. Am I missing something, or is there no support for compressed files in the filesystem? I am well