You cannot read image files with wholeTextFiles because it uses CombineFileInputFormat, which cannot read gzipped files because they are not splittable <http://www.bigdataspeak.com/2013_01_01_archive.html> (source proving it):

    override def createRecordReader(
        split: InputSplit,
        context: TaskAttemptContext): RecordReader[String, String] = {
      new CombineFileRecordReader[String, String](
        split.asInstanceOf[CombineFileSplit],
        context,
        classOf[WholeTextFileRecordReader])
    }

You may be able to use newAPIHadoopFile with WholeFileInputFormat <https://github.com/tomwhite/hadoop-book/blob/master/ch07/src/main/java/WholeFileInputFormat.java> (not built into Hadoop, but widely available online) to get this to work correctly. I don't think WholeFileInputFormat will work as-is, since it just returns the bytes of the file, so you may have to write your own class, possibly extending WholeFileInputFormat.

Thanks
Best Regards

On Thu, Jun 26, 2014 at 3:31 AM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:

> Is there an equivalent of wholeTextFiles for binary files for example a
> set of images ?
>
> Cheers,
>
> Jaonary
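
For reference, here is a rough sketch of the newAPIHadoopFile idea from the reply above. It is untested and rests on several assumptions: the input format is a Scala rewrite in the spirit of the linked WholeFileInputFormat, not a copy of it; the key is the file path as a String (my choice, to mirror wholeTextFiles); the class names and the "hdfs:///path/to/images" path are hypothetical; and each file is assumed to fit in memory.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{BytesWritable, IOUtils}
import org.apache.hadoop.mapreduce.{InputSplit, JobContext, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit}
import org.apache.spark.SparkContext

// Hypothetical input format: one record per file, never split.
class WholeFileInputFormat extends FileInputFormat[String, BytesWritable] {
  override protected def isSplitable(context: JobContext, file: Path): Boolean = false

  override def createRecordReader(
      split: InputSplit,
      context: TaskAttemptContext): RecordReader[String, BytesWritable] =
    new WholeFileRecordReader
}

// Emits a single (path, file bytes) pair for the split it is given.
class WholeFileRecordReader extends RecordReader[String, BytesWritable] {
  private var fileSplit: FileSplit = _
  private var conf: Configuration = _
  private val value = new BytesWritable()
  private var processed = false

  override def initialize(split: InputSplit, context: TaskAttemptContext): Unit = {
    fileSplit = split.asInstanceOf[FileSplit]
    conf = context.getConfiguration
  }

  override def nextKeyValue(): Boolean = {
    if (processed) {
      false
    } else {
      // Assumes the file is small enough to buffer (length must fit in an Int).
      val contents = new Array[Byte](fileSplit.getLength.toInt)
      val path = fileSplit.getPath
      val in = path.getFileSystem(conf).open(path)
      try {
        IOUtils.readFully(in, contents, 0, contents.length)
        value.set(contents, 0, contents.length)
      } finally {
        IOUtils.closeStream(in)
      }
      processed = true
      true
    }
  }

  override def getCurrentKey(): String = fileSplit.getPath.toString
  override def getCurrentValue(): BytesWritable = value
  override def getProgress(): Float = if (processed) 1.0f else 0.0f
  override def close(): Unit = ()
}

object BinaryFilesSketch {
  // Returns (path, raw bytes) pairs; sc is an existing SparkContext.
  def wholeBinaryFiles(sc: SparkContext, dir: String) =
    sc.newAPIHadoopFile(dir, classOf[WholeFileInputFormat],
        classOf[String], classOf[BytesWritable])
      // BytesWritable is not java.io.Serializable and its buffer may be
      // padded, so copy out exactly getLength bytes right away.
      .map { case (path, bytes) =>
        (path, java.util.Arrays.copyOf(bytes.getBytes, bytes.getLength))
      }
}
```

Calling BinaryFilesSketch.wholeBinaryFiles(sc, "hdfs:///path/to/images") should then give an RDD of (path, Array[Byte]) pairs that an image decoder can consume. Because every file becomes its own split, a directory with a very large number of small files will produce one task per file.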