Thanks. What I noticed is that decompression works when the file is in HDFS, but not when it is a local file in a development environment.
Does anyone else have the same problem?

On Wed, 9 Sep 2015 at 4:40 pm Akhil Das <ak...@sigmoidanalytics.com> wrote:

> textFile used to work with .gz files, i haven't tested it on bz2 files. If
> it isn't decompressing by default then what you have to do is to use the
> sc.wholeTextFiles and then decompress each record (that being file) with
> the corresponding codec.
>
> Thanks
> Best Regards
>
> On Tue, Sep 8, 2015 at 6:49 PM, Chris Teoh <chris.t...@gmail.com> wrote:
>
>> Hi Folks,
>>
>> I tried using Spark v1.2 on bz2 files in Java but the behaviour is
>> different to the same textFile API call in Python and Scala.
>>
>> That being said, how do I process to read .tar.bz2 files in Spark's Java
>> API?
>>
>> Thanks in advance
>> Chris
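
For reference, the "decompress each record with the corresponding codec" step that Akhil describes could look roughly like the sketch below. This is plain Java with no Spark dependency, and it rests on several assumptions: the class name RecordDecompressor and its methods are hypothetical, the record is assumed to be available as raw bytes rather than already-decoded text, and gzip from the JDK stands in for the codec. The JDK has no bz2 or tar support, so a .tar.bz2 file would need a third-party bzip2 stream (for example the one in Apache Commons Compress) plus a tar reader in place of GZIPInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: decompresses one whole-file record held in memory.
public class RecordDecompressor {

    // Compress bytes with gzip; only used here to build sample input.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            out.write(raw);
        }
        return sink.toByteArray();
    }

    // Decompress a gzip-compressed record and decode it as UTF-8 text.
    // For bz2, swap GZIPInputStream for a bzip2 stream from a third-party library.
    static String gunzipToString(byte[] compressed) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Build a small compressed record, then round-trip it.
        byte[] record = gzip("line one\nline two\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(gunzipToString(record));
    }
}
```

In a Spark job, a function like gunzipToString would be applied to each (path, content) pair in a map step. Whether the content arrives as usable bytes depends on which API loaded the files, so treat that part as untested.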