textFile used to work with .gz files; I haven't tested it on bz2 files. If it isn't decompressing by default, you can use sc.wholeTextFiles instead and then decompress each record (each record being one whole file) with the corresponding codec.
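A rough sketch of the "decompress each record yourself" idea, written in Python because its standard library already ships bzip2 and tar support; the same shape should carry over to the Java API with a bzip2/tar library of your choice. The names `extract_tar_bz2` and `read_tar_bz2` are made up for illustration. It also assumes `sc.binaryFiles` is available in your Spark version and uses it in place of `sc.wholeTextFiles`, since a .tar.bz2 payload is binary and would not survive being decoded as text. Each archive is read fully into memory, so this only suits archives that fit on one executor.

```python
import io
import tarfile


def extract_tar_bz2(payload, encoding="utf-8"):
    """Return (member_name, text) pairs for each regular file in a .tar.bz2 payload."""
    results = []
    # mode "r:bz2" tells tarfile to apply bzip2 decompression while reading the archive
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:bz2") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other non-file entries
            data = archive.extractfile(member).read()
            results.append((member.name, data.decode(encoding)))
    return results


def read_tar_bz2(sc, path):
    """Hypothetical helper: one record per file found inside each archive under `path`."""
    # binaryFiles yields (file_path, raw_bytes) pairs, one pair per input file
    return sc.binaryFiles(path).flatMap(lambda kv: extract_tar_bz2(kv[1]))
```

With an existing SparkContext `sc`, something like `read_tar_bz2(sc, "hdfs:///data/*.tar.bz2").values().flatMap(lambda text: text.splitlines())` would then give an RDD of lines, roughly what textFile returns for plain files (the path here is only a placeholder).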
Thanks
Best Regards

On Tue, Sep 8, 2015 at 6:49 PM, Chris Teoh <chris.t...@gmail.com> wrote:

> Hi Folks,
>
> I tried using Spark v1.2 on bz2 files in Java but the behaviour is
> different to the same textFile API call in Python and Scala.
>
> That being said, how do I process to read .tar.bz2 files in Spark's Java
> API?
>
> Thanks in advance
> Chris