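Hi Sid,

Snappy itself is not splittable. However, a container format that holds the actual data, such as Parquet (which is divided into row groups), can use Snappy internally. This works because the blocks inside a Parquet file (its pages, which make up the column chunks of each row group) are compressed independently with Snappy. A reader can therefore split the file at row-group boundaries and decompress each part on its own, so the Snappy compression in part*.snappy.parquet files does not by itself prevent Spark from reading them in parallel.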
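To make the difference concrete, here is a minimal sketch that contrasts one compressed stream (like a plain .gz or .snappy file) with independently compressed blocks (loosely analogous to Parquet row groups/pages). Assumptions: zlib stands in for Snappy because Snappy is not in the Python standard library, and the function names and block size are hypothetical, chosen only for illustration. Real Parquet compresses per page within each column chunk and is far more involved; this only shows why independent blocks can be split and a single stream cannot.

```python
import zlib
from typing import List

# NOTE: zlib is a stand-in for Snappy (not in the Python stdlib). The property
# illustrated is the same for either codec: one stream must be decoded from
# its beginning, while independently compressed blocks decode on their own.


def compress_whole(data: bytes) -> bytes:
    """Compress the whole payload as ONE stream (like a plain file.gz / file.snappy)."""
    return zlib.compress(data)


def compress_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Compress fixed-size chunks independently.

    Hypothetical helper, loosely analogous to Parquet compressing each
    page (inside each row group) separately.
    """
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def read_block(blocks: List[bytes], index: int) -> bytes:
    """Decode a single block without touching any other block.

    This is what lets separate readers (e.g. separate tasks) each take
    a different part of the file.
    """
    return zlib.decompress(blocks[index])


def try_read_from_middle(stream: bytes) -> bool:
    """Try to decode a single stream starting halfway through.

    Returns False when decoding fails, which is the expected outcome:
    the decoder needs the state built up from the start of the stream.
    """
    try:
        zlib.decompress(stream[len(stream) // 2:])
        return True
    except zlib.error:
        return False


if __name__ == "__main__":
    data = b"".join(f"row {i},value {i * i}\n".encode() for i in range(10_000))

    blocks = compress_blocks(data, block_size=4096)
    print("independent blocks:", len(blocks))
    print("block 3 decodes alone:", read_block(blocks, 3) == data[3 * 4096:4 * 4096])

    stream = compress_whole(data)
    print("single stream decodes from the middle:", try_read_from_middle(stream))
```

The trade-off is that per-block compression usually compresses slightly worse than one big stream, in exchange for letting each block be read and decompressed independently.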
Thanks
Amit

On Wed, Sep 14, 2022 at 8:14 PM Sid <flinkbyhe...@gmail.com> wrote:
> Hello experts,
>
> I know that Gzip and Snappy files are not splittable, i.e. the data won't
> be distributed into multiple blocks; rather, it would be loaded into a
> single partition/block.
>
> So, my question is: when I write Parquet data via Spark, it gets stored
> at the destination with a name like part*.snappy.parquet.
>
> When I read this data, will it affect my performance?
>
> Please let me know if there is any gap in my understanding.
>
> Thanks,
> Sid