Hi Prem - Maybe I can help clarify to the best of my knowledge. Parquet V2
as a standard isn't finalized just yet, meaning there is no formal,
*finalized* "contract" that specifies what it means to write data in the V2
format. The discussions about what the final V2 standard will look like are
still in progress and evolving.

That being said, because V2 code does exist (though unfinalized), there are
clients and tools writing data in the unfinalized V2 format, as seems to be
the case with Dremio.

Now, as the comment you quoted said, you can have Spark write V2 files, but
it's worth keeping in mind that V2 is a moving target and can (and likely
will) change. You can override parquet.writer.version to specify your
desired version, but it can be dangerous to produce data in a moving-target
format. For example, say you write a large amount of data in Parquet V2,
and then the community decides to make a breaking change (which is
completely fine / allowed, since V2 isn't finalized). You are now left
having to deal with a potentially large and complicated file format
migration. That's why writing files in Parquet V2 isn't recommended just
yet.
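For completeness, here is a minimal PySpark sketch of how one might override the writer version. This is an illustration only, not a recommendation: parquet.writer.version is a Parquet (Hadoop-level) property rather than a Spark SQL conf, and the output path and value "v2" are assumptions for the example.

```python
# Hedged sketch: forcing Spark to emit (unfinalized) Parquet V2 files.
# Assumes a working PySpark installation; "/tmp/out_v2" is a placeholder path.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-v2-demo").getOrCreate()
df = spark.range(1000)

# One approach: set the Parquet property on the session's Hadoop configuration,
# so all subsequent Parquet writes in this session use V2 pages.
spark.sparkContext._jsc.hadoopConfiguration().set("parquet.writer.version", "v2")

# Another approach: pass it as a per-write option; Spark forwards data source
# options to the Parquet writer's Hadoop configuration.
df.write.mode("overwrite") \
    .option("parquet.writer.version", "v2") \
    .parquet("/tmp/out_v2")
```

Either way, the caveat above still applies: data written this way is in a format that may change under you before V2 is finalized.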



<vinoo.gan...@gmail.com>


On Wed, Apr 17, 2024 at 3:47 PM Prem Sahoo <prem.re...@gmail.com> wrote:

> Hello Team,
> I am working on different products such as Spark and Dremio.
>
> Dremio is able to write and read Parquet V2, and due to this upgrade it
> is working faster than with Parquet V1 files.
>
> In the case of Spark, it still defaults to Parquet V1, and when I checked
> with the Spark community they told me the Parquet community isn't
> recommending Parquet V2.
>
> "Prem, as I said earlier, v2 is not a finalized spec so you should not use
> it. That's why it is not the default. You can get Spark to write v2 files,
> but it isn't recommended by the Parquet community."
>
> Please advise.
>