Hi hackers,
In several conversations I had recently with colleagues it was pointed
out that it would be great if PostgreSQL supported COPY to/from
Parquet and other formats. I've found a corresponding discussion [1]
on pgsql-general@. The consensus reached back in 2018 seems to be that
this should
On Mon, Jun 20, 2022 at 8:35 PM Aleksander Alekseev
wrote:
>
> I would like to invest some time into providing a corresponding patch
> for the core and implementing "pg_copy_parquet" extension as a
> practical example, and yet another, a bit simpler, extension as an API
> usage example for the cor
Hi Ashutosh,
> An extension just for COPY to/from parquet looks limited in
> functionality. Shouldn't this be viewed as an FDW or Table AM support
> for parquet or other formats? Of course the latter is much larger in
> scope compared to the first one. But there may already be efforts
> underway
>
On Tue, Jun 21, 2022 at 3:26 PM Aleksander Alekseev
wrote:
>
> In other words, personally I'm unaware of use cases when somebody
> needs a complete read/write FDW or TableAM implementation for formats
> like Parquet, ORC, etc. Also to my knowledge they are not particularly
> optimized for this.
>
Hi Ashutosh,
> IIUC, you want extensibility in FORMAT argument to COPY command
> https://www.postgresql.org/docs/current/sql-copy.html. Where the
> format is pluggable. That seems useful.
> Another option is to dump the data in csv format but use an external
> utility to convert csv to parquet or wha
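To make the "pluggable FORMAT" idea a bit more concrete, here is a minimal sketch in Python rather than in PostgreSQL's C. Everything in it is hypothetical and only for illustration: the names register_format and copy_to, the handler signature, and the "jsonl" format are assumptions of this sketch and not part of PostgreSQL or of any proposed patch. It only shows the general shape: COPY looks up a handler by format name instead of hard-coding the list of formats.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical types: a row is a list of text values, and a handler
# serializes an iterable of rows into bytes for one output format.
Row = List[str]
ToHandler = Callable[[Iterable[Row]], bytes]

# Hypothetical registry, keyed by the lower-cased FORMAT name.
_FORMATS: Dict[str, ToHandler] = {}


def register_format(name: str, handler: ToHandler) -> None:
    """Register a handler; an extension would call this at load time."""
    key = name.lower()
    if key in _FORMATS:
        raise ValueError(f"format {name!r} is already registered")
    _FORMATS[key] = handler


def copy_to(rows: Iterable[Row], fmt: str) -> bytes:
    """Dispatch to the handler registered for fmt (case-insensitive)."""
    try:
        handler = _FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError(f"COPY format {fmt!r} not recognized") from None
    return handler(rows)


def _to_csv(rows: Iterable[Row]) -> bytes:
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")


def _to_jsonl(rows: Iterable[Row]) -> bytes:
    # One JSON array per line; stands in for an extension-provided format.
    return "".join(json.dumps(row) + "\n" for row in rows).encode("utf-8")


# A "built-in" format and an "extension" format go through the same path.
register_format("csv", _to_csv)
register_format("jsonl", _to_jsonl)
```

In this sketch the built-in format is registered through the same call an extension would use, so adding a format does not require touching copy_to itself.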
Hi,
On 2022-06-22 16:59:16 +0530, Ashutosh Bapat wrote:
> On Tue, Jun 21, 2022 at 3:26 PM Aleksander Alekseev
> wrote:
>
> >
> > In other words, personally I'm unaware of use cases when somebody
> > needs a complete read/write FDW or TableAM implementation for formats
> > like Parquet, ORC, etc.
Andres Freund writes:
> On 2022-06-22 16:59:16 +0530, Ashutosh Bapat wrote:
>> IIUC, you want extensibility in FORMAT argument to COPY command
>> https://www.postgresql.org/docs/current/sql-copy.html. Where the
>> format is pluggable. That seems useful.
> Agreed.
Ditto.
> I suspect that we'd fi
Andres, Tom,
> > I suspect that we'd first need a patch to refactor the existing copy code a
> > good bit to clean things up. After that it hopefully will be possible to
> > plug in a new format without being too intrusive.
>
> I think that step 1 ought to be to convert the existing formats i
Hi,
On 2022-06-23 11:38:29 +0300, Aleksander Alekseev wrote:
> > I know little about parquet - can it support FROM STDIN efficiently?
>
> Parquet is a compressed binary format with data grouped by columns
> [1]. I wouldn't assume that this is a primary use case for this
> particular format.
IMO
On 2022-06-23 Th 21:45, Andres Freund wrote:
> Hi,
>
> On 2022-06-23 11:38:29 +0300, Aleksander Alekseev wrote:
>>> I know little about parquet - can it support FROM STDIN efficiently?
>> Parquet is a compressed binary format with data grouped by columns
>> [1]. I wouldn't assume that this is a p
Hi Andrew,
> > IMO decent COPY FROM / TO STDIN support is crucial, because otherwise you
> > can't do COPY from/to a client. Which would make the feature unusable for
> > anybody not superuser, including just about all users of hosted PG.
> >
>
> +1
>
> Note that Parquet puts the metadata at the e
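For readers wondering why metadata placement matters for COPY FROM STDIN, here is a small Python sketch. As I understand the Parquet layout, a file ends with the footer metadata, followed by a 4-byte little-endian footer length and the magic bytes "PAR1". The toy format below only mimics that trailer shape; the magic "TOY1" and all function names are made up for this example, and it does not parse real Parquet files.

```python
import io
import struct
from typing import BinaryIO, Iterable

MAGIC = b"TOY1"          # hypothetical magic for the toy format
TRAILER_LEN = 4 + len(MAGIC)  # footer length field + magic


def write_toy(body: bytes, footer: bytes) -> bytes:
    """Lay out: body | footer | footer length (uint32 LE) | magic."""
    return body + footer + struct.pack("<I", len(footer)) + MAGIC


def read_footer_seekable(f: BinaryIO) -> bytes:
    """With a seekable file, two small reads at the end are enough."""
    f.seek(-TRAILER_LEN, io.SEEK_END)
    trailer = f.read(TRAILER_LEN)
    (footer_len,) = struct.unpack("<I", trailer[:4])
    if trailer[4:] != MAGIC:
        raise ValueError("bad magic, not a toy file")
    f.seek(-(TRAILER_LEN + footer_len), io.SEEK_END)
    return f.read(footer_len)


def read_footer_stream(chunks: Iterable[bytes]) -> bytes:
    """With a forward-only stream (like STDIN), the reader cannot jump
    to the end, so this sketch buffers the entire input first."""
    data = b"".join(chunks)
    return read_footer_seekable(io.BytesIO(data))
```

The stream variant has to hold the whole input in memory (or spill it to a temporary file) before it can interpret any of the body, which is the cost a COPY FROM STDIN implementation for such a format would have to deal with.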