Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-17 Thread Gang Wu
FYI, neither parquet-cpp [1] nor parquet-java [2] allows FLBA to be annotated with STRING. [1] https://github.com/apache/arrow/blob/eec6f17c8879b469dc3370dad4a7f68f11705a6b/cpp/src/parquet/types.cc#L829-L842 [2] https://github.com/apache/parquet-java/blob/fbe13d89ae4193be12c164d4bb5342c5eba3963f/parquet-column/src/main/ja

Re: [PR] DRAFT: PARQUET-2489: Strawman proposal for Parquet-Java releases [parquet-site]

2024-06-17 Thread via GitHub
emkornfield commented on code in PR #61: URL: https://github.com/apache/parquet-site/pull/61#discussion_r1643875161 ## content/en/docs/Contribution Guidelines/releasing.md: ## @@ -3,7 +3,7 @@ title: "Releasing Parquet" linkTitle: "Releasing Parquet" weight: 4 description: > -

Re: [PR] DRAFT: PARQUET-2489: Strawman proposal for releases [parquet-site]

2024-06-17 Thread via GitHub
jorisvandenbossche commented on code in PR #61: URL: https://github.com/apache/parquet-site/pull/61#discussion_r1643865877 ## content/en/docs/Contribution Guidelines/releasing.md: ## @@ -173,3 +173,18 @@ Update the downloads page on parquet.apache.org. Instructions for updating

Re: [PR] DRAFT: PARQUET-2489: Strawman proposal for releases [parquet-site]

2024-06-17 Thread via GitHub
jorisvandenbossche commented on code in PR #61: URL: https://github.com/apache/parquet-site/pull/61#discussion_r1643864574 ## content/en/docs/Contribution Guidelines/releasing.md: ## @@ -3,7 +3,7 @@ title: "Releasing Parquet" linkTitle: "Releasing Parquet" weight: 4 descripti

[DISCUSS] Guidance on new features for parquet-format and releases for parquet-java

2024-06-17 Thread Micah Kornfield
As part of the recent discussions on new iterations for Parquet, I put together a strawman proposal for how we can try to ensure Parquet implementations remain widely compatible [1][2]. At a high level, there are three parts: 1. A proposal to move to a time-based release cadence for parquet-j

Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-17 Thread via GitHub
alippai commented on PR #34: URL: https://github.com/apache/parquet-site/pull/34#issuecomment-2174938745 Yes, the input is great and I’ll give it a go tomorrow

Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-17 Thread Micah Kornfield
> > My instinct says "No", but others may have a different interpretation. This is also my instinct. I think we should check the validation in parquet-java and parquet-cpp to see whether they agree on the matter, and then make a decision from there. It doesn't seem too onerous to support FLBA a

Re: [DISCUSS] Merge initial Implementation Status PR and incrementally improve it

2024-06-17 Thread Micah Kornfield
Hi Andrew, I agree with this sentiment. I asked on the PR whether there would be another pass; after that, I can merge it. Cheers, Micah On Fri, Jun 14, 2024 at 3:20 AM Andrew Lamb wrote: > Hello Parquet Devs, > > I propose we merge the first (admittedly bare bones) "Implementation > Status" page PR [1]

Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-17 Thread via GitHub
emkornfield commented on PR #34: URL: https://github.com/apache/parquet-site/pull/34#issuecomment-2174904125 > It is my opinion that we should try and address any outstanding comments that are straightforward, but then merge this PR even if there are other more substantial ones unresolved (

[DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-17 Thread Ed Seidl
Hi all, while discussing PARQUET-2485, a question was raised about the STRING annotation [1]. The current wording in the specification is "STRING may only be used to annotate the binary primitive type"; PARQUET-2485 would change that to "STRING may only be used to annotate the BYTE_ARRAY
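The stricter PARQUET-2485 wording reduces to a single check on a column's physical type. The Python sketch below is illustrative only: the PhysicalType enum and both helper functions are hypothetical names, not the parquet-cpp or parquet-java API, and the sketch assumes the stricter reading (BYTE_ARRAY only, FIXED_LEN_BYTE_ARRAY rejected).

```python
from enum import Enum


class PhysicalType(Enum):
    # Parquet primitive (physical) types, named as in the format specification
    BOOLEAN = "BOOLEAN"
    INT32 = "INT32"
    INT64 = "INT64"
    INT96 = "INT96"
    FLOAT = "FLOAT"
    DOUBLE = "DOUBLE"
    BYTE_ARRAY = "BYTE_ARRAY"
    FIXED_LEN_BYTE_ARRAY = "FIXED_LEN_BYTE_ARRAY"


def string_annotation_allowed(physical_type: PhysicalType) -> bool:
    """Hypothetical check: under the stricter reading, STRING may only annotate BYTE_ARRAY."""
    return physical_type is PhysicalType.BYTE_ARRAY


def validate_string_column(name: str, physical_type: PhysicalType) -> None:
    """Hypothetical schema check: raise if a STRING-annotated column has the wrong physical type."""
    if not string_annotation_allowed(physical_type):
        raise ValueError(
            f"column {name!r}: STRING cannot annotate {physical_type.value}"
        )
```

Under this reading, validate_string_column("city", PhysicalType.BYTE_ARRAY) returns silently, while the same call with PhysicalType.FIXED_LEN_BYTE_ARRAY raises ValueError. Supporting FLBA would only mean widening the one comparison in string_annotation_allowed.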

Re: [VOTE] Migration of parquet-* issues from Jira to GitHub

2024-06-17 Thread Steve Loughran
=0 I'm going to miss: * the ability to cross-reference stuff from other Jira projects * the simplicity of being able to use a string like "PARQUET-123" to refer to an issue * the ease of being able to set up your IDE and web browser to go from a reference like this to a Jira page * maybe uber-JIRAs I

Re: [VOTE] Migration of parquet-* issues from Jira to GitHub

2024-06-17 Thread Julien Le Dem
+1 (binding) On Mon, Jun 17, 2024 at 2:17 PM Driesprong, Fokko wrote: > +1 (non-binding) > > Kind regards, > Fokko > > On Fri, Jun 14, 2024 at 12:11, Andrew Lamb wrote: > > +1 (non binding) > > > > Thanks! > > > > On Fri, Jun 14, 2024 at 2:29 AM Gábor Szádovszky wrote: > > > +1 (binding)

Re: flatbuffer footer stream

2024-06-17 Thread Alkis Evlogimenos
Update: the gist is now under the Apache 2.0 licence and updated to the latest version I have so far. Tangentially related, I have published a PR for a tool that extracts and scrubs footers from Parquet files to help customers donate footers
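For context on the "extracts" half of such a tool: a Parquet file begins with the 4-byte magic PAR1 and ends with the Thrift-serialized FileMetaData, followed by a 4-byte little-endian length of that metadata and a final PAR1. The Python sketch below locates those raw footer bytes. It assumes the whole file is in memory and an unencrypted footer; extract_footer is a hypothetical helper, not the code from the PR mentioned above, and it does no Thrift decoding or scrubbing.

```python
import struct

MAGIC = b"PAR1"


def extract_footer(data: bytes) -> bytes:
    """Hypothetical helper: return the raw Thrift-encoded FileMetaData of an in-memory Parquet file."""
    # Smallest possible layout: leading magic + 4-byte length + trailing magic
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a plain Parquet file (missing PAR1 magic)")
    # The 4 bytes just before the trailing magic hold the footer length (little-endian uint32)
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    start = len(data) - 8 - footer_len
    # The footer must not overlap the leading magic
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return data[start : len(data) - 8]
```

Reading only the trailing 8 bytes first and then seeking back by footer_len is what lets a reader fetch the metadata without scanning the data pages; the in-memory version above just keeps the sketch short.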

Re: [VOTE] Migration of parquet-* issues from Jira to GitHub

2024-06-17 Thread Driesprong, Fokko
+1 (non-binding) Kind regards, Fokko On Fri, Jun 14, 2024 at 12:11, Andrew Lamb wrote: > +1 (non binding) > > Thanks! > > On Fri, Jun 14, 2024 at 2:29 AM Gábor Szádovszky wrote: > > > +1 (binding) > > > > Gang Wu wrote (Fri, Jun 14, 2024, 4:00): > > > > > +1 (binding) > > > > > > Be