Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-18 Thread via GitHub
xhochy commented on code in PR #34: URL: https://github.com/apache/parquet-site/pull/34#discussion_r1644097931 ## content/en/docs/File Format/implementationstatus.md: ## @@ -0,0 +1,101 @@ +--- +title: "Implementation status" +linkTitle: "Implementation status" +weight: 8 +--- +#

Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-18 Thread via GitHub
xhochy commented on PR #34: URL: https://github.com/apache/parquet-site/pull/34#issuecomment-2175559724 Please mark PRs that are ready for review as such. I have not given this a review yet because it showed up as a Draft in my notifications. I will do so in the next 2h.

Re: [DISCUSS] Merge initial Implementation Status PR and incrementally improve it

2024-06-18 Thread Andrew Lamb
Thank you On Mon, Jun 17, 2024 at 11:40 PM Micah Kornfield wrote: > Hi Andrew, > I agree with this sentiment, I asked on the PR if there would be another > pass and then I can merge it. > > Cheers, > Micah > > On Fri, Jun 14, 2024 at 3:20 AM Andrew Lamb > wrote: > > > Hello Parquet Devs, > > >

Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-18 Thread Alkis Evlogimenos
I don't see why it shouldn't be supported. FLBA and String are orthogonal features. The first optimizes encoding by not storing lengths and the latter says the binary is valid UTF8. On Tue, Jun 18, 2024 at 8:35 AM Gang Wu wrote: > FYI, both parquet-cpp [1] and parquet-java [2] do not allow FLBA.
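The orthogonality Alkis describes can be made concrete with a minimal sketch. It assumes the PLAIN encodings from the Parquet encoding spec: each BYTE_ARRAY value is prefixed with a 4-byte little-endian length, while FIXED_LEN_BYTE_ARRAY values are stored as raw bytes with no prefix. The helper names are hypothetical, not parquet-cpp or parquet-java APIs, and the UTF-8 check stands in for the only thing a STRING annotation asserts about the bytes.

```python
import struct


def plain_encode_byte_array(values: list[bytes]) -> bytes:
    """PLAIN BYTE_ARRAY: a 4-byte little-endian length, then the value bytes."""
    out = bytearray()
    for v in values:
        out += struct.pack("<I", len(v))
        out += v
    return bytes(out)


def plain_encode_flba(values: list[bytes], width: int) -> bytes:
    """PLAIN FIXED_LEN_BYTE_ARRAY: values concatenated, no length prefix."""
    out = bytearray()
    for v in values:
        if len(v) != width:
            raise ValueError(f"expected {width} bytes, got {len(v)}")
        out += v
    return bytes(out)


def is_valid_string_annotation(value: bytes) -> bool:
    """STRING only asserts that the bytes are valid UTF-8."""
    try:
        value.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


codes = [b"USD", b"EUR", b"JPY"]
print(len(plain_encode_byte_array(codes)))  # 21: 3 * (4-byte length + 3 bytes)
print(len(plain_encode_flba(codes, 3)))     # 9: 3 * 3 bytes, no lengths
print(all(is_valid_string_annotation(c) for c in codes))  # True
```

The UTF-8 check never looks at how the value was framed, which is the sense in which the physical type and the annotation are independent.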

Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-18 Thread Gang Wu
I have the same feeling and that's why I've asked in the mentioned PR. It seems FLBA is just a special case of BYTE_ARRAY. On Tue, Jun 18, 2024 at 10:16 PM Alkis Evlogimenos wrote: > I don't see why it shouldn't be supported. FLBA and String are orthogonal > features. The first optimizes encodin

Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-18 Thread via GitHub
alkis commented on code in PR #34: URL: https://github.com/apache/parquet-site/pull/34#discussion_r1644575190 ## content/en/docs/File Format/implementationstatus.md

Re: [DISCUSS] Merge initial Implementation Status PR and incrementally improve it

2024-06-18 Thread Alkis Evlogimenos
+1. I would suggest you address the comments first? I went through the open ones and most of them make sense to me (and left few additional comments). On Tue, Jun 18, 2024 at 12:42 PM Andrew Lamb wrote: > Thank you > > On Mon, Jun 17, 2024 at 11:40 PM Micah Kornfield > wrote: > > > Hi Andrew,

Re: [External] Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-18 Thread Ofir Manor
At least in SQL, char(n) is a fixed-length string, but it means fixed number of characters. Since strings are typically UTF8, it is still a variable number of bytes... So, I don't see how a string column can be stored in FLBA, even if it has a fixed number of characters (unless less common cases
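Ofir's point can be checked with a short sketch: strings with the same number of characters need different numbers of UTF-8 bytes, so storing a CHAR(n) value in a fixed-width FLBA would mean reserving the worst case (4 bytes per code point) and padding. The padding scheme below (NUL bytes, width 4n) is a hypothetical illustration, not something the thread or the spec defines.

```python
def utf8_byte_lengths(values: list[str]) -> list[int]:
    """Return the UTF-8 encoded size of each string, in bytes."""
    return [len(v.encode("utf-8")) for v in values]


def pad_to_fixed_width(value: str, n_chars: int) -> bytes:
    """Hypothetical CHAR(n) -> FLBA mapping: reserve the UTF-8 worst case
    of 4 bytes per code point and right-pad the encoded value with NULs."""
    if len(value) > n_chars:
        raise ValueError("value longer than declared CHAR length")
    width = 4 * n_chars
    encoded = value.encode("utf-8")
    return encoded + b"\x00" * (width - len(encoded))


# Five code points each, but 5, 6 and 15 bytes once encoded.
words = ["hello", "h\u00e9llo", "日本語です"]
print([len(w) for w in words])     # [5, 5, 5]
print(utf8_byte_lengths(words))    # [5, 6, 15]
print(len(pad_to_fixed_width("h\u00e9llo", 5)))  # 20
```

The padding bytes are not part of the logical string, so a reader would need an out-of-band convention to strip them. That is one reason a fixed character count does not translate into a fixed byte width.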

Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-18 Thread via GitHub
jorisvandenbossche commented on code in PR #34: URL: https://github.com/apache/parquet-site/pull/34#discussion_r1644631859 ## content/en/docs/File Format/implementationstatus.md

Re: [External] Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-18 Thread Ed Seidl
I agree with Ofir, UTF8 is inherently variable length. I think I phrased the question incorrectly. For the purposes of cleaning up the use of 'binary' in the spec, does the spec as currently written allow for FLBA with UTF8 encoding? It looks like as far as parquet-java and parquet-cpp are co

Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-18 Thread via GitHub
etseidl commented on code in PR #34: URL: https://github.com/apache/parquet-site/pull/34#discussion_r1644658095 ## content/en/docs/File Format/implementationstatus.md

Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-18 Thread via GitHub
jorisvandenbossche commented on code in PR #34: URL: https://github.com/apache/parquet-site/pull/34#discussion_r1644685908 ## content/en/docs/File Format/implementationstatus.md

Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-18 Thread via GitHub
jorisvandenbossche commented on code in PR #34: URL: https://github.com/apache/parquet-site/pull/34#discussion_r1644689602 ## content/en/docs/File Format/implementationstatus.md

Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-18 Thread via GitHub
jorisvandenbossche commented on code in PR #34: URL: https://github.com/apache/parquet-site/pull/34#discussion_r1644691098 ## content/en/docs/File Format/implementationstatus.md

Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-18 Thread Julien Le Dem
To me there is no fundamental reason to not allow STRING or ENUM on FIXED_LEN_BYTE_ARRAY. I think historically, the type FIXED_LEN_BYTE_ARRAY was added later. Now, the question is more whether someone wants to spend the effort to add support for it. I agree with Micah it doesn't look like a lot of
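The relaxation Julien describes amounts to widening a writer-side type check. A minimal sketch follows, assuming a hypothetical validator and a hypothetical `allow_flba` flag: the default mirrors what Gang Wu reports for parquet-cpp and parquet-java (STRING/ENUM accepted only on BYTE_ARRAY), and the flag models also accepting FIXED_LEN_BYTE_ARRAY. Neither name is an existing API in any Parquet implementation.

```python
from enum import Enum


class Physical(Enum):
    """A subset of Parquet physical types, enough for this illustration."""
    BYTE_ARRAY = "BYTE_ARRAY"
    FIXED_LEN_BYTE_ARRAY = "FIXED_LEN_BYTE_ARRAY"
    INT32 = "INT32"


def string_annotation_allowed(physical: Physical, allow_flba: bool = False) -> bool:
    """Hypothetical check for a STRING or ENUM annotation.

    allow_flba=False: BYTE_ARRAY only, as reported in the thread for
    parquet-cpp and parquet-java.
    allow_flba=True: FIXED_LEN_BYTE_ARRAY is accepted as well.
    """
    allowed = {Physical.BYTE_ARRAY}
    if allow_flba:
        allowed.add(Physical.FIXED_LEN_BYTE_ARRAY)
    return physical in allowed


print(string_annotation_allowed(Physical.FIXED_LEN_BYTE_ARRAY))                   # False
print(string_annotation_allowed(Physical.FIXED_LEN_BYTE_ARRAY, allow_flba=True))  # True
```

Keeping the relaxation behind an explicit flag is one way to make the choice visible, since files written with it may not be readable by implementations that still reject FLBA with STRING.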

Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-18 Thread via GitHub
etseidl commented on code in PR #34: URL: https://github.com/apache/parquet-site/pull/34#discussion_r1644837641 ## content/en/docs/File Format/implementationstatus.md

Re: [PR] DRAFT: PARQUET-2489: Strawman proposal for Parquet-Java releases [parquet-site]

2024-06-18 Thread via GitHub
alamb commented on code in PR #61: URL: https://github.com/apache/parquet-site/pull/61#discussion_r1632029904 ## content/en/docs/Contribution Guidelines/modules.md: ## @@ -8,7 +8,7 @@ description: > The [parquet-format](https://github.com/apache/parquet-format) project conta