Re: [External] Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-21 Thread Alkis Evlogimenos
> Just my two cents,
> Ofir
>
> From: Gang Wu
> Sent: Tuesday, June 18, 2024 5:20 PM
> To: dev@parquet.apache.org
> Subject: [External] Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?
>
> I have the same feeling and that'

Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-18 Thread Julien Le Dem
To me there is no fundamental reason to not allow STRING or ENUM on FIXED_LEN_BYTE_ARRAY. I think historically, the type FIXED_LEN_BYTE_ARRAY was added later. Now, the question is more whether someone wants to spend the effort to add support for it. I agree with Micah it doesn't look like a lot of

Re: [External] Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-18 Thread Ed Seidl
Just my two cents, Ofir From: Gang Wu Sent: Tuesday, June 18, 2024 5:20 PM To: dev@parquet.apache.org Subject: [External] Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING? I have the same feeling and that's why I've asked in the mentioned PR. It seems FLBA is just a

Re: [External] Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-18 Thread Ofir Manor
... common cases like an 8-byte encoding like a specific ASCII character set) Just my two cents, Ofir From: Gang Wu Sent: Tuesday, June 18, 2024 5:20 PM To: dev@parquet.apache.org Subject: [External] Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING? I

Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-18 Thread Gang Wu
I have the same feeling and that's why I've asked in the mentioned PR. It seems FLBA is just a special case of BYTE_ARRAY.

On Tue, Jun 18, 2024 at 10:16 PM Alkis Evlogimenos wrote:
> I don't see why it shouldn't be supported. FLBA and String are orthogonal
> features. The first optimizes encodin

Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-18 Thread Alkis Evlogimenos
I don't see why it shouldn't be supported. FLBA and String are orthogonal features. The first optimizes encoding by not storing lengths, and the latter says the binary is valid UTF-8.

On Tue, Jun 18, 2024 at 8:35 AM Gang Wu wrote:
> FYI, both parquet-cpp [1] and parquet-java [2] do not allow FLBA.
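
To illustrate the point above that a fixed-length layout and a UTF-8 annotation are independent, here is a minimal Python sketch. It assumes Parquet's PLAIN encoding, in which each BYTE_ARRAY value is preceded by a 4-byte little-endian length while FIXED_LEN_BYTE_ARRAY values are written back to back. The function names and the sample currency codes are hypothetical and are not taken from parquet-cpp or parquet-java.

```python
import struct


def plain_encode_byte_array(values):
    """PLAIN-encode BYTE_ARRAY values: 4-byte little-endian length, then the bytes."""
    return b"".join(struct.pack("<I", len(v)) + v for v in values)


def plain_encode_flba(values, type_length):
    """PLAIN-encode FIXED_LEN_BYTE_ARRAY values: raw bytes only, no length prefix."""
    for v in values:
        if len(v) != type_length:
            raise ValueError(f"expected {type_length} bytes, got {len(v)}")
    return b"".join(values)


def is_valid_utf8(value):
    """Check the property a STRING annotation asserts, regardless of physical type."""
    try:
        value.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample data: three 3-byte ASCII currency codes.
codes = [s.encode("utf-8") for s in ("USD", "EUR", "JPY")]

variable = plain_encode_byte_array(codes)        # length prefix on every value
fixed = plain_encode_flba(codes, type_length=3)  # length stored once, in the schema

print(len(variable), len(fixed))
print(all(is_valid_utf8(v) for v in codes))
```

The UTF-8 check never looks at how the bytes were laid out, which is the sense in which the two features are orthogonal. Whether readers and writers should accept the combination is the open question in this thread.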

Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-17 Thread Gang Wu
FYI, both parquet-cpp [1] and parquet-java [2] do not allow FLBA.

[1] https://github.com/apache/arrow/blob/eec6f17c8879b469dc3370dad4a7f68f11705a6b/cpp/src/parquet/types.cc#L829-L842
[2] https://github.com/apache/parquet-java/blob/fbe13d89ae4193be12c164d4bb5342c5eba3963f/parquet-column/src/main/ja

Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-17 Thread Micah Kornfield
> > My instinct says "No", but others may have a different interpretation.

This is also my instinct. I think we should check validation in parquet-java and parquet-cpp to see if they are in agreement on the matter and then make a decision from there. It doesn't seem too onerous to support FLBA a

[DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with STRING?

2024-06-17 Thread Ed Seidl
Hi all,

While discussing PARQUET-2485, a question was raised about the STRING annotation [1]. The current wording in the specification is "|STRING| may only be used to annotate the binary primitive type"; PARQUET-2485 would change that to "|STRING| may only be used to annotate the |BYTE_ARRAY|