Re: [DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-06 Thread Kenta Murata
ata to UTF-8. > > > > > > MySQL: > > > > > > CONVERT function > > > > > > https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_convert > > > > > > PostgreSQL: > > > > > > convert_to function > > >

Re: [DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-04 Thread Wes McKinney
> > > > PostgreSQL: > > > > convert_to function > > > > https://www.postgresql.org/docs/11/functions-string.html#id-1.5.8.9.7.2.2.8.1.1 > > > > > > > > If we need to support non UTF-8 encodings, I like > > NonUTF8String or some

Re: [DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-03 Thread Micah Kornfield
.1 > > > > If we need to support non UTF-8 encodings, I like > NonUTF8String or something extension type and metadata > approach. I prefer "ARROW:encoding" rather than > "ARROW:charset" for metadata key too. > > > Thanks, > -- > kou > > In

Re: [DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-03 Thread Sutou Kouhei
d-1.5.8.9.7.2.2.8.1.1 If we need to support non UTF-8 encodings, I like NonUTF8String or something extension type and metadata approach. I prefer "ARROW:encoding" rather than "ARROW:charset" for metadata key too. Thanks, -- kou In "[DISCUSS][FORMAT] Concerning abo

Re: [DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-02 Thread Antoine Pitrou
Hello, On 02/09/2019 at 10:39, Kenta Murata wrote: > > There are two options to manage a character encoding in a BinaryArray. > The first way is introducing an optional character_encoding field in > BinaryType. The second way is using custom_metadata field to supply > the character encoding

Re: [DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-02 Thread Wes McKinney
hi Kenta, It seems like using ExtensionType would be a simple way to handle this for the immediate purpose of implementing user-facing Array types. If we wanted to change the metadata representation to something more "built-in" then we can keep discussing this. It seems like having a distinct

[DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-02 Thread Kenta Murata
[Abstract] When we have string data encoded in a character encoding other than UTF-8, we must use a BinaryArray for the data. But Apache Arrow doesn’t provide a way to specify which character encoding is used in a BinaryArray. In this mail, I’d like to discuss how Apache Arrow provides the way
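
For readers skimming the thread, the metadata approach discussed above can be sketched in plain Python. This is only an illustration and does not use the Arrow API: the list of bytes stands in for a BinaryArray, the dict stands in for a field's custom_metadata, and the helper name to_utf8 is hypothetical. The key "ARROW:encoding" is the name preferred in the thread, not a key confirmed to be part of the format.

```python
# Plain-Python stand-in for a binary column plus field-level metadata.
# Assumption: "ARROW:encoding" is the key name floated in the thread,
# not a standardized Arrow metadata key.
ENCODING_KEY = "ARROW:encoding"


def to_utf8(values, metadata):
    """Transcode raw byte strings to UTF-8 using the encoding named in metadata.

    Hypothetical helper: None entries (nulls) are passed through unchanged.
    """
    encoding = metadata.get(ENCODING_KEY)
    if encoding is None:
        # Without a recorded encoding the bytes cannot be interpreted as text.
        raise ValueError("no character encoding recorded for this binary data")
    return [None if v is None else v.decode(encoding).encode("utf-8") for v in values]


text = "日本語"
raw_values = [text.encode("shift_jis"), None]   # non-UTF-8 string data kept as bytes
field_metadata = {ENCODING_KEY: "shift_jis"}    # the encoding travels with the data

utf8_values = to_utf8(raw_values, field_metadata)
```

The point of the sketch is only that the bytes are unusable as text unless the encoding is carried alongside them, which is the gap the original mail describes.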