Re: bug? pyarrow.Table.from_pydict does not handle binary type correctly with embedded 00 bytes?

2020-11-04 Thread Jason Sachs
GAH! It looks like it might be my problem, not pyarrow; type code 'S' is null-terminated data: https://numpy.org/doc/stable/reference/arrays.dtypes.html ("'S', 'a': zero-terminated bytes (not recommended)"). Now I have to figure out why I'm getting that S code (it's generated through some sort of …
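For reference, a minimal sketch of the behavior those docs describe: numpy's 'S' dtype strips trailing NUL bytes on element access (only trailing ones; embedded NULs survive in the element), while an object array of Python bytes keeps the payload intact.

    import numpy as np

    # numpy's 'S' dtype treats values as zero-terminated:
    # trailing \x00 bytes disappear on element access.
    a = np.array([b"ab\x00"], dtype="S3")
    print(a[0])          # b'ab'      -- trailing NUL stripped
    print(a.tobytes())   # b'ab\x00'  -- the raw buffer still holds it

    # An object array of Python bytes keeps the value intact.
    b = np.array([b"ab\x00"], dtype=object)
    print(b[0])          # b'ab\x00'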

Re: bug? pyarrow.Table.from_pydict does not handle binary type correctly with embedded 00 bytes?

2020-11-04 Thread Jason Sachs
> Seems a bit buggy

Yeah, that's a bit of an understatement :/ Done: https://issues.apache.org/jira/browse/ARROW-10498

I'm trying to poke around, but it looks like it may affect all of the from_* methods. I don't grok Cython very well, so I'm not sure I can get to a root cause easily.

On 2020/…

Re: bug? pyarrow.Table.from_pydict does not handle binary type correctly with embedded 00 bytes?

2020-11-04 Thread Wes McKinney
Seems a bit buggy, can you open a Jira issue? Thanks

On Wed, Nov 4, 2020 at 5:05 PM Jason Sachs wrote:
>
> It looks like pyarrow.Table.from_pydict() cuts off binary data after an
> embedded 00 byte. Is this a known bug?
>
> (py3) C:\>python
> Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32 …

bug? pyarrow.Table.from_pydict does not handle binary type correctly with embedded 00 bytes?

2020-11-04 Thread Jason Sachs
It looks like pyarrow.Table.from_pydict() cuts off binary data after an embedded 00 byte. Is this a known bug?

(py3) C:\>python
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information. …
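The transcript is cut off above, but the shape of a workaround is straightforward to sketch. Assuming the values arrive as Python bytes (the column name "payload" is made up for illustration), pinning the column to pa.binary() with an explicit schema keeps the data away from numpy's zero-terminated 'S' dtype, and the embedded 00 byte should round-trip:

    import pyarrow as pa

    # Hypothetical data: bytes values containing an embedded 0x00 byte.
    data = {"payload": [b"abc\x00def", b"\x00\x01\x02"]}

    # An explicit schema pins the column to Arrow's variable-length
    # binary type instead of letting type inference decide.
    schema = pa.schema([("payload", pa.binary())])
    table = pa.Table.from_pydict(data, schema=schema)

    assert table.column("payload").to_pylist() == [b"abc\x00def", b"\x00\x01\x02"]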

Re: Compressing parquet metadata?

2020-11-04 Thread Jason Sachs
Yes. Ouch, so there's a 4/3 size hit there for base64. (Is that always the case, or does it use plaintext if possible?) I'm trying to figure out what kind of request to file in the issue tracker to help support my use case (data logging). I have enough stuff I want to put in metadata that the use of …
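Absent a built-in option (see downthread), a sketch of the manual approach: compress the JSON blob yourself and store the result as a binary metadata value. The key name "log_meta" and the file name are made up for illustration; the compressed bytes round-trip, at the cost of the base64 expansion in the Parquet footer.

    import json
    import zlib
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Compress the large JSON blob before attaching it as key-value metadata.
    payload = json.dumps({"channels": ["volts", "amps"], "notes": "..."}).encode("utf-8")
    packed = zlib.compress(payload)

    table = pa.table({"volts": [1.0, 2.0]})
    table = table.replace_schema_metadata({b"log_meta": packed})
    pq.write_table(table, "log.parquet")

    # Reading it back: decompress the binary value to recover the JSON.
    raw = pq.read_table("log.parquet").schema.metadata[b"log_meta"]
    meta = json.loads(zlib.decompress(raw).decode("utf-8"))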

Re: Compressing parquet metadata?

2020-11-04 Thread Wes McKinney
You mean the key-value metadata at the schema/field level? That can be binary (it gets base64-encoded when written to Parquet).

On Wed, Nov 4, 2020 at 10:22 AM Jason Sachs wrote:
>
> OK. If I take the manual approach, do parquet / arrow care whether metadata
> is binary or not?
>
> On 2020/11/04 …

Re: Compressing parquet metadata?

2020-11-04 Thread Jason Sachs
OK. If I take the manual approach, do parquet / arrow care whether metadata is binary or not?

On 2020/11/04 14:16:37, Wes McKinney wrote:
> There is not to my knowledge.
>
> On Tue, Nov 3, 2020 at 5:55 PM Jason Sachs wrote:
> >
> > Is there any built-in method to compress parquet metadata? From what I can
> > tell, the main table columns are compressed, but not the metadata. …
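A quick sketch answering the binary question: Arrow's key-value metadata is a bytes-to-bytes mapping, so an arbitrary binary value (embedded NULs included) is acceptable and survives a Parquet round trip via the base64 encoding Wes describes in his reply. Names here ("blob", "t.parquet") are made up.

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Key-value metadata is a bytes -> bytes mapping, so arbitrary
    # binary values are fine at the Arrow level.
    t = pa.table({"x": [1, 2]}).replace_schema_metadata({b"blob": b"\x00\xff\x00"})
    pq.write_table(t, "t.parquet")

    # The value survives the round trip (base64-encoded inside the footer).
    assert pq.read_table("t.parquet").schema.metadata[b"blob"] == b"\x00\xff\x00"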

Re: Compressing parquet metadata?

2020-11-04 Thread Wes McKinney
There is not to my knowledge.

On Tue, Nov 3, 2020 at 5:55 PM Jason Sachs wrote:
>
> Is there any built-in method to compress parquet metadata? From what I can
> tell, the main table columns are compressed, but not the metadata.
>
> I have metadata which includes 100-200KB of text (JSON format) …

Arrow java implementation: Compatible IO streams.

2020-11-04 Thread Saloni Udani
Hello, I have a use case where I want to write an Arrow batch to my existing output stream (a custom stream extending java.io.OutputStream) and read it back from my existing input stream (a custom stream extending java.io.InputStream). I used ArrowStreamWriter and ArrowStreamReader, but on the reader side …