GAH! It looks like it might be my problem, not pyarrow; type code S is
null-terminated data:
https://numpy.org/doc/stable/reference/arrays.dtypes.html
'S', 'a' zero-terminated bytes (not recommended)
Now I have to figure out why I'm getting that S code (it's generated through
some sort of operation).
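The NUL-termination behavior can be reproduced in NumPy alone, without
pyarrow involved at all; a minimal sketch:

```python
import numpy as np

# The 'S' dtype stores fixed-width, zero-terminated byte strings.
# Trailing NUL bytes are treated as padding and stripped when an
# element is read back out, which is why the docs discourage it.
a = np.array([b"ab\x00\x00"], dtype="S4")
print(a.dtype)  # S4
print(a[0])     # b'ab' -- the trailing NULs are gone
```

So any pipeline that routes binary data through an 'S'-typed array can lose
trailing zero bytes before pyarrow ever sees them.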
> Seems a bit buggy
Yeah that's a bit of an understatement :/
Done. https://issues.apache.org/jira/browse/ARROW-10498
I'm trying to poke around, but it looks like it may affect all of the from_*
methods. I don't grok Cython very well, so am not sure I can get to a root
cause easily.
Seems a bit buggy, can you open a Jira issue? Thanks
On Wed, Nov 4, 2020 at 5:05 PM Jason Sachs wrote:
>
> It looks like pyarrow.Table.from_pydict() cuts off binary data after an
> embedded 00 byte. Is this a known bug?
It looks like pyarrow.Table.from_pydict() cuts off binary data after an
embedded 00 byte. Is this a known bug?
(py3) C:\>python
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] ::
Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
Yes. Ouch, so there's a 4/3 hit there for base64. (is that always the case or
does it use plaintext if possible?)
I'm trying to figure out what kind of request to file in the issue tracker to
help support my use case. (data logging)
I have enough stuff I want to put in metadata that the use of
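On the 4/3 question: base64 itself always expands the payload by a factor of
4/3 (rounded up to whole 4-character groups), no matter whether the bytes
happen to be printable; a quick check:

```python
import base64

# Every 3 input bytes become 4 output characters, content-independent.
raw = bytes(300)
enc = base64.b64encode(raw)
print(len(raw), len(enc))  # 300 400
```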
You mean the key-value metadata at the schema/field-level? That can
be binary (it gets base64-encoded when written to Parquet)
On Wed, Nov 4, 2020 at 10:22 AM Jason Sachs wrote:
>
> OK. If I take the manual approach, do parquet / arrow care whether metadata
> is binary or not?
OK. If I take the manual approach, do parquet / arrow care whether metadata is
binary or not?
On 2020/11/04 14:16:37, Wes McKinney wrote:
> There is not to my knowledge.
There is not to my knowledge.
On Tue, Nov 3, 2020 at 5:55 PM Jason Sachs wrote:
>
> Is there any built-in method to compress parquet metadata? From what I can
> tell, the main table columns are compressed, but not the metadata.
>
> I have metadata which includes 100-200KB of text (JSON format) t
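Lacking a built-in option, one workaround is to compress the JSON yourself
before storing it as a binary metadata value, and decompress it on read; a
sketch under that assumption (the key names are invented for illustration):

```python
import json
import zlib

# Repetitive JSON metadata compresses well; store the compressed bytes
# as a metadata value and zlib.decompress() it when reading back.
text = json.dumps({f"channel_{i}": {"units": "V", "scale": 1.0} for i in range(200)})
blob = zlib.compress(text.encode("utf-8"), level=9)
print(len(text), len(blob))  # the compressed blob is much smaller

restored = json.loads(zlib.decompress(blob).decode("utf-8"))
```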
Hello,
I have a use case where I want to write an Arrow batch to my existing
output stream (a custom stream extending java.io.OutputStream) and read
it back from my existing input stream (a custom stream extending
java.io.InputStream). I used ArrowStreamWriter and ArrowStreamReader but
on the reader side