Re: Pyarrow RecordBatchStreamWriter and dictionaries

2021-05-03 Thread Alessandro Molina
Hi Radu, I tried to reproduce the issue you described but was unable to. Could you provide an example of how you built the Table? I tried reproducing it with a table with the following schema: pa.schema([ pa.field('nums', pa.list_(pa.int32())), pa.field('chars', pa.list_

Re: Pyarrow RecordBatchStreamWriter and dictionaries

2021-04-24 Thread Wes McKinney
Hi Radu, this sounds potentially buggy. If you can create a Jira issue with a repro, that would be very helpful.
On Thu, Apr 22, 2021 at 11:36 PM Radu Teodorescu wrote:
> Hi I am seeing a similar problem when serializing tables with lists of dictionary encoded elements: each resulting chunk is pointing

Re: Pyarrow RecordBatchStreamWriter and dictionaries

2021-04-22 Thread Radu Teodorescu
Hi, I am seeing a similar problem when serializing tables with lists of dictionary-encoded elements: each resulting chunk points to the first chunk's original dictionary. Is this a known issue or limitation? If not, I can follow up with a repro. Thank you, Radu
> On Sep 28, 2020, at 1:26 PM, Wes

Re: Pyarrow RecordBatchStreamWriter and dictionaries

2020-09-28 Thread Wes McKinney
Hi Al, it's definitely wrong. I confirmed the behavior is present on master. https://issues.apache.org/jira/browse/ARROW-10121 I made this a blocker for the release. Thanks, Wes
On Mon, Sep 28, 2020 at 10:52 AM Al Taylor wrote:
> Hi,
>
> I've found that when I serialize two recordbatches wh

Pyarrow RecordBatchStreamWriter and dictionaries

2020-09-28 Thread Al Taylor
Hi, I've found that when I serialize two record batches that have a dictionary-encoded field but different encoding dictionaries to a sequence of pybytes with a RecordBatchStreamWriter, and then deserialize them using pa.ipc.open_stream(), the dictionaries get jumbled. (or at least, on deserialization
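
For readers who want to check this against their own pyarrow install, here is a minimal sketch of the round trip described in the original post. It is not the poster's actual repro: the column name "col", the sample values, and the helper roundtrip_decoded are made up for illustration, and it writes to an in-memory buffer rather than a sequence of pybytes. It uses pa.ipc.new_stream to obtain the stream writer. On a release that includes the fix tracked in ARROW-10121, each batch should decode to its own values; on an affected release, the second batch would presumably come back decoded against the first batch's dictionary.

```python
try:
    import pyarrow as pa
except ImportError:
    pa = None  # pyarrow is not installed; the check at the bottom is skipped


def roundtrip_decoded(columns):
    """Write one dictionary-encoded batch per input list through an IPC
    stream and return the decoded values of each batch read back."""
    # Each batch gets its own dictionary because the values differ.
    batches = [
        pa.record_batch([pa.array(values).dictionary_encode()], names=["col"])
        for values in columns
    ]
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, batches[0].schema) as writer:
        for batch in batches:
            writer.write_batch(batch)
    reader = pa.ipc.open_stream(sink.getvalue())
    # to_pylist() on a dictionary array yields the decoded values.
    return [batch.column(0).to_pylist() for batch in reader]


# Hypothetical sample data: same schema, different dictionaries.
original = [["a", "b", "a"], ["x", "y", "x"]]
decoded = roundtrip_decoded(original) if pa is not None else None
if decoded is not None:
    print("round trip preserved values:", decoded == original)
```

If the printed comparison is False, comparing decoded[1] with original[0] shows whether the second batch picked up the first batch's dictionary.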