Joris Van den Bossche created ARROW-6529:
--------------------------------------------
             Summary: [C++] Feather: slow writing of NullArray
                 Key: ARROW-6529
                 URL: https://issues.apache.org/jira/browse/ARROW-6529
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Joris Van den Bossche


From https://stackoverflow.com/questions/57877017/pandas-feather-format-is-slow-when-writing-a-column-of-none

Here is a smaller example that uses only pyarrow. Writing an array of nulls takes much longer than writing an array of, for example, ints, which seems a bit strange:

{code}
In [93]: arr = pa.array([1]*1000)

In [94]: %%timeit
    ...: w = pyarrow.feather.FeatherWriter('__test.feather')
    ...: w.writer.write_array('x', arr)
    ...: w.writer.close()
31.4 µs ± 464 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [95]: arr = pa.array([None]*1000)

In [96]: arr
Out[96]:
<pyarrow.lib.NullArray object at 0x7fa47a23ca40>
1000 nulls

In [97]: %%timeit
    ...: w = pyarrow.feather.FeatherWriter('__test.feather')
    ...: w.writer.write_array('x', arr)
    ...: w.writer.close()
3.75 ms ± 64.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{code}

So writing a NullArray of the same length takes roughly 100 times longer (3.75 ms versus 31.4 µs).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)