Björn Andersson created ARROW-3098:
--------------------------------------

             Summary: [Python] BufferReader doesn't adhere to the seek protocol
                 Key: ARROW-3098
                 URL: https://issues.apache.org/jira/browse/ARROW-3098
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.10.0
            Reporter: Björn Andersson


I have a script that creates a Parquet file, writes it to a 
{{BufferOutputStream}}, and then wraps the result in a {{BufferReader}} so 
that it can be passed to code that takes a file-like object and uploads it 
somewhere else. That code relies on seeking to the end of the file to 
determine its size, e.g.

{code:python}
reader.seek(0, 2)
size = reader.tell()
reader.seek(0)
{code}
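
For reference, the snippet above follows the standard Python file-object convention, in which a {{whence}} value of 2 ({{io.SEEK_END}}) means "relative to the end of the stream". Below is a minimal sketch of such a size check, written against {{io.BytesIO}} so that it runs without pyarrow. The helper name {{stream_size}} is hypothetical and is not taken from the upload code in question.

```python
import io


def stream_size(fileobj):
    """Return the total size of a seekable file-like object.

    Hypothetical helper for illustration: seeks to the end to read the
    size, then restores the caller's original position.
    """
    original = fileobj.tell()
    fileobj.seek(0, io.SEEK_END)  # io.SEEK_END == 2
    size = fileobj.tell()
    fileobj.seek(original)
    return size


stream = io.BytesIO(b'hello')
stream.read(2)                   # move away from the start first
size = stream_size(stream)       # total length of the stream
position_after = stream.tell()   # position is restored by the helper
```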
 

When I run that sequence against the {{BufferReader}}, the following exception is raised:

 
{code}
pyarrow/io.pxi:209: in pyarrow.lib.NativeFile.seek
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

> ???
E pyarrow.lib.ArrowIOError: position out of bounds
{code}

For comparison, copying the data into an {{io.BytesIO}} instead works as expected:

{code:python}
import io

import pyarrow as pa


def test_arrow_output_stream():
    output = pa.BufferOutputStream()
    output.write(b'hello')

    reader = pa.BufferReader(output.getvalue())

    # whence=2 seeks relative to the end; this raises ArrowIOError
    reader.seek(0, 2)
    assert reader.tell() == 5


def test_python_io_stream():
    output = pa.BufferOutputStream()
    output.write(b'hello')

    buffer = io.BytesIO(output.getvalue().to_pybytes())
    reader = io.BufferedRandom(buffer)

    # The same seek on a standard-library stream succeeds
    reader.seek(0, 2)
    assert reader.tell() == 5
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
