Hi Luke,

Given the error message, my intuition is that the timeout is on the
server side.  Arrow does not try to set any timeouts on S3 connections.
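
In the meantime, a plain Python retry loop around the read is probably
the simplest client-side stopgap.  A rough, untested sketch (the helper
name, attempt count and backoff are all arbitrary, not an Arrow
feature):

import time

def read_with_retries(s3, path, attempts=3, backoff=1.0):
    # Retry transient timeouts with exponential backoff.
    # Arrow itself does not retry here; this is purely client-side.
    for i in range(attempts):
        try:
            return s3.open_input_stream(path).readall()
        except OSError:
            if i == attempts - 1:
                raise
            time.sleep(backoff * 2 ** i)

raw = read_with_retries(s3, 'test_bucket/example_key')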

Note that this message ("When reading information") happens *before*
reading the file data, when Arrow is merely requesting the object's
metadata to learn its length.  So perhaps something is weird in your
network configuration (is a firewall blocking packets?).
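
To narrow it down, you could issue just the metadata request yourself,
e.g. (untested, reusing the variables from your snippet):

from pyarrow import fs

s3 = fs.S3FileSystem(access_key=my_ak, secret_key=my_sk,
                     endpoint_override=my_endpoint, scheme='http')

# get_file_info only fetches metadata; no object data is transferred.
info = s3.get_file_info(['test_bucket/example_key'])[0]
print(info.size)

If that call alone times out, the problem is in the metadata path on
the Ceph side rather than in the data transfer.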

Regards

Antoine.

On Sat, 27 Mar 2021 10:44:53 -0400
Luke <[email protected]> wrote:
> I have a local S3 compatible object store (using ceph) and am trying to use
> the pyarrow fs interface.  This seems to work well except on larger objects
> I am getting unhandled exceptions.  Is there a way to currently tune the
> timeouts or retries?  Here is the kind of code and error I am seeing:
> 
> from pyarrow import fs
>
> s3 = fs.S3FileSystem(access_key=my_ak, secret_key=my_sk,
>                      endpoint_override=my_endpoint, scheme='http')
>
> raw = s3.open_input_stream('test_bucket/example_key').readall()
>
> File "pyarrow/_fs.pyx", line 621, in
> pyarrow._fs.FileSystem.open_input_stream
> 
> File "pyarrow/error.pxi", line 122, in
> pyarrow.lib.pyarrow_internal_check_status
> 
> File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> 
> OSError: When reading information for key 'example_key' in bucket
> 'test_bucket': AWS Error [code 99]: curlCode: 28, Timeout was reached
> 
> 
> 
> --
>
> install details:
> python: 3.8.6
> OS: Linux, Red Hat 7.7
> pyarrow version: 3.0.0
>
> thanks for the help,
> Luke
>