On Thu, 19 Aug 2021 15:40:53 +0000
Hagai Har-Gil <[email protected]> wrote:
> Right - I have a different app that uses sockets in another context for a 
> similar goal.
> 
> The thing is - the Stream object is "advertised" (so to say) as a suitable 
> holder for such data. E.g., looking at the docs for 
> `pyarrow.ipc.open_stream()` and `pyarrow.ipc.NativeFile`, they specifically 
> mention how this is the right approach when doing streaming, and I assumed 
> that concurrent reading from that stream is a viable use case for such files.
> 
> Perhaps I'm just completely ignorant of this topic and should've realized 
> that a NativeFile can't support this use case, but I believe that a minimal 
> warning against such "abuse" of the IPC protocol might be helpful in the 
> future.

Well, the IPC protocol does not change the semantics of the underlying
file.  If you're using a regular disk file, then by construction there's
no guarding against unsynchronised access.  If you're using a socket,
then you get synchronisation by construction.

I notice that it is currently not possible to create a pyarrow.OSFile
from a file descriptor:
https://issues.apache.org/jira/browse/ARROW-10906

However, you should be able to create a pyarrow.PythonFile from a
Python socket's file object (obtained using socket.makefile()).  It
will be less performant, but should hopefully work.

Regards

Antoine.

