Weston Pace created ARROW-12161:
-----------------------------------

             Summary: [C++] Async streaming CSV reader deadlocking when being run synchronously from datasets
                 Key: ARROW-12161
                 URL: https://issues.apache.org/jira/browse/ARROW-12161
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace
            Assignee: Weston Pace


ARROW-11887 added an async API to the streaming CSV reader.  To preserve 
backwards compatibility, the old sync API simply calls the async API and waits 
for it to finish.  However, that blocking wait cannot happen safely in a 
"nested" context (e.g. dataset reading), where the caller is itself already 
running on a CPU executor thread.

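For example, imagine a machine with two cores, so the CPU executor has two 
threads.  The dataset read launches two CSV scans.  Each scan occupies a CPU 
thread while it blocks waiting for a future.  Those futures are being filled by 
I/O threads.  When the I/O threads finish, they try to transfer the 
continuation back to the CPU executor.  The transfer can never run because 
every CPU thread is already blocked in a wait, so the read deadlocks.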
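
The scenario above can be reproduced in miniature.  The following is only a 
sketch in Python, not Arrow code: the two thread pools stand in for the CPU and 
I/O executors, and every name in it (count_finished_scans, sync_scan, io_read) 
is hypothetical.  To avoid hanging, it counts how many scans finished within a 
timeout and then releases the blocked scans by hand.

```python
import concurrent.futures as cf
import threading


def count_finished_scans(cpu_threads, num_scans, timeout=0.5):
    """Return how many nested sync scans finished within `timeout` seconds."""
    # Hypothetical stand-ins for the CPU executor and the I/O executor.
    cpu_pool = cf.ThreadPoolExecutor(max_workers=cpu_threads)
    io_pool = cf.ThreadPoolExecutor(max_workers=num_scans)
    start_io = threading.Event()
    transferred = [threading.Event() for _ in range(num_scans)]

    def io_read(i):
        start_io.wait()  # hold I/O back until every scan has been queued
        # "Transfer": the continuation is scheduled on the CPU pool, so it
        # needs a free CPU thread before it can mark the scan as ready.
        cpu_pool.submit(transferred[i].set)

    def sync_scan(i):
        io_pool.submit(io_read, i)
        transferred[i].wait()  # blocking wait while holding a CPU thread

    scans = [cpu_pool.submit(sync_scan, i) for i in range(num_scans)]
    start_io.set()
    finished, _ = cf.wait(scans, timeout=timeout)
    count = len(finished)

    # Release any scan that is still blocked so the demo itself can exit.
    for event in transferred:
        event.set()
    cpu_pool.shutdown(wait=True)
    io_pool.shutdown(wait=True)
    return count
```

With two CPU threads and two scans, both threads are stuck in the wait and the 
queued transfers never get a thread, so no scan finishes before the timeout.  
With a third CPU thread the transfers can run and both scans complete.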

This will be fixed as part of ARROW-7001, but that is still some way off.  An 
easier change might be to take some of the ARROW-7001 changes and include them 
as part of the ARROW-11887 feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
