[ https://issues.apache.org/jira/browse/ARROW-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weston Pace closed ARROW-13004.
-------------------------------
    Resolution: Won't Fix

This is handled better by the exec plan and we are not likely to pursue this in futures.

> [C++] Allow the creation of future "chains" to better control parallelism
> -------------------------------------------------------------------------
>
>                 Key: ARROW-13004
>                 URL: https://issues.apache.org/jira/browse/ARROW-13004
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> This is a bit tricky to explain. ShouldSchedule::Always works well for
> AddCallback but falls short for Transfer and Then. An example may explain it
> best.
> Consider three operators, Source, Transform, and Sink. They are set up as
> follows:
> {code:cpp}
> source_fut = source();                                         // 1
> transform_fut = source_fut.Then(Transform(), ScheduleAlways);  // 2
> sink_fut = transform_fut.Then(Consume());                      // 3
> {code}
> The intent is to run Transform + Consume as a single thread task on each item
> generated by source(). This is what happens if source() is slow. If source()
> is fast (let's pretend it's always finished), then this is not what happens.
> Line 2 causes a new thread task to be launched (since source_fut is
> finished). It is possible that the new thread task marks transform_fut
> finished before line 3 is executed by the original thread. In that case the
> Consume() callback added on line 3 runs immediately on the original thread,
> so Consume() and Transform() run on separate threads.
> The solution (at least the best I can come up with) is unfortunately a little
> complex, though the complexity can be hidden in the future/async_generator
> internals. Basically, it is worth waiting to schedule until the future chain
> has had a chance to finish connecting the pressure. This means a future
> created with ScheduleAlways is created in an "unconsumed" mode. Any callbacks
> that would normally be launched will not be launched until the future
> switches to "consumed".
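A minimal, single-threaded sketch of the proposed "unconsumed" mode, under stated assumptions. It is not Arrow's Future API: ToyExecutor, ToyFuture, Then, Consume, RunPipeline and the defer_until_consumed flag are hypothetical names. Real threads are replaced by a task queue drained at chosen points, so the race in which the worker finishes Transform before line 3 attaches Consume is forced rather than left to timing. It also assumes consumption is signalled by an explicit Consume() call on the last future, which walks back through each future's antecedent.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread pool (not Arrow's Executor). Spawn()
// queues a task; RunAll() drains the queue, giving each task a fresh id.
struct ToyExecutor {
  std::deque<std::function<void()>> queue;
  int next_task_id = 1;
  int current_task = 0;  // 0 means "the caller's own thread"
  void Spawn(std::function<void()> task) { queue.push_back(std::move(task)); }
  void RunAll() {
    while (!queue.empty()) {
      std::function<void()> task = std::move(queue.front());
      queue.pop_front();
      current_task = next_task_id++;
      task();
      current_task = 0;
    }
  }
};

// Hypothetical toy future (not arrow::Future). Callbacks run inline on
// whichever task marks the future finished.
struct ToyFuture {
  bool finished = false;
  int value = 0;
  bool consumed = true;                              // the proposed flag
  std::vector<std::function<void(int)>> callbacks;
  std::vector<std::function<void()>> held_launches;  // held while unconsumed
  std::shared_ptr<ToyFuture> antecedent;             // lets "consume" walk back

  void MarkFinished(int v) {
    finished = true;
    value = v;
    std::vector<std::function<void(int)>> cbs = std::move(callbacks);
    callbacks.clear();
    for (auto& cb : cbs) cb(v);
  }
  void AddCallback(std::function<void(int)> cb) {
    if (finished) cb(value);  // already finished: run inline, right now
    else callbacks.push_back(std::move(cb));
  }
  // Switch to "consumed", release held launches, then propagate upstream.
  void Consume() {
    if (!consumed) {
      consumed = true;
      std::vector<std::function<void()>> launches = std::move(held_launches);
      held_launches.clear();
      for (auto& launch : launches) launch();
    }
    if (antecedent) antecedent->Consume();
  }
};

using FutPtr = std::shared_ptr<ToyFuture>;

// With schedule_always, fn runs in a new executor task. With
// defer_until_consumed, the returned future starts "unconsumed" and that
// task is held until something consumes the future.
FutPtr Then(const FutPtr& in, std::function<int(int)> fn, ToyExecutor& exec,
            bool schedule_always, bool defer_until_consumed) {
  auto out = std::make_shared<ToyFuture>();
  out->antecedent = in;
  if (!schedule_always) {
    in->AddCallback([out, fn](int v) { out->MarkFinished(fn(v)); });
    return out;
  }
  out->consumed = !defer_until_consumed;
  ToyExecutor* ex = &exec;
  in->AddCallback([out, fn, ex](int v) {
    std::function<void()> launch = [out, fn, ex, v]() {
      ex->Spawn([out, fn, v]() { out->MarkFinished(fn(v)); });
    };
    if (out->consumed) launch();
    else out->held_launches.push_back(std::move(launch));
  });
  return out;
}

struct Trace { int transform_task = -1; int consume_task = -1; };

// Lines 1-3 from the ticket, with an already-finished source and a worker
// that drains the queue between lines 2 and 3.
Trace RunPipeline(bool defer_until_consumed) {
  ToyExecutor exec;
  Trace trace;
  FutPtr source_fut = std::make_shared<ToyFuture>();
  source_fut->MarkFinished(1);  // 1: a fast source
  FutPtr transform_fut = Then(source_fut, [&](int v) {
    trace.transform_task = exec.current_task; return v + 1;
  }, exec, /*schedule_always=*/true, defer_until_consumed);  // 2
  exec.RunAll();  // the worker wins the race before line 3 runs
  FutPtr sink_fut = Then(transform_fut, [&](int v) {
    trace.consume_task = exec.current_task; return v;
  }, exec, /*schedule_always=*/false, defer_until_consumed);  // 3
  sink_fut->Consume();  // stands in for Future.Wait() and similar consumers
  exec.RunAll();
  return trace;
}
```

With defer_until_consumed set to false, RunPipeline reports Transform on task 1 and Consume on task 0 (the caller). With it set to true, both run on task 1, because Consume is already attached to transform_fut when the held task is finally spawned. The sketch deliberately ignores real threads, memory ordering, error propagation, and futures that are never consumed.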
> Future.Wait(), VisitAsyncGenerator, CollectAsyncGenerator, and some of the
> async_generator operators would cause the future to be "consumed". The
> "consume" signal will need to propagate backwards up the chain, so futures
> will need to keep a reference to their antecedent future.
> This work meshes well with some other improvements I have been considering,
> in particular, splitting future/promise and restricting futures to a single
> callback.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)