On Fri, Aug 7, 2020 at 10:38 AM Sean Owen <sro...@gmail.com> wrote:
>
> Why do you need to do it, and can you just use a future in your
> driver code?
>
> On Fri, Aug 7, 2020 at 9:01 AM Antonin Delpeuch (lists)
> <li...@antonin.delpeuch.eu>
Hi all,
Following my request on the user mailing list [1], there does not seem
to be any simple way to save RDDs to the file system in an asynchronous
way. I am looking into implementing this, so I am first checking whether
there is consensus around the idea.
The goal would be to add methods such
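
For reference, the "future in your driver code" approach suggested in the reply could look roughly like the sketch below. It is only an illustration in plain Python: `blocking_save` and `save_async` are hypothetical names, and `blocking_save` stands in for a blocking call such as PySpark's `rdd.saveAsTextFile(path)`. It is not the API proposed in this thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time


def blocking_save(records, path):
    """Hypothetical stand-in for a blocking save, e.g. rdd.saveAsTextFile(path)."""
    time.sleep(0.1)  # simulate a job that takes a while
    with open(path, "w") as f:
        for record in records:
            f.write(f"{record}\n")
    return path


def save_async(executor: ThreadPoolExecutor, records, path) -> Future:
    """Submit the blocking save to a worker thread and return immediately."""
    return executor.submit(blocking_save, records, path)
```

The driver keeps the returned future, carries on with other work, and calls `result()` (or registers a callback with `add_done_callback`) when it needs to know that the save has finished or failed.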
On 24/05/2020 11:27, Antonin Delpeuch (lists) wrote:
> With this formulation, zipWithIndex would be a special case of
> mapWithState (so it could be refactored to be expressed as such).
Forget about this part: it obviously would not, since zipWithIndex can
compute the size of each par
With this formulation, zipWithIndex would be a special case of
mapWithState (so it could be refactored to be expressed as such).
Antonin
On 24/05/2020 10:58, Antonin Delpeuch (lists) wrote:
> Hi,
>
> Spark Streaming has a `mapWithState` API to run a map on a stream while
> maintaining a state as el
Hi,
Spark Streaming has a `mapWithState` API to run a map on a stream while
maintaining a state as elements are read.
The core RDD API does not seem to have anything similar. Given an RDD of
elements of type T, an initial state of type S and a map function (S,T)
-> (S,T), return an RDD of Ts obtai
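
To make the idea concrete, here is a minimal sequential sketch in plain Python, with no Spark involved. The message above is cut off, so the exact semantics are an assumption: the sketch threads a single state through all elements in partition order, and `map_with_state` is a hypothetical name rather than an existing RDD method.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def map_with_state(
    partitions: List[List[T]],
    initial_state: S,
    f: Callable[[S, T], Tuple[S, T]],
) -> List[List[T]]:
    """Apply f to every element in order, carrying the state along.

    Assumption: the state flows through partitions sequentially, so
    partition i cannot start before partition i - 1 has finished.
    """
    state = initial_state
    result = []
    for partition in partitions:
        mapped = []
        for element in partition:
            state, new_element = f(state, element)
            mapped.append(new_element)
        result.append(mapped)
    return result
```

For example, with a running sum as the state, `map_with_state([[1, 2], [3], [4, 5]], 0, lambda s, x: (s + x, s + x))` gives `[[1, 3], [6], [10, 15]]`. The partition layout is preserved, but the state dependency makes the computation inherently sequential across partitions.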
Thanks a lot for the reply Steve!
If you don't see a way to fix this in Spark itself, then I will try to
improve the docs.
Antonin
On 06/05/2020 17:19, Steve Loughran wrote:
>
>
> On Tue, 7 Apr 2020 at 15:26, Antonin Delpeuch <li...@antonin.delpeuch.eu>
Hi,
Sorry to dig up this thread, but this bug is still present.
The fix proposed in this thread (creating a new FileSystem implementation
which sorts listed files) was rejected, with the suggestion that it is the
FileInputFormat's responsibility to sort the file names if preserving
partition orde
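
As an illustration of why sorting matters, here is a small sketch in plain Python on a local directory. It is not the Hadoop FileInputFormat code. It assumes output files follow the usual zero-padded `part-00000`, `part-00001`, ... naming, so that sorting by name matches partition order; `list_part_files` is a hypothetical helper.

```python
import os
from typing import List


def list_part_files(directory: str) -> List[str]:
    """Return part-file paths sorted by name.

    os.listdir makes no ordering guarantee, so the names are sorted
    explicitly. Assumption: zero-padded names of equal width, so that
    lexicographic order equals partition order.
    """
    names = [n for n in os.listdir(directory) if n.startswith("part-")]
    return [os.path.join(directory, n) for n in sorted(names)]
```

Reading the files in the returned order keeps element i of the list aligned with partition i, whatever order the underlying file system lists them in.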