from:"Antonin Delpeuch"

Re: Async RDD saves

2020-08-08 Thread Antonin Delpeuch (lists)

, Aug 7, 2020 at 10:38 AM Sean Owen <sro...@gmail.com> wrote:
> Why do you need to do it, and can you just use a future in your
> driver code?
>
> On Fri, Aug 7, 2020 at 9:01 AM Antonin Delpeuch (lists)
> <li...@antonin.delpeuch.eu>
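To illustrate the "use a future in your driver code" suggestion quoted above, here is a minimal sketch in plain Python. It does not use Spark: `blocking_save` is a hypothetical stand-in for a blocking save call such as `saveAsTextFile`, and the `sink` dictionary stands in for the file system. The only point is that a blocking call can be submitted to an executor so that the driver gets a future back immediately.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def blocking_save(records, path, sink):
    """Hypothetical stand-in for a blocking save (not a Spark API)."""
    time.sleep(0.05)             # simulate slow I/O
    sink[path] = list(records)   # "write" the records under the given path
    return path


def save_async(executor, records, path, sink):
    """Submit the blocking save to the executor and return a Future."""
    return executor.submit(blocking_save, records, path, sink)


sink = {}
with ThreadPoolExecutor(max_workers=2) as executor:
    future = save_async(executor, range(5), "out/numbers", sink)
    # The driver is free to do other work here while the save runs.
    saved_path = future.result()  # block only when the result is needed
```

Whether this is sufficient for the use case in the thread depends on details the truncated messages do not show, for example whether cancellation or progress reporting is needed.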

Async RDD saves

2020-08-07 Thread Antonin Delpeuch (lists)

Hi all, Following my request on the user mailing list [1], there does not seem to be any simple way to save RDDs to the file system in an asynchronous way. I am looking into implementing this, so I am first checking whether there is consensus around the idea. The goal would be to add methods such

Re: Map with state for RDDs

2020-05-24 Thread Antonin Delpeuch (lists)

On 24/05/2020 11:27, Antonin Delpeuch (lists) wrote:
> With this formulation, zipWithIndex would be a special case of
> mapWithState (so it could be refactored to be expressed as such).

Forget about this part, it would obviously not, since zipWithIndex can compute the size of each par

Re: Map with state for RDDs

2020-05-24 Thread Antonin Delpeuch (lists)

ion, zipWithIndex would be a special case of mapWithState (so it could be refactored to be expressed as such).

Antonin

On 24/05/2020 10:58, Antonin Delpeuch (lists) wrote:
> Hi,
>
> Spark Streaming has a `mapWithState` API to run a map on a stream while
> maintaining a state as el

Map with state for RDDs

2020-05-24 Thread Antonin Delpeuch (lists)

Hi,

Spark Streaming has a `mapWithState` API to run a map on a stream while maintaining a state as elements are read. The core RDD API does not seem to have anything similar.

Given an RDD of elements of type T, an initial state of type S and a map function (S,T) -> (S,T), return an RDD of Ts obtai
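The signature described above can be modelled on an ordinary Python iterable. This is only a sequential sketch of the semantics as far as the truncated message states them (an initial state, and a function from (state, element) to (new state, mapped element)); `map_with_state` and `running_sum` are hypothetical names, not Spark APIs, and the sketch says nothing about how the state would be carried across partitions of a real RDD.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def map_with_state(
    elements: Iterable[T],
    initial_state: S,
    f: Callable[[S, T], Tuple[S, T]],
) -> Iterator[T]:
    """Map over elements in order, threading a state through each call to f."""
    state = initial_state
    for element in elements:
        state, mapped = f(state, element)  # f returns (new state, mapped element)
        yield mapped


def running_sum(total: int, x: int) -> Tuple[int, int]:
    """Example map function: the state is the sum seen so far."""
    new_total = total + x
    return new_total, new_total


result = list(map_with_state([1, 2, 3, 4], 0, running_sum))
# result == [1, 3, 6, 10]
```

Because each output depends on the state left by the previous element, the elements have to be visited in order, which is why the ordering of an RDD matters for such an operation.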

Re: RDD order guarantees

2020-05-06 Thread Antonin Delpeuch (lists)

Thanks a lot for the reply Steve! If you don't see a way to fix this in Spark itself, then I will try to improve the docs.

Antonin

On 06/05/2020 17:19, Steve Loughran wrote:
> On Tue, 7 Apr 2020 at 15:26, Antonin Delpeuch <li...@antonin.delpeuch.eu>

Re: RDD order guarantees

2020-04-07 Thread Antonin Delpeuch

Hi, Sorry to dig out this thread but this bug is still present. The fix proposed in this thread (creating a new FileSystem implementation which sorts listed files) was rejected, with the suggestion that it is the FileInputFormat's responsibility to sort the file names if preserving partition orde
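The idea of sorting listed files so that partitions are read back in the order they were written can be sketched in plain Python. This is an illustration only: `ordered_part_files` is a hypothetical helper, the zero-padded `part-NNNNN` names are an assumption about the output layout, and nothing here touches the actual FileSystem or FileInputFormat code discussed in the thread.

```python
def ordered_part_files(listing):
    """Keep only part files and return them sorted by name.

    Assumes zero-padded names such as part-00000, so that lexicographic
    order matches partition order.
    """
    return sorted(name for name in listing if name.startswith("part-"))


# A directory listing may come back in arbitrary order.
listing = ["part-00002", "_SUCCESS", "part-00000", "part-00001"]
ordered = ordered_part_files(listing)
# ordered == ["part-00000", "part-00001", "part-00002"]
```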

7 matches
