AFAIK ordering is not strictly guaranteed unless the RDD is the product of a sort. In practice, I don't think you'll ever find the elements of a file read in some random order (although see the recent issue about partition ordering potentially depending on how the local file system lists the files).
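To make the "product of a sort" point concrete, here is a rough sketch in plain Python rather than Spark. It only models the idea: each inner list stands in for one partition, and the records, timestamps, and helper names are made up for illustration. In PySpark the analogous calls would be `RDD.sortBy` followed by `RDD.zipWithIndex`, which numbers elements by partition index first and then by position within each partition.

```python
from itertools import chain

# Hypothetical records of the form (timestamp, payload). Each inner list
# models one partition: in order internally, but the partitions themselves
# are listed in an arbitrary order.
partitions = [
    [(3, "c"), (4, "d")],
    [(1, "a"), (2, "b")],
    [(5, "e")],
]

def zip_with_index(parts):
    """Number elements by partition order, then by position in the partition."""
    return [(rec, i) for i, rec in enumerate(chain.from_iterable(parts))]

def sort_by_timestamp(parts, num_partitions):
    """Sort all records by timestamp and re-split them into contiguous ranges."""
    ordered = sorted(chain.from_iterable(parts), key=lambda rec: rec[0])
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

# Without a sort, the indices follow whatever order the partitions came in.
unsorted_indexed = zip_with_index(partitions)

# After a sort, the indices follow timestamp order.
sorted_indexed = zip_with_index(sort_by_timestamp(partitions, 2))
```

In the unsorted case the record with timestamp 3 is numbered first, simply because its partition happened to come first. That is the situation a sort on a timestamp or sequence number rules out, at the cost of a shuffle.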
Similarly, I can't imagine you would encounter elements from one Kafka partition out of order. One receiver listens to one partition and creates one block per block interval. What I'm not 100% clear on is whether ordering becomes undefined when you have multiple threads listening in one receiver.

You can always sort the RDD by a timestamp of some sort to be sure, although that has overhead. I'm also curious about what, if anything, is guaranteed here without a sort.

On Mon, Jan 26, 2015 at 1:33 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Sean,
>
> On Mon, Jan 26, 2015 at 10:28 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>> Note that RDDs don't really guarantee anything about ordering though,
>> so this only makes sense if you've already sorted some upstream RDD by
>> a timestamp or sequence number.
>
>
> Speaking of order, is there some reading on guarantees and non-guarantees
> about order in RDDs? For example, when reading a file and doing
> zipWithIndex, can I assume that the lines are numbered in order? Does this
> hold for receiving data from Kafka, too?
>
> Tobias