Hi Sean and Madhu,

Thank you for the explanation. I really appreciate it.

Best Regards,

Jerry

On Fri, Dec 19, 2014 at 4:50 AM, Sean Owen <so...@cloudera.com> wrote:

> coalesce actually changes the number of partitions. Unless the
> original RDD had just 1 partition, coalesce(1) will make an RDD with 1
> partition that is larger than the original partitions, of course.
>
> I don't think the question is about ordering of things within an
> element of the RDD?
>
> If the original RDD was sorted, and so has a defined ordering, then it
> will be preserved. Otherwise I believe you do not have any guarantees
> about ordering. In practice, you may find that you still encounter the
> elements in the same order after coalesce(1), although I am not sure
> that is even true.
>
> union() is the same story; unless the RDDs are sorted I don't think
> there are guarantees. However I'm almost certain that in practice, as
> it happens now, A's elements would come before B's after a union, if
> you did traverse them.
>
> On Fri, Dec 19, 2014 at 5:41 AM, madhu phatak <phatak....@gmail.com>
> wrote:
> > Hi,
> > coalesce is an operation which changes no of records in a partition. It
> > will not touch ordering with in a row AFAIK.
> >
> > On Fri, Dec 19, 2014 at 2:22 AM, Jerry Lam <chiling...@gmail.com> wrote:
> >>
> >> Hi Spark users,
> >>
> >> I wonder if val resultRDD = RDDA.union(RDDB) will always have records in
> >> RDDA before records in RDDB.
> >>
> >> Also, will resultRDD.coalesce(1) change this ordering?
> >>
> >> Best Regards,
> >>
> >> Jerry
> >
> > --
> > Regards,
> > Madhukara Phatak
> > http://www.madhukaraphatak.com
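
For anyone who needs the "A before B" order to hold without relying on what union() and coalesce(1) happen to do today, one option is to make the ordering explicit and sort on it. The following is only a rough sketch, not something proposed in the thread: the object name UnionOrderSketch, the helper orderedUnion, and the sample data are made up for illustration, and it assumes a Spark version that provides RDD.zipWithIndex and RDD.sortBy.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object UnionOrderSketch {

  // Hypothetical helper: returns the records of `a` followed by the records
  // of `b` in a single partition, using an explicit sort key instead of
  // relying on the traversal order of union() or coalesce(1).
  def orderedUnion[T: ClassTag](a: RDD[T], b: RDD[T]): RDD[T] = {
    // Key each record with (source id, position within its source RDD).
    val taggedA = a.zipWithIndex().map { case (v, i) => ((0, i), v) }
    val taggedB = b.zipWithIndex().map { case (v, i) => ((1, i), v) }

    taggedA.union(taggedB)
      // Sorting into one partition gives a defined total order on the key.
      .sortBy(_._1, ascending = true, numPartitions = 1)
      .map(_._2)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("union-order-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Sample data, invented for this sketch.
      val rddA = sc.parallelize(Seq("a1", "a2", "a3"), 2)
      val rddB = sc.parallelize(Seq("b1", "b2"), 2)
      orderedUnion(rddA, rddB).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The sort adds a shuffle, so this trades some cost for an ordering that follows from the sort key rather than from current implementation behaviour. If the input RDDs are not themselves sorted, the positions from zipWithIndex only reflect whatever order their partitions currently have.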