Re: UNION two RDDs

Jerry Lam Mon, 22 Dec 2014 13:48:22 -0800

Hi Sean and Madhu,

Thank you for the explanation. I really appreciate it.


Best Regards,

Jerry


On Fri, Dec 19, 2014 at 4:50 AM, Sean Owen <so...@cloudera.com> wrote:

> coalesce actually changes the number of partitions. Unless the
> original RDD had just 1 partition, coalesce(1) will make an RDD with 1
> partition that is larger than the original partitions, of course.
>
> I don't think the question is about ordering of things within an
> element of the RDD?
>
> If the original RDD was sorted, and so has a defined ordering, then it
> will be preserved. Otherwise I believe you do not have any guarantees
> about ordering. In practice, you may find that you still encounter the
> elements in the same order after coalesce(1), although I am not sure
> that is even true.
>
> union() is the same story; unless the RDDs are sorted I don't think
> there are guarantees. However I'm almost certain that in practice, as
> it happens now, A's elements would come before B's after a union, if
> you did traverse them.
>
> On Fri, Dec 19, 2014 at 5:41 AM, madhu phatak <phatak....@gmail.com>
> wrote:
> > Hi,
> > coalesce is an operation which changes the number of records in a
> > partition. It will not touch ordering within a row AFAIK.
> >
> > On Fri, Dec 19, 2014 at 2:22 AM, Jerry Lam <chiling...@gmail.com> wrote:
> >>
> >> Hi Spark users,
> >>
> >> I wonder if val resultRDD = RDDA.union(RDDB) will always have records in
> >> RDDA before records in RDDB.
> >>
> >> Also, will resultRDD.coalesce(1) change this ordering?
> >>
> >> Best Regards,
> >>
> >> Jerry
> >
> >
> >
> > --
> > Regards,
> > Madhukara Phatak
> > http://www.madhukaraphatak.com
>
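For anyone who needs the A-before-B order to hold by construction rather than by current behaviour: as Sean says above, ordering is only something to rely on when the RDD is sorted. One way to get that is to make the order explicit and sort on it. The following is a minimal sketch, not an official recipe. The names UnionOrderSketch and orderedUnion are hypothetical (they are not part of Spark), and the local[2] master is an assumption made only so the example runs on one machine. It uses zipWithIndex, union and sortBy from the RDD API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object UnionOrderSketch {

  // Hypothetical helper, not part of Spark: returns the records of `a`
  // followed by the records of `b`, with the order enforced by an explicit sort.
  def orderedUnion[T: ClassTag](a: RDD[T], b: RDD[T]): RDD[T] = {
    // Key each record by (source id, position within its source RDD).
    val taggedA = a.zipWithIndex().map { case (v, i) => ((0, i), v) }
    val taggedB = b.zipWithIndex().map { case (v, i) => ((1, i), v) }

    taggedA.union(taggedB)
      .sortBy(_._1, numPartitions = 1) // sort on the key, into a single partition
      .map(_._2)                       // drop the key again
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with 2 threads, for illustration only.
    val conf = new SparkConf().setAppName("union-order-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rddA = sc.parallelize(Seq("a1", "a2", "a3"), 2)
      val rddB = sc.parallelize(Seq("b1", "b2"), 2)
      orderedUnion(rddA, rddB).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The trade-off is cost: zipWithIndex runs an extra job when an RDD has more than one partition, and the sort adds a shuffle, so this is only worth doing when the order really matters. Sorting with numPartitions = 1 here stands in for the coalesce(1) step from the original question.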