Only the first and last elements of each partition need to be shuffled. The remaining elements within a partition can be compared locally without a shuffle, so the job keeps its parallelism.
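To make the idea above concrete, here is a minimal plain-Python sketch that models an RDD as a list of partitions (a list of lists). It is not Spark code. The names `compare_adjacent`, `partitions`, and `compare_fun` are made up for illustration. In Spark, the per-partition work would roughly correspond to a function passed to `mapPartitionsWithIndex`, and the small `edges` list is the only data that would have to leave its partition.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def compare_adjacent(
    partitions: List[List[T]],
    compare_fun: Callable[[T, T], R],
) -> List[R]:
    """Apply compare_fun to every pair of adjacent elements, in order.

    `partitions` is a hypothetical stand-in for the partitions of an RDD.
    """
    # The only data that has to cross partition boundaries:
    # (partition index, first element, last element) of each non-empty partition.
    edges: List[Tuple[int, T, T]] = [
        (i, part[0], part[-1]) for i, part in enumerate(partitions) if part
    ]

    # For each non-empty partition, look up the first element of the
    # next non-empty partition (empty partitions are skipped).
    next_first: Dict[int, T] = {
        left[0]: right[1] for left, right in zip(edges, edges[1:])
    }

    results: List[R] = []
    for i, part in enumerate(partitions):
        # Neighbours inside one partition: compared locally, no data movement.
        results.extend(compare_fun(a, b) for a, b in zip(part, part[1:]))
        # Boundary pair: last element here vs. first element of the next partition.
        if part and i in next_first:
            results.append(compare_fun(part[-1], next_first[i]))
    return results


if __name__ == "__main__":
    parts = [[1, 4], [], [9, 10, 15], [16]]
    print(compare_adjacent(parts, lambda a, b: b - a))
```

The amount of data moved is proportional to the number of partitions, not to the number of elements, which is why nothing needs to be collapsed into a single partition.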
Sincerely,

DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D

On Sun, Dec 6, 2015 at 6:27 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote:
>
> On Saturday, December 5, 2015 3:00 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>
> This is tricky. You need to shuffle the ending and beginning elements
> using mapPartitionsWithIndex.
>
> Does this mean that I need to shuffle all the elements in different
> partitions into one partition, then compare any two adjacent elements
> there?
> It seems good, if it is like that.
>
> One more issue: will it lose parallelism, since there would be only one
> partition ...
>
> Thanks very much in advance!
>
> On Fri, Dec 4, 2015 at 10:30 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote:
>> Hi All,
>>
>> I would like to compare any two adjacent elements in one given rdd, just
>> as in this single-machine code:
>>
>> int a[N] = {...};
>> for (int i=0; i < N - 1; ++i) {
>>     compareFun(a[i], a[i+1]);
>> }
>> ...
>>
>> mapPartitions may work for some situations, however, it cannot compare
>> elements in different partitions.
>> foreach also does not seem to work.
>>
>> Thanks,
>> Zhiliang

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org