For this, mapPartitionsWithIndex would also work for filter-style operations. Here is the code, adapted from Stack Overflow, which removes the first line (the header) of a CSV file:

    JavaRDD<String> rawInputRdd = sparkContext.textFile(dataFile);
    Function2<Integer, Iterator<String>, Iterator<String>> removeHeader =
        new Function2<Integer, Iterator<String>, Iterator<String>>() {
            @Override
            public Iterator<String> call(Integer index, Iterator<String> iterator) throws Exception {
                if (index == 0 && iterator.hasNext()) {
                    iterator.next();  // skip the header line in the first partition
                }
                return iterator;
            }
        };

    JavaRDD<String> inputRdd = rawInputRdd.mapPartitionsWithIndex(removeHeader, false);

For my usage (comparing any two adjacent elements, or doing a filter), the index parameter is not really needed here; it is fine to treat each iterator as one logical partition.

On Saturday, December 5, 2015 3:52 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote:

Hi DB Tsai,

Thanks very much for your kind reply!

Sorry, one more issue: as tested, it seems that the function passed to filter can only return Boolean, not an arbitrary type T — is that right? If so, it is not very convenient to do general filtering on an RDD. mapPartitions can cover some cases, but if some partition is left with no elements after filtering via mapPartitions, there may be a problem.

Best Wishes!
Zhiliang

On Saturday, December 5, 2015 3:00 PM, DB Tsai <dbt...@dbtsai.com> wrote:

This is tricky. You need to shuffle the ending and beginning elements using mapPartitionsWithIndex.

Sincerely,

DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D

On Fri, Dec 4, 2015 at 10:30 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote:
> Hi All,
>
> I would like to compare any two adjacent elements in one given RDD, just as
> in this single-machine code:
>
>     int a[N] = {...};
>     for (int i = 0; i < N - 1; ++i) {
>         compareFun(a[i], a[i+1]);
>     }
>     ...
>
> mapPartitions may work for some situations; however, it cannot compare
> elements that fall into different partitions.
> foreach also does not seem to work.
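To make the "shuffle the ending and beginning elements" idea concrete, here is a minimal plain-Java sketch (outside Spark, so it runs standalone) of comparing adjacent elements across partition boundaries. It simulates partitions as lists of lists; the `heads` list plays the role of the small collect/broadcast of each partition's first element that a real Spark job would do before mapPartitionsWithIndex. The class and method names are hypothetical, not Spark API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

public class AdjacentCompare {
    // Compare every adjacent pair, including pairs that straddle a
    // partition boundary, by first gathering each partition's head
    // element (in Spark: a small collect + broadcast step).
    static void compareAdjacent(List<List<Integer>> partitions,
                                BiConsumer<Integer, Integer> compareFun) {
        // heads.get(i) is the first element of partition i, or null if empty
        List<Integer> heads = new ArrayList<>();
        for (List<Integer> p : partitions) {
            heads.add(p.isEmpty() ? null : p.get(0));
        }
        for (int i = 0; i < partitions.size(); i++) {
            List<Integer> p = partitions.get(i);
            // pairs entirely inside this partition
            for (int j = 0; j + 1 < p.size(); j++) {
                compareFun.accept(p.get(j), p.get(j + 1));
            }
            // the boundary pair: last element of this partition with the
            // head of the next non-empty partition
            if (!p.isEmpty()) {
                for (int k = i + 1; k < partitions.size(); k++) {
                    if (heads.get(k) != null) {
                        compareFun.accept(p.get(p.size() - 1), heads.get(k));
                        break;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 3), Arrays.asList(5), Arrays.asList(8, 9));
        StringBuilder sb = new StringBuilder();
        compareAdjacent(partitions, (a, b) -> sb.append(a).append("<").append(b).append(" "));
        System.out.println(sb.toString().trim());  // prints: 1<3 3<5 5<8 8<9
    }
}
```

In Spark terms, the inner-pair loop is what runs inside each partition's iterator, and the `heads` gather is the extra round trip that lets partition i see partition i+1's first element.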
>
> Thanks,
> Zhiliang
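On the mapPartitions concern raised above (a partition left with no elements after filtering): an empty result iterator for a partition is harmless. Here is a plain-Java sketch (outside Spark; the class and method names are hypothetical) of mapPartitions-style filtering, where each partition's iterator is filtered independently and an emptied partition simply yields an empty list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class PartitionFilter {
    // Emulates rdd.mapPartitions(iter -> filteredIter): each partition's
    // iterator is consumed and filtered on its own; a partition that
    // keeps nothing just produces an empty list.
    static <T> List<List<T>> filterByPartition(List<List<T>> partitions, Predicate<T> keep) {
        List<List<T>> out = new ArrayList<>();
        for (List<T> p : partitions) {
            List<T> kept = new ArrayList<>();
            Iterator<T> it = p.iterator();
            while (it.hasNext()) {
                T t = it.next();
                if (keep.test(t)) kept.add(t);
            }
            out.add(kept);  // may be empty -- that is fine
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 6));
        // keep only even numbers; the middle partition ends up empty
        System.out.println(filterByPartition(parts, x -> x % 2 == 0));  // prints: [[2], [], [4, 6]]
    }
}
```

And to the Boolean question: JavaRDD.filter takes a Function<T, Boolean> used as a predicate, but the resulting RDD is still a JavaRDD<T> of the kept elements, not a JavaRDD<Boolean>.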