You could use cogroup to combine the two RDDs into one for cross-reference processing, e.g.:

a.cogroup(b)
 .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
 .map { case (k, (l, r)) => (k, l) }

Best Regards,
Raymond Liu

-----Original Message-----
From: marylucy [mailto:qaz163wsx_...@hotmail.com]
Sent: Friday, August 29, 2014 9:26 PM
To: Matthew Farrellee
Cc: user@spark.apache.org
Subject: Re: how to filter value in spark

I see, it works well, thank you!!!
But how do I handle the following situation?

var a = sc.textFile("/sparktest/1/").map((_,"a"))
var b = sc.textFile("/sparktest/2/").map((_,"b"))

How to get (3,"a") and (4,"a")?

On Aug 28, 2014, at 19:54, "Matthew Farrellee" <m...@redhat.com> wrote:

> On 08/28/2014 07:20 AM, marylucy wrote:
>> fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
>> fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
>> I want to get 3 and 4.
>>
>> var a = sc.textFile("/sparktest/1/").map((_,1))
>> var b = sc.textFile("/sparktest/2/").map((_,1))
>>
>> a.filter(param => { b.lookup(param._1).length > 0 }).map(_._1).foreach(println)
>>
>> This throws an error:
>> scala.MatchError: null
>> PairRDDFunctions.lookup...
>
> the issue is the nesting of the b RDD inside a transformation of the a RDD
>
> consider using intersection, it's more idiomatic:
>
> a.intersection(b).foreach(println)
>
> but note that intersection will remove duplicates
>
> best,
>
>
> matt
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
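For anyone who wants to try the cogroup-then-filter pattern above without a Spark cluster, here is a minimal plain-Scala sketch. The helper `cogroupLocal` is a hypothetical stand-in that mimics `RDD.cogroup`'s `(K, (Iterable[V], Iterable[W]))` shape on ordinary sequences; the final `.map` takes `l.head` (a small adjustment to the snippet above) so the result is `(3,"a")` and `(4,"a")` rather than a key paired with an Iterable:

```scala
object CogroupSketch {
  // Hypothetical local analogue of RDD.cogroup: group both sequences by key,
  // keeping every key from either side, paired with all matching values.
  def cogroupLocal[K, V, W](a: Seq[(K, V)],
                            b: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    val keys = (a.map(_._1) ++ b.map(_._1)).distinct
    keys.map { k =>
      k -> (a.filter(_._1 == k).map(_._2), b.filter(_._1 == k).map(_._2))
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // fileA = 1 2 3 4 tagged "a"; fileB = 3 4 5 6 tagged "b"
    val a = Seq("1", "2", "3", "4").map((_, "a"))
    val b = Seq("3", "4", "5", "6").map((_, "b"))

    // Keep only keys present on both sides, then take the left-side tag.
    val result = cogroupLocal(a, b)
      .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
      .map { case (k, (l, _)) => (k, l.head) }

    result.toSeq.sorted.foreach(println)  // prints (3,a) then (4,a)
  }
}
```

In real Spark code the same chain applies directly to the pair RDDs, with `Iterable` in place of `Seq`; unlike `intersection`, this keeps the left-hand values even when the two RDDs tag their keys differently.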