Hi Robin,
Yes, the piece of code below works fine in the Spark shell, but when the
same code is placed in a script file and executed with -i <file name>, it
creates an empty RDD.

scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
makeRDD at <console>:28


scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Command:

        dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>

I understand I am missing something here, due to which my final RDD does
not have the required output.
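
For reference, here is the kind of minimal self-contained script I would
expect to work with -i (the file name sum_pairs.scala and the val names
are only illustrative); since reduceByKey returns a new RDD, the result is
captured in a val before collecting:

    // sum_pairs.scala -- illustrative name; run as:
    //   dse spark --master local -i sum_pairs.scala
    // sc is the SparkContext the Spark shell provides.
    val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))

    // reduceByKey returns a new RDD; capture it rather than discarding it.
    val sums = pairs.reduceByKey((x, y) => x + y)

    // collect() forces evaluation; expected: (0,3), (1,50), (2,40)
    println(sums.collect().mkString(", "))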

Regards,
Satish Chandra

On Thu, Aug 20, 2015 at 8:23 PM, Robin East <robin.e...@xense.co.uk> wrote:

> This works for me:
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
> makeRDD at <console>:28
>
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> On 20 Aug 2015, at 11:05, satish chandra j <jsatishchan...@gmail.com>
> wrote:
>
> Hi All,
> I have data in an RDD as shown below:
>
> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>
>
> I am expecting the output Array((0,3), (1,50), (2,40)), i.e. just a sum
> of the values for each key.
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at
> <console>:73
> res:Array[(Int,Int)] = Array()
>
> Command used:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>
>
> Please let me know what is missing in my code, as my resulting Array is
> empty.
>
>
>
> Regards,
> Satish
>
>
>
