Hi All,

Any inputs on the actual problem statement?

Regards,
Satish
On Fri, Aug 21, 2015 at 5:57 PM, Jeff Zhang <zjf...@gmail.com> wrote:

> Yong, thanks for your reply.
>
> I tried spark-shell -i <script-file> and it works fine for me. I am not sure
> how that differs from
> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>
> On Fri, Aug 21, 2015 at 7:01 PM, java8964 <java8...@hotmail.com> wrote:
>
>> I believe "spark-shell -i scriptFile" is there. We also use it, at least
>> in Spark 1.3.1.
>>
>> "dse spark" just wraps the "spark-shell" command; underneath it is simply
>> invoking "spark-shell".
>>
>> I don't know too much about the original problem though.
>>
>> Yong
>>
>> ------------------------------
>> Date: Fri, 21 Aug 2015 18:19:49 +0800
>> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
>> From: zjf...@gmail.com
>> To: jsatishchan...@gmail.com
>> CC: robin.e...@xense.co.uk; user@spark.apache.org
>>
>> Hi Satish,
>>
>> I don't see where Spark supports "-i", so I suspect it is provided by DSE.
>> In that case, it might be a bug in DSE.
>>
>> On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j
>> <jsatishchan...@gmail.com> wrote:
>>
>> Hi Robin,
>> Yes, it is DSE, but the issue is related to Spark only.
>>
>> Regards,
>> Satish Chandra
>>
>> On Fri, Aug 21, 2015 at 3:06 PM, Robin East <robin.e...@xense.co.uk>
>> wrote:
>>
>> Not sure, never used dse - it’s part of DataStax Enterprise, right?
>> On 21 Aug 2015, at 10:07, satish chandra j <jsatishchan...@gmail.com>
>> wrote:
>>
>> Hi Robin,
>> Yes, the piece of code mentioned below works fine in the Spark shell, but
>> when the same code is placed in a script file and executed with
>> -i <file name>, it creates an empty RDD:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Command:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>>
>> I understand I am missing something here, due to which my final RDD does
>> not have the required output.
>>
>> Regards,
>> Satish Chandra
>>
>> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <robin.e...@xense.co.uk>
>> wrote:
>>
>> This works for me:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> On 20 Aug 2015, at 11:05, satish chandra j <jsatishchan...@gmail.com>
>> wrote:
>>
>> Hi All,
>> I have data in an RDD as mentioned below:
>>
>> RDD : Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>>
>> I am expecting the output Array((0,3), (1,50), (2,40)), i.e. just a sum
>> over the values for each key.
>>
>> Code:
>> RDD.reduceByKey((x,y) => x+y)
>> RDD.take(3)
>>
>> Result in console:
>> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey
>> at <console>:73
>> res: Array[(Int,Int)] = Array()
>>
>> Command as mentioned:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>>
>> Please let me know what is missing in my code, as my resultant Array is
>> empty.
>>
>> Regards,
>> Satish
>> --
>> Best Regards
>>
>> Jeff Zhang
>
> --
> Best Regards
>
> Jeff Zhang
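The empty result in the original post is consistent with reduceByKey being a transformation: it returns a new RDD and leaves the original unchanged, so a script must capture the returned RDD and run an action on it. A minimal sketch of such a script file, assuming it is launched with spark-shell -i so that sc is already in scope (the sums name, the sortBy, and the println are mine, added for illustration):

```scala
// sumByKey.scala -- sketch intended for: spark-shell -i sumByKey.scala
// Assumes `sc` (the SparkContext) is provided by the shell session.
val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))

// reduceByKey is a transformation: it returns a *new* RDD and does not
// modify `pairs`, so the result must be captured in its own val.
val sums = pairs.reduceByKey(_ + _)

// collect is the action that actually triggers the computation; sortBy
// makes the printed order deterministic.
println(sums.collect().sortBy(_._1).mkString(", "))
// prints: (0,3), (1,50), (2,40)
```

If the script instead calls RDD.reduceByKey(...) on one line and then takes from the original RDD, the aggregated values are discarded, which would match the empty Array() reported in the thread.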