Hi,

I wanted to ask: what's the best way to achieve per-key auto-increment
numerals after sorting? For example:
raw file:
1,a,b,c,1,1
1,a,b,d,0,0
1,a,b,e,1,0
2,a,e,c,0,0
2,a,f,d,1,0
post-output (the last column is the position number after grouping on the
first three fields and reverse-sorting on the last field):
...Scheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
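For reference, here is a minimal sketch of one way to get those per-key
position numbers with the Spark 1.3 RDD API (the parse and number helpers,
the tuple key on the first three fields, and the ordering of ties on the
sort field are all assumptions, not something from the thread):

    from pyspark import SparkContext

    sc = SparkContext(appName="per-key-positions")

    raw = sc.parallelize([
        "1,a,b,c,1,1",
        "1,a,b,d,0,0",
        "1,a,b,e,1,0",
        "2,a,e,c,0,0",
        "2,a,f,d,1,0",
    ])

    def parse(line):
        fields = line.split(",")
        # key on the first three fields; a tuple (not a list) so the key is hashable
        return (tuple(fields[:3]), fields)

    def number(rows):
        # reverse-sort each group on the last field (string comparison is
        # enough for the 0/1 values above), then append a 1-based position
        ordered = sorted(rows, key=lambda f: f[-1], reverse=True)
        return [f + [str(i + 1)] for i, f in enumerate(ordered)]

    result = raw.map(parse).groupByKey().flatMap(lambda kv: number(kv[1]))
    for row in result.collect():
        print(",".join(row))

Note that groupByKey() pulls each whole group onto one worker, which is fine
for small groups like these; for very large groups a sort-based approach
would be safer.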
best,
fahad

On Sun, Oct 18, 2015 at 11:17 PM, Jeff Zhang <zjf...@gmail.com> wrote:
On Mon, Oct 19, 2015 at 10:45 AM, Davies Liu <dav...@databricks.com> wrote:
> What's the issue with groupByKey()?
>
> On Mon, Oct 19, 2015 at 1:11 AM, fahad shah <sfaha...@gmail.com> wrote:
>> Hi
>>
>> I wanted to ask: what's the best way to achieve per-key auto-increment
>> numerals after sorting?
>> On Sun, Oct 18, 2015 at 10:42 PM, fahad shah <sfaha...@gmail.com> wrote:
>> Hi
>>
>> I am trying to work with pair RDDs: group by the key and assign an id
>> based on the key. I am using PySpark with Spark 1.3, and for some reason
>> I am getting an error that I am unable to figure out - any help much
>> appreciated.
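For anyone skimming the thread, the groupByKey() pattern Davies is referring
to looks like this (the sample data is made up, and sc is assumed to be an
existing SparkContext):

    pairs = sc.parallelize([("k1", 1), ("k1", 2), ("k2", 3)])
    grouped = pairs.groupByKey().mapValues(list)   # materialize each group as a list
    print(grouped.collect())                       # [('k1', [1, 2]), ('k2', [3])]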
Hi,

I am trying to work with pair RDDs: group by the key and assign an id
based on the key. I am using PySpark with Spark 1.3, and for some reason
I am getting an error that I am unable to figure out - any help much
appreciated.

Things I tried (but to no effect):

1. make sure I am not doing any conversions on
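Since the error message above got cut off, this is only a guess, but one
common cause of pair-RDD failures in PySpark is an unhashable key: a
composite key built as a list fails when the key is hashed during the
shuffle, while a tuple works. A minimal illustration, again assuming an
existing SparkContext sc:

    # list key: hashing the key during partitioning raises
    # TypeError: unhashable type: 'list'
    bad = sc.parallelize([(["1", "a", "b"], 1)])
    # bad.groupByKey().collect()   # would fail as described above

    # tuple key: hashable, so grouping works
    good = sc.parallelize([(("1", "a", "b"), 1)])
    print(good.groupByKey().mapValues(list).collect())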