case class Car(id: String, age: Int, tkm: Int, emissions: Int, date: Date, km: Int, fuel: Int)

1. Create a paired RDD of (age, Car) tuples (pairedRDD).

2. Create a new function fc:

    // returns the interval's lower and upper bound
    def fc(x: Int, interval: Int): (Int, Int) = {
      val floor = x - (x % interval)
      val ceil = floor + interval
      (floor, ceil)
    }

3. Do a groupBy on this RDD (from step 1) by passing the function fc.
   Each element is an (age, Car) tuple, so the age is x._1:

    val myrdd = pairedRDD.groupBy(x => fc(x._1, 5))

On Mon, Sep 15, 2014 at 11:38 PM, boyingk...@163.com <boyingk...@163.com> wrote:

> Hi:
> I have a dataset with the structure [id, driverAge, TotalKiloMeter, Emissions,
> date, KiloMeter, fuel], and the data looks like this:
> [1-980,34,221926,9,2005-2-8,123,14]
> [1-981,49,271321,15,2005-2-8,181,82]
> [1-982,36,189149,18,2005-2-8,162,51]
> [1-983,51,232753,5,2005-2-8,106,92]
> [1-984,56,45338,8,2005-2-8,156,98]
> [1-985,45,132060,4,2005-2-8,179,98]
> [1-986,40,15751,5,2005-2-8,149,77]
> [1-987,36,167930,17,2005-2-8,121,87]
> [1-988,53,44949,4,2005-2-8,195,72]
> [1-989,34,252867,5,2005-2-8,181,86]
> [1-990,53,152858,11,2005-2-8,130,43]
> [1-991,40,126831,11,2005-2-8,126,47]
> ………………………………………………
>
> Now my requirement is to group by driverAge in steps of five, e.g. 20~25 is
> one group and 26~30 is another.
> How should I do this? Can anyone give me some code?
>
> ------------------------------
> boyingk...@163.com
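
To try the three steps from the reply without a cluster, here is a minimal sketch that uses plain Scala collections instead of an RDD. The simplified two-field Car and the sample values in cars are assumptions made for illustration; RDD.groupBy takes a key function of the same shape, so the last line should carry over to pairedRDD largely unchanged.

```scala
// Simplified record for illustration only (assumption): just id and age.
case class Car(id: String, age: Int)

// Returns the lower and upper bound of the interval that contains x.
def fc(x: Int, interval: Int): (Int, Int) = {
  val floor = x - (x % interval)
  (floor, floor + interval)
}

// A few rows taken from the sample data in the question.
val cars = Seq(Car("1-980", 34), Car("1-981", 49), Car("1-982", 36), Car("1-987", 36))

// Step 1: pair each record with its age.
val paired = cars.map(c => (c.age, c))

// Step 3: group the (age, Car) tuples by their age bucket.
val grouped = paired.groupBy { case (age, _) => fc(age, 5) }
```

Note that fc produces half-open buckets: the lower bound is included and the upper bound is not, so age 25 lands in (25, 30) rather than (20, 25). If the groups should instead end on multiples of five (21 to 25, 26 to 30), one option is to call fc(age - 1, 5) and read the result as lower bound exclusive.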