Hi, sorry, I'm not very familiar with those methods and cannot find a 'drop' method on RDDs anywhere.
As an example:

    val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
    val rdd = sc.parallelize(arr)
    val sorted = rdd.sortByKey(true)
    // ... then what?

Thanks.

Best regards,
Sampo Niskanen
Lead developer / Wellmo
sampo.niska...@wellmo.com
+358 40 820 5291

On Thu, Oct 22, 2015 at 10:43 AM, Adrian Tanase <atan...@adobe.com> wrote:

> I'm not sure if there is a better way to do it directly using the Spark
> APIs, but I would try to use mapPartitions and then, within each
> partition's Iterable:
>
>     rdd.zip(rdd.drop(1))  // using the Scala collection APIs
>
> This should give you what you need inside a partition. I'm hoping that you
> can partition your data somehow (e.g. by user id or session id) in a way
> that makes your algorithm parallel. In that case you can use the snippet
> above in a reduceByKey.
>
> hope this helps
> -adrian
>
> Sent from my iPhone
>
> On 22 Oct 2015, at 09:36, Sampo Niskanen <sampo.niska...@wellmo.com>
> wrote:
>
> Hi,
>
> I have analytics data with timestamps on each element. I'd like to
> analyze consecutive elements using Spark, but haven't figured out how to
> do this.
>
> Essentially what I'd want is a transform from a sorted RDD [A, B, C, D, E]
> to an RDD [(A,B), (B,C), (C,D), (D,E)]. (Or some other way to analyze
> time-related elements.)
>
> How can this be achieved?
>
> Sampo Niskanen
> Lead developer / Wellmo
> sampo.niska...@wellmo.com
> +358 40 820 5291
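To make the suggestion above concrete, here is a minimal pure-Scala sketch of the partition-local pairing step. It assumes pairing within a partition is enough (pairs straddling partition boundaries would need extra handling, e.g. the sliding transform in `org.apache.spark.mllib.rdd.RDDFunctions`). `ConsecutivePairs` and `pairs` are illustrative names, not part of any Spark API; in Spark you would apply `pairs` via `sorted.mapPartitions(pairs)`.

```scala
// Sketch: turn a sorted sequence [A, B, C, D, E] into consecutive
// pairs [(A,B), (B,C), (C,D), (D,E)] using sliding(2).
// Note: avoid it.zip(it.drop(1)) on an Iterator -- both sides would
// consume the same underlying iterator. sliding(2) is safe.
object ConsecutivePairs {
  def pairs[A](it: Iterator[A]): Iterator[(A, A)] =
    it.sliding(2).collect { case Seq(a, b) => (a, b) }

  def main(args: Array[String]): Unit = {
    val sorted = List((1, "A"), (3, "B"), (7, "C"), (8, "D"), (9, "E"))
    val result = pairs(sorted.iterator).toList
    println(result)
    // In Spark, this function would be the argument to mapPartitions
    // on the sorted RDD: sorted.mapPartitions(pairs)
  }
}
```

With keyed data (e.g. per user or session, as Adrian suggests), the same function can be applied to each group's time-sorted values after a `groupByKey`, keeping the pairing per key and the keys parallel.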