Hi,

Sorry, I'm not very familiar with those methods and cannot find the 'drop'
method anywhere.

As an example:

val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
val rdd = sc.parallelize(arr)
val sorted = rdd.sortByKey(true)   // sort ascending by the Int key
// ... then what?


Thanks.


Best regards,

    Sampo Niskanen
    Lead developer / Wellmo
    sampo.niska...@wellmo.com
    +358 40 820 5291


On Thu, Oct 22, 2015 at 10:43 AM, Adrian Tanase <atan...@adobe.com> wrote:

> I'm not sure if there is a better way to do it directly using the Spark
> APIs, but I would try to use mapPartitions and then, within each
> partition's Iterable, do:
>
> rdd.zip(rdd.drop(1)) - using the Scala collection APIs (drop here is the
> collection method, not an RDD method)
>
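> A rough, untested sketch (sortedRdd is a made-up name for your sorted RDD;
> note that the Iterator which mapPartitions hands you has to be materialized
> first, since an Iterator can only be traversed once):
>
> val pairs = sortedRdd.mapPartitions { it =>
>   val elems = it.toList               // materialize the partition's elements
>   elems.zip(elems.drop(1)).iterator   // (A,B), (B,C), ... within this partition
> }
>
> Keep in mind this only pairs elements within a partition, so pairs that
> would span a partition boundary are lost.
>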
> This should give you what you need inside a partition. I'm hoping that you
> can partition your data somehow (e.g. by user id or session id) so that
> your algorithm stays parallel. In that case you can use the snippet above
> on each key's group after grouping by key, as sketched below.
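>
> For example - an untested sketch with made-up names, using groupByKey
> since we need each key's whole group of values at once:
>
> // events: RDD[(userId, timestamp)]
> val events = sc.parallelize(Seq(
>   ("u1", 1L), ("u1", 3L), ("u1", 7L),
>   ("u2", 2L), ("u2", 5L)))
> val pairsPerUser = events.groupByKey().flatMapValues { ts =>
>   val ordered = ts.toList.sorted      // order each user's timestamps
>   ordered.zip(ordered.drop(1))        // (1,3), (3,7) for u1; (2,5) for u2
> }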
>
> hope this helps
> -adrian
>
> Sent from my iPhone
>
> On 22 Oct 2015, at 09:36, Sampo Niskanen <sampo.niska...@wellmo.com>
> wrote:
>
> Hi,
>
> I have analytics data with timestamps on each element.  I'd like to
> analyze consecutive elements using Spark, but haven't figured out how to do
> this.
>
> Essentially what I'd want is a transform from a sorted RDD [A, B, C, D, E]
> to an RDD [(A,B), (B,C), (C,D), (D,E)].  (Or some other way to analyze
> time-related elements.)
>
> How can this be achieved?
>
>
>     Sampo Niskanen
>     Lead developer / Wellmo
>     sampo.niska...@wellmo.com
>     +358 40 820 5291
>
