Re: How to deal with multidimensional keys?

2014-01-02 Thread K. Shankari
I have had to use this as well. Sometimes I create a POJO to hold the multi-dimensional key to make things easier, i.e. class MultiKey(i, j, k) { }. Then I can define a reduce function over the MultiKey, e.g. def reduceByI(mkv1: (MultiKey, Value), mkv2: (MultiKey, Value)) = if (mkv1.i >
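A minimal sketch of the idea above, simulated in plain Python rather than Spark: the names MultiKey and reduce_by_i are hypothetical stand-ins for the POJO key and the per-dimension reduce function described in the message.

```python
from collections import namedtuple
from functools import reduce

# Hypothetical MultiKey holding the three key dimensions, mirroring the POJO idea.
MultiKey = namedtuple("MultiKey", ["i", "j", "k"])

def reduce_by_i(pairs, combine):
    """Group (MultiKey, value) pairs by the i component, then fold the values."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key.i, []).append(value)
    return {i: reduce(combine, values) for i, values in groups.items()}

pairs = [
    (MultiKey(1, 10, 100), 2),
    (MultiKey(1, 20, 200), 3),
    (MultiKey(2, 10, 100), 5),
]
print(reduce_by_i(pairs, lambda a, b: a + b))  # {1: 5, 2: 5}
```

In real Spark the same shape would be rdd.keyBy(_.i) followed by reduceByKey; the sketch just makes the grouping-by-one-component step explicit.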

Re: How to deal with multidimensional keys?

2014-01-02 Thread Andrew Ash
If you had RDD[((i, j, k), value)] then you could reduce by j by essentially mapping j into the key slot, doing the reduce, and then mapping it back: rdd.map( ((i,j,k),v) => (j, (i,k,v)) ).reduceByKey( ... ).map( (j,(i,k,v)) => ((i,j,k),v) ) It's not pretty, but I've had to use this pattern before too.
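The map / reduce-by-key / map-back pattern above can be sketched in plain Python (no Spark), with reduce_by_key as a minimal stand-in for Spark's reduceByKey. How i and k are carried through the combine step is an assumption of this sketch — here they are kept from the first record seen in each group.

```python
def reduce_by_key(pairs, combine):
    """Minimal stand-in for Spark's reduceByKey over (key, value) pairs."""
    groups = {}
    for k, v in pairs:
        groups[k] = combine(groups[k], v) if k in groups else v
    return list(groups.items())

def reduce_over_j(records, combine):
    # Step 1: move j into the key slot, keeping (i, k, v) as the value.
    by_j = [(j, (i, k, v)) for (i, j, k), v in records]
    # Step 2: reduce by the new key j; only the value component is combined,
    # while i and k are taken from the first record in each group (an assumption).
    reduced = reduce_by_key(by_j, lambda a, b: (a[0], a[1], combine(a[2], b[2])))
    # Step 3: map back to the original ((i, j, k), value) shape.
    return [((i, j, k), v) for j, (i, k, v) in reduced]

records = [((1, 10, 100), 2), ((1, 10, 200), 3), ((2, 20, 100), 5)]
print(reduce_over_j(records, lambda a, b: a + b))
# [((1, 10, 100), 5), ((2, 20, 100), 5)]
```

The same three-step shuffle of the tuple is exactly what the Scala snippet does on an RDD; only the collection type differs.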

How to deal with multidimensional keys?

2014-01-02 Thread Aureliano Buendia
Hi, how is it possible to reduce by multidimensional keys? For example, if every line is a tuple like: (i, j, k, value) or, alternatively: ((i, j, k), value) how can Spark handle reducing over j, or k?