Hi Jaonary,

With the current implementation, you need to call Array.slice to turn each row into its own Array[Double], and then cache the resulting RDD. There is a plan to support block-wise input data, and I will keep you informed.
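For example, something along these lines (a rough sketch only; "flat" stands for your RDD[Array[Double]] and "dim" for the row dimension, both names assumed here):

    import org.apache.spark.mllib.clustering.KMeans

    // flat: RDD[Array[Double]], each array holding several rows in
    // row-major order; dim is the number of columns per row (assumed known)
    val points = flat.flatMap { block =>
      // slice the flat block into one Array[Double] per row
      (0 until block.length / dim).map(i => block.slice(i * dim, (i + 1) * dim))
    }.cache()

    // e.g., cluster into 2 groups with at most 20 iterations
    val model = KMeans.train(points, 2, 20)

Caching matters here because KMeans|| makes multiple passes over the data, so you want to avoid recomputing the slicing on every iteration.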
Best,
Xiangrui

On Tue, Mar 18, 2014 at 2:46 AM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:
> Dear All,
>
> I'm trying to cluster data from native library code with Spark KMeans||. In
> my native library the data are represented as a matrix (rows = number of
> data points, cols = dimension). For efficiency reasons, they are copied into
> a one-dimensional Scala Array in row-major order, so after the computation I
> have an RDD[Array[Double]], but each array represents a set of data points
> rather than a single point. I need to transform these arrays into
> Array[Array[Double]] before running the KMeans|| algorithm.
>
> How can I do this efficiently?
>
> Best regards,