Re: how to use DoubleRDDFunctions on mllib Vector?

Feynman Liang Wed, 08 Jul 2015 10:18:06 -0700

An RDD[Double] is an abstraction for a large collection of doubles, possibly
distributed across multiple nodes. DoubleRDDFunctions exists to compute
statistics such as mean and variance across this distributed dataset.

In contrast, a Vector is not distributed and fits on your local machine.
You would be better off computing these quantities on the Vector directly
(see mllib.clustering.GaussianMixture#vectorMean for an example of how to
compute the mean of a vector).
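
As a rough sketch of what "computing on the Vector directly" could look like:
Vector#toArray gives you a plain Array[Double], and ordinary Scala collection
methods work on that. The object and method names below (LocalVectorStats,
mean, variance) are made up for illustration and are not part of mllib. The
sketch works on Array[Double] so that it runs without Spark on the classpath.
The variance here divides by n, which I believe matches
DoubleRDDFunctions#variance (sampleVariance divides by n - 1).

```scala
// Hypothetical helper, not part of Spark or mllib.
object LocalVectorStats {
  // Arithmetic mean of a local array; NaN for empty input.
  def mean(xs: Array[Double]): Double =
    if (xs.isEmpty) Double.NaN else xs.sum / xs.length

  // Population variance (divides by n); NaN for empty input.
  def variance(xs: Array[Double]): Double =
    if (xs.isEmpty) Double.NaN
    else {
      val m = mean(xs)
      xs.map(x => (x - m) * (x - m)).sum / xs.length
    }

  def main(args: Array[String]): Unit = {
    // With an mllib Vector `v` the input would be v.toArray, e.g.
    //   val v = org.apache.spark.mllib.linalg.Vectors.dense(1.0, 2.0, 3.0, 4.0)
    //   LocalVectorStats.mean(v.toArray)
    val xs = Array(1.0, 2.0, 3.0, 4.0)
    println(mean(xs))     // 2.5
    println(variance(xs)) // 1.25
  }
}
```

If you really do want an RDD[Double], sc.parallelize(v.toArray) should give
you one, but that ships a local array to the cluster only to aggregate it
again, so it is unlikely to be worth it for a single Vector.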

On Tue, Jul 7, 2015 at 8:26 PM, 诺铁 <noty...@gmail.com> wrote:

> hi,
>
> there are some useful functions in DoubleRDDFunctions, which I can use if
> I have RDD[Double], e.g., mean, variance.
>
> Vector doesn't have such methods. How can I convert a Vector to RDD[Double],
> or, maybe better, can I call mean directly on a Vector?
>
