Ok, got it, thanks.

On Thu, Jul 9, 2015 at 12:02 PM, prosp4300 <prosp4...@163.com> wrote:

> Seems what Feynman mentioned is the source code instead of the
> documentation; vectorMean is private, see
> https://github.com/apache/spark/blob/v1.3.0/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala
>
> At 2015-07-09 10:10:58, "诺铁" <noty...@gmail.com> wrote:
>
> > Thanks, I understand now.
> > But I can't find mllib.clustering.GaussianMixture#vectorMean. What
> > version of Spark do you use?
> >
> > On Thu, Jul 9, 2015 at 1:16 AM, Feynman Liang <fli...@databricks.com>
> > wrote:
> >
> >> An RDD[Double] is an abstraction for a large collection of doubles,
> >> possibly distributed across multiple nodes. The DoubleRDDFunctions
> >> are there for performing mean and variance calculations across this
> >> distributed dataset.
> >>
> >> In contrast, a Vector is not distributed and fits on your local
> >> machine. You would be better off computing these quantities on the
> >> Vector directly (see mllib.clustering.GaussianMixture#vectorMean for
> >> an example of how to compute the mean of a vector).
> >>
> >> On Tue, Jul 7, 2015 at 8:26 PM, 诺铁 <noty...@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> There are some useful functions in DoubleRDDFunctions, which I can
> >>> use if I have an RDD[Double], e.g. mean and variance.
> >>>
> >>> Vector doesn't have such methods. How can I convert a Vector to an
> >>> RDD[Double], or, better, can I call mean directly on a Vector?
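Since GaussianMixture#vectorMean is private and not callable from user code, the advice above amounts to computing the statistics yourself. A minimal sketch of doing that locally, assuming you can pull the values out of an MLlib Vector with its toArray method (here a plain Array[Double] stands in for the vector's values, so the snippet runs without Spark):

```scala
// Local mean and variance over a vector's values — a sketch of the
// "compute it on the Vector directly" approach suggested in the thread.
// With an org.apache.spark.mllib.linalg.Vector you would first call
// v.toArray; a plain Array[Double] is used here to stay self-contained.
object VectorStats {
  def mean(values: Array[Double]): Double =
    values.sum / values.length

  // Population variance (divide by n), matching what
  // DoubleRDDFunctions.variance computes on an RDD[Double].
  def variance(values: Array[Double]): Double = {
    val m = mean(values)
    values.map(x => (x - m) * (x - m)).sum / values.length
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val values = Array(1.0, 2.0, 3.0, 4.0)
    println(VectorStats.mean(values))     // 2.5
    println(VectorStats.variance(values)) // 1.25
  }
}
```

If you genuinely need an RDD[Double], sc.parallelize(v.toArray) would produce one, but for a vector that already fits on the local machine this adds distribution overhead for no benefit, which is the point Feynman makes above.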