In k-means clustering, the clusters are characterized by their mean
vectors, and data samples are assigned to clusters according to their
distance to these means. If distance is measured with the L2 norm
(Euclidean distance), assigning samples to clusters is equivalent to a
maximum-likelihood assignment in which each cluster is characterized by a
multivariate Gaussian distribution whose mean is the cluster mean and
whose covariance matrix is the identity. In the same way, using a
Mahalanobis distance corresponds to using a different (non-identity)
covariance matrix, but one that is still shared by all clusters. This
constraint can be removed by characterizing each cluster by both the mean
and the covariance of its samples, and using maximum likelihood in place
of the distance measurement when assigning samples to clusters.
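
To make the equivalence concrete, here is a small NumPy sketch (my own illustration, not Mahout code): one function assigns points by squared Euclidean distance, the other by Gaussian log-likelihood with per-cluster covariances. When every covariance is the identity, the two assignment rules agree exactly; per-cluster covariances are where they start to differ.

```python
import numpy as np

def assign_euclidean(X, means):
    # Standard k-means rule: nearest mean by squared Euclidean distance.
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def assign_gaussian(X, means, covs):
    # Maximum-likelihood rule: highest log-density under each cluster's
    # multivariate Gaussian (mean + covariance per cluster).
    n, d = X.shape
    ll = np.empty((n, len(means)))
    for k, (mu, cov) in enumerate(zip(means, covs)):
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        diff = X - mu
        maha = np.einsum('ij,jk,ik->i', diff, inv, diff)  # Mahalanobis^2
        ll[:, k] = -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))
    return ll.argmax(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
means = np.array([[0.0, 0.0], [3.0, 3.0]])

# With identity covariances, the ML assignment reduces to k-means.
covs = [np.eye(2), np.eye(2)]
assert (assign_euclidean(X, means) == assign_gaussian(X, means, covs)).all()
```

Replacing the identity matrices with per-cluster sample covariances gives the generalization described above (essentially a hard-assignment Gaussian mixture).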

Matt


On Fri, May 10, 2013 at 6:41 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> K-means uses Gaussian errors.  The dirichlet clustering can be configured
> to use Gaussian errors.
>
> SVD uses Gaussian errors.  QR decomposition can be used to solve problems
> with Gaussian errors.
>
> I think I don't understand what you are asking about.
>
>
> On Fri, May 10, 2013 at 1:10 PM, Matthew McClain <mattmccla...@gmail.com
> >wrote:
>
> > I'm pretty new to Mahout, but it looks like there aren't any statistical
> > machine learning algorithms that use Gaussian distributions.
> Specifically,
> > I'm thinking of clustering algorithms that use Gaussian distributions to
> > model clusters and hidden Markov models that use Gaussian distributions.
> > Can someone tell if these are in Mahout somewhere, and if not, has this
> > been discussed at all?
> >
> > Thanks,
> > Matt
> >
>