In k-means clustering, the clusters are characterized by their mean vectors, and data samples are assigned to clusters according to their distance to these means. If distance is measured using the L2 norm (Euclidean distance), assigning data samples to clusters is equivalent to using maximum likelihood, where the clusters are characterized by multivariate Gaussian distributions: the distribution means are the same as the cluster means, and the covariance matrices are all equal to the identity matrix. In the same way, using a Mahalanobis distance measure is like using a different covariance matrix in the distributions, but that covariance matrix is still shared by all clusters. This constraint can be removed by characterizing each cluster by the mean and covariance of its own samples, and using maximum likelihood in place of the distance measurement for assigning samples to clusters.
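To make the idea concrete, here is a minimal numpy sketch of that last step: each cluster gets its own mean and covariance, and samples are assigned by maximizing the Gaussian log-likelihood. The function and variable names are my own illustration, not anything from Mahout.

```python
import numpy as np

def log_likelihood(x, mean, cov):
    """Log density of a multivariate Gaussian at x, up to shared constants."""
    d = x - mean
    cov_inv = np.linalg.inv(cov)
    # Mahalanobis term plus log-determinant; the (2*pi)^k constant is the
    # same for every cluster, so it is omitted from the comparison.
    return -0.5 * (d @ cov_inv @ d + np.log(np.linalg.det(cov)))

def assign(samples, means, covs):
    """Assign each sample to the cluster whose Gaussian gives it the
    highest likelihood. With identity covariances this reduces to
    nearest-mean (Euclidean) assignment, i.e. plain k-means."""
    labels = []
    for x in samples:
        scores = [log_likelihood(x, m, c) for m, c in zip(means, covs)]
        labels.append(int(np.argmax(scores)))
    return labels

# Toy example: two clusters with identity covariances, so the result
# matches ordinary nearest-mean assignment.
means = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
samples = [np.array([1.0, 0.0]), np.array([3.5, 0.5])]
print(assign(samples, means, covs))  # prints [0, 1]
```

Replacing the identity matrices with per-cluster covariances estimated from each cluster's members gives the unconstrained version described above, essentially a hard-assignment step of EM for a Gaussian mixture.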
Matt

On Fri, May 10, 2013 at 6:41 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> K-means uses Gaussian errors. The dirichlet clustering can be configured
> to use Gaussian errors.
>
> SVD uses Gaussian errors. QR decomposition can be used to solve problems
> with Gaussian errors.
>
> I think I don't understand what you are asking about.
>
>
> On Fri, May 10, 2013 at 1:10 PM, Matthew McClain <mattmccla...@gmail.com> wrote:
>
> > I'm pretty new to Mahout, but it looks like there aren't any statistical
> > machine learning algorithms that use Gaussian distributions. Specifically,
> > I'm thinking of clustering algorithms that use Gaussian distributions to
> > model clusters and hidden Markov models that use Gaussian distributions.
> > Can someone tell if these are in Mahout somewhere, and if not, has this
> > been discussed at all?
> >
> > Thanks,
> > Matt
> >