Statistical machine learning with Gaussian distributions

2013-05-10 Thread Matthew McClain
I'm pretty new to Mahout, but it looks like there aren't any statistical machine learning algorithms that use Gaussian distributions. Specifically, I'm thinking of clustering algorithms that use Gaussian distributions to model clusters and hidden Markov models that use Gaussian distributions. Can s

Re: Statistical machine learning with Gaussian distributions

2013-05-10 Thread Ted Dunning
K-means uses Gaussian errors. Dirichlet clustering can be configured to use Gaussian errors. SVD uses Gaussian errors. QR decomposition can be used to solve problems with Gaussian errors. I don't think I understand what you are asking about. On Fri, May 10, 2013 at 1:10 PM, Matthew McClai

Re: Statistical machine learning with Gaussian distributions

2013-05-11 Thread Matthew McClain
In k-means clustering, the clusters are characterized by their mean vectors, and data samples belong to clusters according to the distance to these means. If distance is measured using the L-2 norm (Euclidean distance), assigning data samples to clusters is equivalent to using maximum likelihood, w
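The equivalence described in this message can be checked numerically. The following is a minimal sketch, not Mahout code: it assumes every cluster is modeled by a spherical Gaussian with the same variance, and the function names, the `sigma2` parameter, and the sample data are all hypothetical. Under that assumption the log-likelihood is a decreasing function of the squared Euclidean distance to the mean, so nearest-mean assignment and maximum-likelihood assignment pick the same cluster.

```python
import math


def squared_distance(x, mean):
    """Squared Euclidean (L2) distance between a sample and a cluster mean."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mean))


def assign_by_distance(x, means):
    """k-means style assignment: index of the nearest mean."""
    return min(range(len(means)), key=lambda k: squared_distance(x, means[k]))


def spherical_log_likelihood(x, mean, sigma2):
    """Log-density of an isotropic Gaussian with covariance sigma2 * I."""
    d = len(x)
    return (-0.5 * d * math.log(2 * math.pi * sigma2)
            - squared_distance(x, mean) / (2 * sigma2))


def assign_by_likelihood(x, means, sigma2=1.0):
    """Maximum-likelihood assignment when all clusters share variance sigma2."""
    return max(range(len(means)),
               key=lambda k: spherical_log_likelihood(x, means[k], sigma2))


# Hypothetical means and samples, chosen only for illustration.
means = [(0.0, 0.0), (4.0, 4.0), (0.0, 5.0)]
samples = [(0.5, -0.2), (3.1, 3.9), (0.4, 3.0), (2.0, 2.1)]

distance_labels = [assign_by_distance(x, means) for x in samples]
likelihood_labels = [assign_by_likelihood(x, means) for x in samples]
print(distance_labels, likelihood_labels)
```

The two label lists agree for any shared `sigma2`, because the variance only rescales the distance term and shifts every cluster's log-likelihood by the same constant.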

Re: Statistical machine learning with Gaussian distributions

2013-05-11 Thread Ted Dunning
On Sat, May 11, 2013 at 9:43 AM, Matthew McClain wrote:
> This constraint can be
> removed by characterizing each cluster by the mean and covariance of its
> samples, and using maximum likelihood in place of the distance measurement
> for assigning clusters to samples.
>
Just a note that ordinary
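The quoted proposal, describing each cluster by a mean and a covariance and assigning samples by maximum likelihood, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than anything taken from Mahout: the data is two-dimensional, the covariance matrices are given rather than estimated, and every name and number is hypothetical. It shows one sample for which the nearest mean and the most likely cluster differ.

```python
import math


def gaussian_log_likelihood_2d(x, mean, cov):
    """Log-density of a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Mahalanobis term, using the closed-form inverse of a 2x2 matrix.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + maha)


def assign_by_likelihood(x, clusters):
    """Index of the cluster whose Gaussian gives x the highest log-density."""
    return max(range(len(clusters)),
               key=lambda k: gaussian_log_likelihood_2d(x, *clusters[k]))


def assign_by_distance(x, clusters):
    """Index of the cluster with the nearest mean (Euclidean distance)."""
    return min(range(len(clusters)),
               key=lambda k: math.dist(x, clusters[k][0]))


# Hypothetical clusters as (mean, covariance) pairs:
# cluster 0 is wide along the x axis, cluster 1 is tight and round.
clusters = [
    ((0.0, 0.0), ((9.0, 0.0), (0.0, 0.25))),
    ((4.0, 0.0), ((0.25, 0.0), (0.0, 0.25))),
]

sample = (2.5, 0.0)
nearest = assign_by_distance(sample, clusters)
most_likely = assign_by_likelihood(sample, clusters)
print(nearest, most_likely)
```

The sample lies closer to the second mean, but it is three standard deviations from that tight cluster and less than one from the wide cluster, so the likelihood criterion picks the first.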