Two things,

- Use trunk. We are about to release 0.5 and there has been a ton of progress since 0.4 including several important bug fixes.

- LDA isn't really clustering. It is more along the lines of SVD as a dimensionality reduction. It should be possible to display the internals to find which terms or documents have the highest components on a single topic, but combinations of topics are still interesting in LDA, just as combinations of coordinates in SVD are interesting.

- It would probably be more interesting if you were to cluster the LDA representation using k-means and look at those results.

The reason that LDA is grouped together with the clustering algorithms is that it is unsupervised. It has some real differences, however.

On Tue, Apr 26, 2011 at 12:16 PM, Ian Helmke <[email protected]> wrote:

> I'm looking at using LDA to cluster documents based on topics. I've
> gotten LDA to work in Mahout 0.4 and I am able to get keywords and
> topics using the built-in mahout utilities.
>
> Is there any simple way to view which documents are assigned to which
> clusters after performing LDA? This could easily be done using
> canopy/kmeans with the -cl option (if I'm using the command line
> utilities), but I don't see any equivalent anywhere in the LDA
> utilities.
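
To make the k-means suggestion above a bit more concrete, here is a rough sketch in plain Python. It is not Mahout code and does not use any Mahout API. The document-topic proportions are made-up numbers standing in for whatever per-document topic weights you manage to extract from the LDA output, and the helper names (dominant_topic, farthest_first_centroids, kmeans) are hypothetical. It shows two views of the same data: the single strongest topic per document, and k-means clusters over the full topic vectors.

```python
def dominant_topic(theta):
    """Index of the largest topic weight for one document (first one wins ties)."""
    return max(range(len(theta)), key=lambda k: theta[k])


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means; returns (assignments, centroids)."""
    centroids = farthest_first_centroids(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical document-topic proportions (assumed, not real LDA output).
doc_topics = {
    "doc0": [0.90, 0.05, 0.05],
    "doc1": [0.85, 0.10, 0.05],
    "doc2": [0.05, 0.90, 0.05],
    "doc3": [0.10, 0.80, 0.10],
    "doc4": [0.45, 0.45, 0.10],
    "doc5": [0.50, 0.40, 0.10],
}

names = list(doc_topics)
assignments, centroids = kmeans([doc_topics[n] for n in names], k=3)

for name, cluster in zip(names, assignments):
    print(name, "cluster", cluster, "dominant topic", dominant_topic(doc_topics[name]))
```

In this toy data, doc4 and doc5 are roughly even mixtures of the first two topics. Looking only at the dominant topic lumps them in with doc0 and doc1, while k-means over the whole vector can put them in a group of their own, which is the sense in which combinations of topics are still interesting.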
