As I've said before, this issue is still a problem:
https://issues.apache.org/jira/browse/MAHOUT-1020?focusedCommentId=13409696#comment-13409696
This should be reopened. I sent you a link to get my data (only 8G, good luck!).

My confusion about the per-cluster density measure arose because in 0.8 clusterdump requires an output file, but the per-cluster density measure is not written to it. It only appears in the INFO output on STDOUT. When I run a batch of these, the STDOUT is lost, so I'll have to modify my scripts or update my KFinder code. I'd vote to include it in the output file in the future.

The only problem I've seen with the per-cluster intra-cluster density is that I sometimes get a lot of pruned clusters, and the intra-cluster density is not calculated for them. I think we've discussed this in the past.

12/07/11 12:22:12 INFO evaluation.ClusterEvaluator: Intra-Cluster Density[766] = 0.6243875150474454
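
In the meantime, a line like the one above can be scraped out of saved STDOUT. Here is a rough sketch of what I have in mind for my scripts. It assumes the log line format stays exactly as shown above, and the function name is made up for illustration; it is not part of Mahout.

```python
import re

# Matches the per-cluster lines emitted on STDOUT, e.g.
# "... INFO evaluation.ClusterEvaluator: Intra-Cluster Density[766] = 0.6243875150474454"
# Assumption: the format of this line does not change between builds.
DENSITY_LINE = re.compile(
    r"INFO evaluation\.ClusterEvaluator: Intra-Cluster Density\[(\d+)\] = (\S+)"
)


def parse_intra_cluster_densities(lines):
    """Return {cluster_id: density} for every per-cluster density line found.

    Lines that do not match (other INFO output, blank lines) are skipped.
    Pruned clusters never get a line, so they are simply absent from the result.
    """
    densities = {}
    for line in lines:
        match = DENSITY_LINE.search(line)
        if match:
            cluster_id = int(match.group(1))
            densities[cluster_id] = float(match.group(2))
    return densities
```

If the STDOUT of a run is redirected to a file (say run.log, a hypothetical name), calling parse_intra_cluster_densities(open("run.log")) gives back a dict keyed by cluster id, which is about what I'd want written to the clusterdump output file anyway.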

I would really like to get this working and am willing to provide whatever help you need, if you are in a position to work on it. I have 0.8-SNAPSHOT building. I'm inexperienced at debugging with data this large, but willing to learn. If you'd like me to try something out, just point me in the right direction.

I'm also happy to test Ted's inter-cluster stuff.


On 7/11/12 11:46 AM, Jeff Eastman wrote:
The ClusterEvaluator has methods for both inter-cluster density and intra-cluster density. The former computes the density using the cluster centers, while the latter uses a set of representative points extracted from the clustered points. This reduces the computational overhead of calculating a density from all of the points from each cluster.

The unit test uses synthetic data and produces reasonable looking results afaict. Have you had negative experiences with that?

On 7/11/12 1:21 PM, Pat Ferrel wrote:
...

It was my understanding that the ClusterEvaluator included an attempt to provide this measure, with intra-cluster density per cluster, though it looks like that output has been removed?



