As I've said before, this issue is still a problem:
https://issues.apache.org/jira/browse/MAHOUT-1020?focusedCommentId=13409696#comment-13409696
This should be reopened, and I sent you a link to get my data (only 8G,
good luck!).
My confusion with the per-cluster density measure arose because in 0.8
an output file is required for clusterdump, but the per-cluster density
measure is not written to it. It's in the INFO output to STDOUT. When I
run a bunch of these the STDOUT is lost, so I'll have to modify my
scripts or update my KFinder code. I'd vote to include it in the output
file in the future.
The only problem I've seen with the per-cluster intra-cluster density is
that I sometimes get a lot of pruned clusters, and the intra-cluster
density is not calculated for them. I think we've discussed this in the
past.
12/07/11 12:22:12 INFO evaluation.ClusterEvaluator: Intra-Cluster Density[766] = 0.6243875150474454
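Until the value is written to the output file, one workaround is to save
the console output and pull the densities out of it afterwards. Below is
a minimal sketch, assuming every per-cluster line looks like the one
quoted above; the function name and the regex are my own, not part of
Mahout, and other versions may format the line differently.

```python
import re

# Pattern derived from the single log line quoted above (an assumption:
# other Mahout versions may word it differently). \s+ also tolerates the
# line being wrapped between "Intra-Cluster" and "Density[...]".
DENSITY_RE = re.compile(
    r"ClusterEvaluator:\s+Intra-Cluster\s+Density\[(\d+)\]\s*=\s*"
    r"([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?|NaN)"
)


def parse_intra_cluster_densities(log_text):
    """Return {cluster_id: density} for every matching entry in log_text.

    Clusters with no matching line (for example pruned ones) are simply
    absent from the result.
    """
    densities = {}
    for match in DENSITY_RE.finditer(log_text):
        densities[int(match.group(1))] = float(match.group(2))
    return densities


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python parse_density.py < saved_console_output.txt
    for cluster_id, density in sorted(
        parse_intra_cluster_densities(sys.stdin.read()).items()
    ):
        print(cluster_id, density)
```

This only helps if the console output of each run is captured to a file
in the first place, so the driver scripts would still need that change.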
I would really like to get this stuff working and am willing to provide
whatever help you need if you are in a position to work on it. I have
0.8-SNAPSHOT building. I'm inexperienced at debugging in this kind of
large-data situation but willing to learn. If you'd like me to try
something out, just point me in the right direction.
I'm also happy to test Ted's inter-cluster stuff.
On 7/11/12 11:46 AM, Jeff Eastman wrote:
The ClusterEvaluator has methods for both inter-cluster density and
intra-cluster density. The former computes the density using the
cluster centers, while the latter uses a set of representative points
extracted from the clustered points. Using representative points
reduces the computational overhead of calculating a density from all
of the points in each cluster.
The unit test uses synthetic data and produces reasonable looking
results afaict. Have you had negative experiences with that?
On 7/11/12 1:21 PM, Pat Ferrel wrote:
...
It was my understanding that the ClusterEvaluator included an attempt
to provide this measure with intra-cluster density per cluster though
it looks like that output has been removed?