On Tue, May 21, 2013 at 8:47 PM, Pat Ferrel <pat.fer...@gmail.com> wrote:

> For this sample it looks like about 20-40 clusters is "best"? Looking at
> the results for k=40 by eyeball they do seem pretty good.


It is really hard to tell with these numbers.  In spite of their heritage,
these scaled average distances are kind of debatable as things to compare,
if only because each one is scaled differently.

My own preference is to use the unscaled intra-cluster average
distance.  This should decrease monotonically as k increases.  The
interesting question (for me) is what that same average is on held-out data.

This measure of quality is focused on the use of clustering as a
feature for downstream modeling, not necessarily for human consumption.
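A minimal sketch of the train vs. held-out comparison described above. It assumes plain Euclidean distance, reads "intra-cluster average distance" as the average distance from each point to its nearest centroid, and uses ordinary Lloyd's k-means. None of these choices is specified in this thread, and the function names and the synthetic three-blob data are hypothetical, for illustration only.

```python
import math
import random


def avg_distance(points, centroids):
    """Unscaled average Euclidean distance from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (assumed here); starts from k randomly sampled points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: put each point in the cluster of its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members.
        new = []
        for j, members in enumerate(clusters):
            if members:
                new.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new.append(centroids[j])  # keep an empty cluster's old centroid
        if new == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new
    return centroids


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical synthetic data: three 2-D Gaussian blobs, split 80/20.
    blob_centers = [(0, 0), (5, 5), (0, 8)]
    data = [(rng.gauss(cx, 1), rng.gauss(cy, 1))
            for cx, cy in blob_centers for _ in range(100)]
    rng.shuffle(data)
    train, held_out = data[:240], data[240:]
    for k in (1, 2, 3, 5, 10, 20, 40):
        cents = kmeans(train, k)
        print(f"k={k:2d}  train={avg_distance(train, cents):.3f}  "
              f"held-out={avg_distance(held_out, cents):.3f}")
```

Lloyd's algorithm minimizes squared distance, and random initialization can land in different local optima for different k. So the training column usually falls as k grows but is not strictly guaranteed to. The held-out column is the one to watch: when it stops improving along with the training column, the extra clusters are probably fitting noise rather than structure.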
