I have calculated similarity for all my docs. It has been suggested that this might be a good way to pick distances to use for canopies. When I look at the distances between similar docs, they are all over the map, of course, and some pairs that seem far apart still look like pretty good matches. Is this just a matter of eyeballing, or is there some better way of picking canopy distances from similarity distances?
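
To make the question concrete, here is a rough sketch of the kind of non-eyeball approach I have in mind: convert the similarities to distances and take two quantiles of that distribution as the tight and loose canopy thresholds. Everything in it is an assumption on my part, not something taken from Mahout: the function names are hypothetical, distance is taken as 1 - similarity (reasonable for cosine-style similarities in [0, 1]), and the 10% and 50% quantiles are arbitrary starting points.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def suggest_canopy_thresholds(similarities, tight_q=0.10, loose_q=0.50):
    """Return (t1, t2) guessed from pairwise similarities.

    Assumption: distance = 1 - similarity. t2 is the tight threshold
    (a low quantile of the distances) and t1 the loose one (a higher
    quantile). The quantile choices are arbitrary defaults.
    """
    if not 0.0 <= tight_q < loose_q <= 1.0:
        raise ValueError("expected 0 <= tight_q < loose_q <= 1")
    distances = sorted(1.0 - s for s in similarities)
    t2 = quantile(distances, tight_q)
    t1 = quantile(distances, loose_q)
    return t1, t2


if __name__ == "__main__":
    # Made-up similarity values, only to show the call.
    t1, t2 = suggest_canopy_thresholds([0.9, 0.8, 0.7, 0.6, 0.5])
    print("t1 (loose) =", t1, "t2 (tight) =", t2)
```

One caveat: if the similarity output only keeps the most similar pairs per doc, these quantiles describe the retained pairs only, not all pairs, so the thresholds would probably come out too tight.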

BTW, could I vote for a better description of how to use RowSimilarity? Shouldn't it have a -ow parameter? It would also be nice if it calculated the number of columns from the input "matrix" itself. As it stands, these things make it hard to automate in scripts.
