There is a Spark Package that gives some alternative distance metrics,
http://spark-packages.org/package/derrickburns/generalized-kmeans-clustering.
I have not used it myself.
-
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
It looks like the distance metric is hard-coded to the L2 norm (Euclidean
distance) in MLlib. As you might expect, you are not the first person to
want other metrics, and there has been some prior effort.
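
To illustrate why the choice of metric matters, here is a minimal sketch of
the k-means assignment step with the distance passed in as a function. It is
plain Python, not MLlib code, and the names (euclidean, manhattan, assign)
are made up for this example rather than taken from any Spark API.

```python
import math

def euclidean(a, b):
    # L2 norm of the difference: the metric MLlib's k-means is fixed to.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # L1 norm of the difference: one possible alternative metric.
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, centers, distance):
    # For each point, return the index of the closest center under `distance`.
    return [
        min(range(len(centers)), key=lambda i: distance(p, centers[i]))
        for p in points
    ]

# Hypothetical toy data chosen so that the two metrics disagree on one point.
points = [(0.0, 0.0), (3.0, 1.0), (2.0, 3.0)]
centers = [(3.0, 0.0), (2.0, 2.0)]

by_l2 = assign(points, centers, euclidean)
by_l1 = assign(points, centers, manhattan)
print(by_l2, by_l1)
```

The first point lands in a different cluster depending on the metric, so
swapping the distance changes the clustering itself. Note that a full
implementation would also need a matching center update: the mean is the
right update for squared Euclidean distance, but not necessarily for other
metrics (for L1 the component-wise median is the usual choice).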
Please see this PR: https://github.com/apache/spark/pull/2634
And corresponding JIRA: