Re: DTW distance measure and K-medioids, Hierarchical clustering

Marko Dinic Fri, 09 Jan 2015 03:04:45 -0800

Thank you for your answer Ted.

What about some kind of bisecting k-means? I'm trying to cluster time series of different lengths, and I came up with the idea of using DTW as a similarity measure, which seems adequate. The problem is that I cannot use it with K-means, since it's hard to define centroids based on time series that can have different lengths/phases. So I was thinking about hierarchical clustering, since it seems appropriate to combine with DTW, but it is not scalable, as you said. So my next thought is to try bisecting k-means, which seems scalable, since it is based on repeated K-means steps. My idea is as follows, step by step:

- Take two signals as initial centroids (maybe the two signals that have the smallest similarity, calculated using DTW)
- Assign every signal to the nearer of the two initial centroids
- Repeat the procedure on the biggest cluster

In this way I could use DTW as the distance measure, which could be useful since my data may be shifted or skewed, and avoid calculating centroids. At the end I could take the one signal from each cluster that is most similar to the others in the cluster (some kind of centroid/medoid).
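To make the steps above concrete, here is a minimal single-machine sketch in plain Python. It is not Mahout code, and the function names (`dtw_distance`, `medoid`, `split_in_two`, `bisecting_cluster`) are hypothetical, chosen only for illustration. It assumes signals are lists of numbers, uses the textbook dynamic-programming DTW with absolute difference as the local cost, and picks the two most dissimilar signals as seeds by comparing all pairs.

```python
def dtw_distance(a, b):
    """Textbook DTW: O(len(a) * len(b)) time, absolute difference as local cost."""
    inf = float("inf")
    # prev[j] is the best cumulative cost of aligning the prefix of `a`
    # processed so far with b[:j]; only two rows are kept in memory.
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def medoid(cluster, dist):
    """Member with the smallest total distance to all members of the cluster."""
    return min(cluster, key=lambda s: sum(dist(s, t) for t in cluster))


def split_in_two(cluster, dist):
    """Seed with the two most dissimilar members, then assign each signal
    to the nearer seed. The seed search compares all pairs (assumption:
    the cluster is small enough for that; sampling could replace it)."""
    n = len(cluster)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    i, j = max(pairs, key=lambda p: dist(cluster[p[0]], cluster[p[1]]))
    seed_a, seed_b = cluster[i], cluster[j]
    left, right = [], []
    for s in cluster:
        (left if dist(s, seed_a) <= dist(s, seed_b) else right).append(s)
    return left, right


def bisecting_cluster(signals, k, dist):
    """Repeatedly split the biggest cluster until there are k clusters."""
    clusters = [list(signals)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda c: len(clusters[c]))
        biggest = clusters[idx]
        if len(biggest) < 2:
            break  # nothing left to split
        left, right = split_in_two(biggest, dist)
        if not right:
            break  # all members are at distance 0 from each other
        clusters[idx:idx + 1] = [left, right]
    return clusters
```

For example, `bisecting_cluster(signals, 2, dtw_distance)` followed by `medoid(c, dtw_distance)` for each cluster `c` gives one representative signal per cluster without ever averaging signals of different lengths. On scalability: the assignment step needs only two DTW computations per signal, but the all-pairs seed search and the medoid computation are both quadratic in the cluster size, so on large data those two steps would probably have to be done on a sample.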


What do you think about this approach and about the scalability?

I would highly appreciate your answer, thanks.

On Thu 08 Jan 2015 08:19:18 PM CET, Ted Dunning wrote:

> On Thu, Jan 8, 2015 at 7:00 AM, Marko Dinic <marko.di...@nissatech.com>
> wrote:
>
>> 1) Is there an implementation of DTW (Dynamic Time Warping) in Mahout that
>> could be used as a distance measure for clustering?
>
> No.
>
>> 2) Why isn't there an implementation of K-medoids in Mahout? I'm guessing
>> that it could not be implemented efficiently on Hadoop, but I wanted to
>> check if something like that is possible.
>
> Scalability, as you suspected.
>
>> 3) Same question, just considering Agglomerative Hierarchical clustering.
>
> Again. Agglomerative algorithms tend to be n^2, which contradicts scaling.
