Thank you for your answer, Ted.

What about some kind of bisecting k-means? I'm trying to cluster time series of different lengths, and I came up with the idea of using DTW as a similarity measure. It seems adequate, but I cannot use it with k-means, since it's hard to define centroids for time series that can differ in length and phase. So I was thinking about hierarchical clustering, since it seems appropriate to combine with DTW, but it is not scalable, as you said. My next thought is to try bisecting k-means, which seems scalable since it is based on repeated k-means steps. My idea, step by step, is as follows:

- Take two signals as initial centroids (maybe the two signals that are least similar, as measured by DTW)
- Assign every signal to the closer of the two initial centroids
- Repeat the procedure on the biggest cluster

This way I could use DTW as the distance measure, which could be useful since my data may be shifted or skewed, and I would avoid calculating centroids. At the end I could take, from each cluster, the one signal that is most similar to the others in that cluster (some kind of centroid/medoid).
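
To make the idea concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under my own assumptions, not an existing Mahout feature: the function names (dtw_distance, farthest_pair, bisect, bisecting_dtw, medoid) are made up, the DTW cost is a plain absolute difference with no warping window, and the seed search and the medoid selection are both quadratic in the cluster size, so a real version would probably have to pick seeds from a sample.

```python
import math

def dtw_distance(a, b):
    """Plain dynamic time warping, O(len(a) * len(b)), absolute-difference cost."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]

def farthest_pair(signals, members):
    """Return the two member indices with the largest DTW distance (assumed seeding rule)."""
    best = (-1.0, members[0], members[1])
    for x in range(len(members)):
        for y in range(x + 1, len(members)):
            d = dtw_distance(signals[members[x]], signals[members[y]])
            if d > best[0]:
                best = (d, members[x], members[y])
    return best[1], best[2]

def bisect(signals, members):
    """Split one cluster in two by assigning each signal to the closer seed."""
    s1, s2 = farthest_pair(signals, members)
    left, right = [], []
    for idx in members:
        d1 = dtw_distance(signals[idx], signals[s1])
        d2 = dtw_distance(signals[idx], signals[s2])
        (left if d1 <= d2 else right).append(idx)
    return left, right

def bisecting_dtw(signals, k):
    """Repeatedly bisect the biggest cluster until there are k clusters."""
    clusters = [list(range(len(signals)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        biggest = clusters.pop()
        if len(biggest) < 2:
            clusters.append(biggest)
            break
        left, right = bisect(signals, biggest)
        if not right:  # all members are identical under DTW, nothing to split
            clusters.append(left)
            break
        clusters += [left, right]
    return clusters

def medoid(signals, members):
    """Member with the smallest total DTW distance to the rest of its cluster."""
    return min(
        members,
        key=lambda i: sum(dtw_distance(signals[i], signals[j]) for j in members),
    )

# Toy data: signals of different lengths, two visibly different levels.
signals = [
    [0, 0, 1, 2, 1, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 0, 1, 2, 1, 0, 0],
    [5, 5, 6, 7, 6, 5],
    [5, 6, 7, 6, 5, 5, 5],
]
clusters = bisecting_dtw(signals, 2)
representatives = [medoid(signals, c) for c in clusters]
```

Here clusters holds lists of signal indices and representatives holds one index per cluster. No centroid is ever averaged; only pairwise DTW distances between actual signals are used.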

What do you think about this approach and about the scalability?

I would highly appreciate your answer, thanks.

On Thu 08 Jan 2015 08:19:18 PM CET, Ted Dunning wrote:
On Thu, Jan 8, 2015 at 7:00 AM, Marko Dinic <marko.di...@nissatech.com>
wrote:

1) Is there an implementation of DTW (Dynamic Time Warping) in Mahout that
could be used as a distance measure for clustering?


No.



2) Why isn't there an implementation of K-medoids in Mahout? I'm guessing
that it could not be implemented efficiently on Hadoop, but I wanted to
check if something like that is possible.


Scalability as you suspected.



3) Same question, just considering Agglomerative Hierarchical clustering.


Again.  Agglomerative algorithms tend to be n^2 which contradicts scaling.
