On Thu, Jan 15, 2015 at 3:50 AM, Marko Dinic <[email protected]>
wrote:

> Thank you for your answer. Maybe I gave the wrong impression of my data by
> using a sinusoid as an example; my time series are not periodic.


I merely continued with that example because it illustrates how an
alternative metric makes all sinusoids look very similar.
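To make that concrete, here is a minimal implementation of the classic dynamic-programming DTW distance (plain Python, absolute difference as the local cost). Under this measure, time-shifted or time-stretched copies of the same shape come out nearly identical, which is exactly the effect being described:

```python
def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance
    between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal warped cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]

# A stretched copy of the same shape has zero DTW distance:
dtw([1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 3.0])  # -> 0.0
```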


> Let's say that I have a signal that represents the value of power when some
> device is turned on. That power signal depends on the time when a person
> turns on the device. The device can be turned on after different intervals
> and can stay turned on for different durations. But all of that represents
> similar behaviour that can be grouped together. I'm simplifying things a
> bit, but that's the concept.
>

Sure.  But mean and medoid are still something you can define here.
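The medoid in particular needs no averaging at all, only a distance function, so it works with DTW or anything else: it is the cluster member minimizing the total distance to the other members. A minimal sketch (absolute difference standing in for DTW):

```python
def medoid(points, dist):
    """Return the member of `points` that minimizes the sum of
    distances to all other members. Works for any distance function,
    including non-metric ones like DTW."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

# Toy usage: 6.0 is closest, in total, to everything else.
medoid([1.0, 2.0, 6.0, 7.0, 8.0], lambda a, b: abs(a - b))  # -> 6.0
```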


> Either way, I made a simple test in RapidMiner and used K-medoids with DTW
> as a similarity measure on a small subset, and it gave very satisfying
> results (please don't laugh at my approach, I'm a beginner :) ).
>

Not laughing.  Sounds like a fine result.


>
> Now, in one of the previous mails you answered that K-medoids isn't
> scalable, so it isn't implemented. Does scalable mean not possible, or not
> such a good option? Why isn't it scalable? Does it take too much time to
> execute?
>

Scalable means that as the data gets bigger, the amount of resources required
grows at most in linear proportion.


>
> I've got an idea to implement it as two phase mapreduce, like this
>

Yeah... this is pretty standard,

BUT

That takes a number of iterations that is unknown in advance and depends on
the size of your data.  Each pass costs O(n).  Having many passes makes the
total cost super-linear.
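To see where the per-pass O(n) comes from, here is an illustrative sketch of one k-medoids iteration (names and the abs-difference stand-in for DTW are mine, not from any particular implementation). Every point is touched in the assignment step, the medoid update is quadratic within each cluster, and the whole thing has to be repeated until the medoids stop moving:

```python
def kmedoids_pass(points, medoids, dist):
    """One k-medoids iteration: assign every point to its nearest
    medoid (O(n*k) distance evaluations), then recompute each
    cluster's medoid (quadratic in cluster size). Returns the
    updated medoid list; callers loop this until it stops changing."""
    clusters = {m: [] for m in medoids}
    for p in points:
        nearest = min(medoids, key=lambda m: dist(p, m))
        clusters[nearest].append(p)
    return [min(c, key=lambda x: sum(dist(x, q) for q in c))
            for c in clusters.values() if c]

# Toy usage: two obvious groups, medoids move to their centers.
kmedoids_pass([1.0, 1.5, 2.0, 8.0, 8.5, 9.0], [1.0, 9.0],
              lambda a, b: abs(a - b))  # -> [1.5, 8.5]
```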

For k-means, you can create a sketch in one pass over the data and then
munch on the sketch to get a clustering.  This gives an O(n + k log n +
small goo) answer.
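A highly simplified toy version of the one-pass sketch idea, for illustration only (the real thing, e.g. Mahout's streaming k-means, manages the merge radius and sketch size far more carefully): each incoming point is either folded into a nearby weighted centroid or starts a new one, and when the sketch grows too large the radius is widened and the sketch condensed in place. The full data is seen exactly once; the expensive clustering then runs only on the small weighted sketch.

```python
def build_sketch(stream, max_size=100, radius=1.0):
    """One pass over `stream`: fold each point into an existing
    sketch centroid within `radius`, else start a new one. If the
    sketch exceeds max_size, double the radius and condense.
    Returns a list of [centroid, weight] pairs."""
    sketch = []
    for x in stream:
        near = next((e for e in sketch if abs(e[0] - x) <= radius), None)
        if near is not None:
            c, w = near
            near[0] = (c * w + x) / (w + 1)  # weighted running mean
            near[1] = w + 1
        else:
            sketch.append([x, 1])
        if len(sketch) > max_size:
            radius *= 2  # coarsen, then merge sketch entries pairwise
            condensed = []
            for c, w in sketch:
                tgt = next((e for e in condensed
                            if abs(e[0] - c) <= radius), None)
                if tgt is not None:
                    tgt[0] = (tgt[0] * tgt[1] + c * w) / (tgt[1] + w)
                    tgt[1] += w
                else:
                    condensed.append([c, w])
            sketch = condensed
    return sketch

# Toy usage: two groups of points collapse to two weighted centroids.
build_sketch([0.1, 0.2, 10.1, 10.2, 0.15], max_size=10)
```

Running k-means (or even k-medoids) on the resulting sketch is then cheap regardless of how big the original stream was, which is where the O(n + k log n + small goo) total comes from.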
