Hello again,

I have another question about Streaming K Means, specifically about how to use it. From the papers that are referenced, I understood that one of its advantages is that it clusters in a single pass over the data, which reduces the number of iterations and lets it work with large datasets.

What I'm interested in is this: can I use it in an online fashion? If I have data streaming in from some source, can I use it to cluster the incoming data as it arrives?

I understand that there is a streaming step, which looks more or less appropriate for my incoming data, but then there is a Ball K Means step that is performed after the streaming step. The question that arises is: when should the Ball K Means step run, given that the data arrives all the time?
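
To make the question concrete, here is a rough Python sketch of the online setup I have in mind. It is not Mahout's implementation, and all names in it (StreamingSketch, weighted_kmeans, cluster_stream, distance_cutoff, refresh_every) are hypothetical. It assumes a fixed distance cutoff in the streaming step, and it substitutes a plain weighted Lloyd's k-means for the Ball K Means step, rerun on the current sketch every refresh_every points.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class StreamingSketch:
    """Simplified streaming step: one pass, keeps weighted centroids.

    Assumption: a fixed distance_cutoff decides whether a point starts a
    new centroid or is folded into the nearest existing one.
    """

    def __init__(self, distance_cutoff):
        self.distance_cutoff = distance_cutoff
        self.centroids = []  # each entry is [point, weight]

    def add(self, point):
        if not self.centroids:
            self.centroids.append([list(point), 1.0])
            return
        nearest = min(self.centroids, key=lambda c: dist(c[0], point))
        if dist(nearest[0], point) > self.distance_cutoff:
            # Too far from every centroid: start a new one.
            self.centroids.append([list(point), 1.0])
        else:
            # Fold the point into the nearest centroid (running weighted mean).
            w = nearest[1]
            nearest[0] = [(w * c + p) / (w + 1) for c, p in zip(nearest[0], point)]
            nearest[1] = w + 1


def weighted_kmeans(sketch, k, iterations=10, seed=0):
    """Stand-in for the final step: weighted Lloyd's k-means on the sketch."""
    rng = random.Random(seed)
    dim = len(sketch[0][0])
    centers = [list(p) for p, _ in rng.sample(sketch, k)]
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        weights = [0.0] * k
        for p, w in sketch:
            j = min(range(k), key=lambda i: dist(centers[i], p))
            weights[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        for j in range(k):
            if weights[j] > 0:  # leave a center in place if nothing was assigned
                centers[j] = [s / weights[j] for s in sums[j]]
    return centers


def cluster_stream(stream, k, distance_cutoff, refresh_every):
    """Feed points one at a time; rerun the final step every refresh_every points."""
    sketch = StreamingSketch(distance_cutoff)
    snapshots = []
    for n, point in enumerate(stream, start=1):
        sketch.add(point)
        if n % refresh_every == 0 and len(sketch.centroids) >= k:
            snapshots.append(weighted_kmeans(sketch.centroids, k))
    return sketch, snapshots
```

In this sketch the final step is cheap because it only sees the sketch centroids, not the raw points, so it could in principle be rerun on a timer or on demand. Whether that is a sensible way to use the real Streaming K Means is exactly what I am unsure about.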

Should I even consider this, or should I go for a lambda architecture instead?

Any help would be great.

Thanks,
Marko
