Hello again,
I have another question about Streaming K Means, specifically about how
it is used. From the referenced papers I understood that one of its
advantages is that it clusters in a single pass, which reduces the
number of iterations and lets it work with large datasets.
What I'm interested in is: can I use it in an online fashion? If I have
data streaming in from some source, can I use it to cluster the incoming
data as it arrives?
I understand that there is a streaming step, which looks more or less
appropriate for my incoming data, but it is followed by a Ball K Means
step that is performed after the streaming step. The question that
arises is: when should I run the Ball K Means step, given that the data
keeps arriving all the time?
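To make the question concrete, here is a rough Python sketch of what I
have in mind. It is not the API of any library, and every name in it
(OnlineSketch, refine, cluster_stream, cutoff, refine_every) is made up
for illustration. The streaming step is simplified to a fixed distance
cutoff, and plain weighted k-means stands in for the Ball K Means step.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OnlineSketch:
    """Hypothetical one-pass sketch: a growing list of weighted centroids.

    Assumption: a point closer than `cutoff` to its nearest centroid is
    merged into it, otherwise it starts a new centroid. This is a
    simplification, not the rule from the papers.
    """

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.centroids = []  # each entry is [point, weight]

    def add(self, point):
        if self.centroids:
            nearest = min(self.centroids, key=lambda c: dist(c[0], point))
            if dist(nearest[0], point) <= self.cutoff:
                w = nearest[1]
                # Move the centroid to the weighted mean of old and new.
                nearest[0] = [(w * c + p) / (w + 1)
                              for c, p in zip(nearest[0], point)]
                nearest[1] = w + 1
                return
        self.centroids.append([list(point), 1])


def refine(sketch, k, iterations=10, seed=0):
    """Stand-in for the second step: weighted k-means over the sketch."""
    pts = sketch.centroids
    k = min(k, len(pts))
    rng = random.Random(seed)
    centers = [list(p) for p, _ in rng.sample(pts, k)]
    for _ in range(iterations):
        sums = [[0.0] * len(centers[0]) for _ in range(k)]
        weights = [0.0] * k
        for p, w in pts:
            j = min(range(k), key=lambda i: dist(centers[i], p))
            weights[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        for j in range(k):
            if weights[j] > 0:
                centers[j] = [s / weights[j] for s in sums[j]]
    return centers


def cluster_stream(stream, k, cutoff, refine_every):
    """Update the sketch per point; re-run the second step every N points."""
    sketch = OnlineSketch(cutoff)
    for n, point in enumerate(stream, start=1):
        sketch.add(point)
        if n % refine_every == 0:
            yield n, refine(sketch, k)
```

In this sketch the second step only reads the centroids, so it could be
re-run on a timer or every refine_every points while the streaming step
keeps going. Whether that is a sensible way to use the real algorithm is
exactly what I am unsure about.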
Should I even consider this, or should I go for a lambda architecture?
Any help would be great.
Thanks,
Marko