Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Hello everyone, I'm running Mahout Streaming K Means multiple times in a loop, each time on the same input data but with a different output path. Concretely, I'm increasing the number of clusters in each iteration. Currently it runs on a single machine. A couple of times (maybe 3 of 20 runs) I

Re: Streaming K Means exception without any reason

2014-10-09 Thread Suneel Marthi
I've seen this issue a few times before. There are a few edge conditions that need to be fixed in the Streaming KMeans code, and you are right that the generated clusters differ on successive runs given the same input. IIRC this stacktrace is due to BallKMeans failing to read any input centr

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Suneel, thank you for your answer; this was rather strange to me. The number of points is 942. I have multiple runs, and in each run I have a loop in which the number of clusters is increased in each iteration. I multiply that number by 3, since I'm expecting log(n) initial centroids, before Ball
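A minimal sketch of the arithmetic behind that factor of 3, assuming the common rule of thumb that the streaming pass should produce roughly k * log(n) intermediate centroids for k final clusters and n points. The function name `estimated_map_clusters` and the choice of logarithm base are illustrative assumptions, not part of Mahout's API. For n = 942, log10(n) is about 2.97 (close to 3), while the natural log is about 6.85.

```python
import math


def estimated_map_clusters(k: int, n: int, log_base: float = math.e) -> int:
    """Rough k * log(n) estimate of intermediate centroids (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if n < 2:
        raise ValueError("n must be at least 2 so that log(n) is positive")
    # Round up so the estimate never drops below k * log(n).
    return math.ceil(k * math.log(n, log_base))


n_points = 942  # dataset size mentioned in the thread

# Compare the natural-log and base-10 estimates as k grows.
for k in range(2, 6):
    natural = estimated_map_clusters(k, n_points)
    base10 = estimated_map_clusters(k, n_points, log_base=10)
    print(f"k={k}: ln-based estimate={natural}, log10-based estimate={base10}")
```

On a dataset this small, either estimate is a noticeable fraction of the 942 points, which may be worth keeping in mind when the number of clusters is increased in a loop.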

Re: Streaming K Means exception without any reason

2014-10-09 Thread Suneel Marthi
Heh, your data size is tiny indeed. One of the edge conditions I was alluding to was the failure of this implementation on tiny datasets. Do you see any output clusters? If so, how many points? Is it possible to share your dataset to troubleshoot? On Thu, Oct 9, 2014 at 9:18 AM, Marko Dinić wrote: >

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Yes, it is small, but it is just a sample, so the real dataset will probably be much bigger. So you think that this was the problem? Will this problem be avoided with a larger dataset? As far as I remember, there were no output clusters. I'm sending the dataset, if you want to take a look.

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Here is the dataset. On Thursday, 9 October 2014, 16:53:25 CEST, Marko Dinić wrote: Yes, it is small, but it is just a sample, so the real dataset will probably be much bigger. So you think that this was the problem? Will this problem be avoided with a larger dataset? I think that there were no

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Here is the dataset; I've just checked to be sure it is the right one. On 09.10.2014. 15:34, Suneel Marthi wrote: Heh, your data size is tiny indeed. One of the edge conditions I was alluding to was the failure of this implementation on tiny datasets. Do you see any output clusters? If so how