Peter, What you say is a bit confusing to me.
You say you have centers already. But then you talk about algorithms which find the centers. Also, you say you want to assign points based on centers, but you also say that clusters have different shapes, area, size and point count. Do you mean that assignment should be purely based on proximity to the center and that the shape will be whatever it happens to be as a result? Or do you mean that there is an a priori known shape that has to be taken into account during point assignment? If proximity is the only question, and if you can use great circle distance as your proximity measure, then this problem is fairly easy and can be handled in just a few lines of code. One easy way to handle this is to convert your centers to normalized x, y, z locations using x = cos \lambda cos \phi y = cos \lambda sin \phi z = sin \lambda where \lambda is the latitude and \phi is the longitude. Great circle distance is monotonically related to Euclidean distance in 3-space and thus is inversely monotonically related to dot production. This means you can sort the centers by distance to a point by simply computing x,y,z for the point and then doing the dot product and sorting in descending order. The nice thing with this is that there are no trig functions inside your inner loop. You can also use the haversine formula, but that requires 3-4 trig functions in the inner loop and is likely to be slower. You don't really need Mahout for this at all (unless I completely misunderstand your problem, which is quite possible). On Mon, Dec 2, 2013 at 1:31 AM, Peter K <peat...@yahoo.de> wrote: > Hi there, > > I've have no experience with mahout but I know that it will solve my > problem :) ! > > I've the following requirements: > > * No hadoop setup should be necessary. I want a simple approach and I > know this is possible with mahout! > * I have lots of points (~100 million) but also some RAM (32GB) > * I know the clusters upfront via its center positions. > * I need to assign every point to exactly one cluster. > * Every cluster can have a different shape, area size and point count > > I've found: > http://en.wikipedia.org/wiki/OPTICS_algorithm > http://en.wikipedia.org/wiki/DBSCAN > > Both algorithms do not really pay attention to the fixed cluster center > but I think I will start there. Is one of them implemented in mahout? > > Or do you have another idea or hint/link? > > Regards, > Peter. >