Re: DBSCAN implementation in Mahout

3316 Chirag Nagpal Sun, 30 Nov 2014 05:13:04 -0800

Hi Ted,

Thanks for the reply.

I have been using DBSCAN in Python, specifically the implementation in the
scikit-learn package. For a dataset of about 8,000 points, the running time on
my Intel Core i7-4700MQ is around 300 seconds.

I have implemented a parallel version using Python's multiprocessing library,
and the running time comes down to about 100 to 120 seconds when I use 3
parallel worker processes.
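The message does not say how the work was divided among the processes. A minimal sketch, assuming the expensive step is the eps-neighbourhood (region) query and that it is split into row chunks handed to a `multiprocessing.Pool`, while cluster expansion stays serial. The names `neighborhoods`, `dbscan`, `_neighbors_chunk` and the `workers` parameter are hypothetical; the sketch uses brute-force 2-D Euclidean distances and is not scikit-learn's implementation.

```python
import math
from multiprocessing import Pool

# Hypothetical sketch: brute-force 2-D region queries, not scikit-learn's code.
NOISE = -1  # label for points that belong to no cluster


def _neighbors_chunk(task):
    """Brute-force eps-neighbourhoods (index lists) for points[start:stop]."""
    points, eps, start, stop = task
    eps_sq = eps * eps
    out = []
    for i in range(start, stop):
        xi, yi = points[i]
        out.append([j for j, (xj, yj) in enumerate(points)
                    if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps_sq])
    return out


def neighborhoods(points, eps, workers=1):
    """Region queries for all points, split into contiguous row chunks.

    With workers > 1 the chunks are sent to a multiprocessing.Pool;
    with workers == 1 they run in this process.
    """
    n = len(points)
    size = max(1, math.ceil(n / max(1, workers)))
    tasks = [(points, eps, s, min(s + size, n)) for s in range(0, n, size)]
    if workers > 1:
        with Pool(processes=workers) as pool:
            parts = pool.map(_neighbors_chunk, tasks)
    else:
        parts = [_neighbors_chunk(t) for t in tasks]
    # pool.map preserves task order, so concatenating keeps index alignment.
    return [nbrs for part in parts for nbrs in part]


def dbscan(points, eps, min_pts, workers=1):
    """Return a cluster id (0, 1, ...) or NOISE for every point.

    min_pts counts the point itself (the same convention as scikit-learn's
    min_samples). Only the region queries run in parallel.
    """
    nbrs = neighborhoods(points, eps, workers)
    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        if len(nbrs[p]) < min_pts:
            labels[p] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[p] = cluster
        frontier = list(nbrs[p])
        while frontier:
            q = frontier.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: joins, but does not expand
            if labels[q] is not None:
                continue
            labels[q] = cluster
            if len(nbrs[q]) >= min_pts:  # core point: keep expanding
                frontier.extend(nbrs[q])
    return labels


if __name__ == "__main__":
    demo = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (50.0, 50.0)]
    print(dbscan(demo, eps=1.5, min_pts=3, workers=3))  # prints [0, 0, 0, 1, 1, 1, -1]
```

Passing the whole point list with every task keeps the sketch short but pickles the data once per chunk; a real implementation would more likely share it once (for example through the Pool `initializer` argument) and replace the brute-force scan with a spatial index.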

Thus the speedup is nearly linear in the number of workers (roughly 2.5x to 3x
with 3 processes), so I think scalability should not be an issue for a
MapReduce implementation.
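As a rough check of the near-linear scaling claimed above, the usual definitions of speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p can be applied to the reported timings. This treats the ~300 s scikit-learn run as T_1, which is an assumption: strictly, the baseline should be the parallel code itself run with a single process.

```latex
S_3 = \frac{T_1}{T_3} \approx \frac{300\,\mathrm{s}}{120\,\mathrm{s}} \;\text{to}\; \frac{300\,\mathrm{s}}{100\,\mathrm{s}} = 2.5 \;\text{to}\; 3.0,
\qquad
E_3 = \frac{S_3}{3} \approx 0.83 \;\text{to}\; 1.0
```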

Chirag Nagpal
University of Pune, India
www.chiragnagpal.com
________________________________________
From: Ted Dunning <ted.dunn...@gmail.com>
Sent: Sunday, November 30, 2014 6:29 PM
To: user@mahout.apache.org
Subject: Re: DBSCAN implementation in Mahout

On Sat, Nov 29, 2014 at 8:31 PM, 3316 Chirag Nagpal <
chiragnagpal_12...@aitpune.edu.in> wrote:

> Since density-based clustering algorithms are used extensively,
> especially by GIS research groups, it is a bit sad that there isn't a
> MapReduce implementation available.
>
> I think I will propose writing MapReduce code for DBSCAN and OPTICS for
> GSoC '15.
>
> I would like your input on how significant this would be to the
> community in general.
>

We have had proposals to add this to Mahout, but as far as I remember, no
credible requests to use it.

Also, there is the question of how well DBSCAN-like algorithms scale.
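The remark is not elaborated. One commonly cited cost is that a naive DBSCAN issues one eps-range query per point against all n points, i.e. O(n^2) distance evaluations (for 8,000 points, about 32 million distinct pairs). A minimal sketch of one standard mitigation, assuming 2-D Euclidean data: bucket points into a uniform grid with cell side eps, so each query only scans its own cell and the 8 surrounding ones. `grid_neighborhoods` is a hypothetical name; skewed data that piles into a few cells falls back toward quadratic cost, and the sketch does nothing for the separate distributed problem of clusters that cross partition boundaries.

```python
import math
from collections import defaultdict


def grid_neighborhoods(points, eps):
    """Hypothetical sketch: eps-neighbourhoods of 2-D points via a uniform grid.

    Cells have side eps, so any point within eps of p lies in p's cell or
    one of the 8 adjacent cells; only those cells are scanned.
    """
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / eps), math.floor(y / eps))].append(idx)

    eps_sq = eps * eps
    result = []
    for x, y in points:
        cx, cy = math.floor(x / eps), math.floor(y / eps)
        nbrs = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids inserting empty cells into the defaultdict
                for j in cells.get((cx + dx, cy + dy), ()):
                    xj, yj = points[j]
                    if (x - xj) ** 2 + (y - yj) ** 2 <= eps_sq:
                        nbrs.append(j)
        result.append(sorted(nbrs))  # sorted to match a brute-force scan
    return result
```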
