[scikit-learn] DBSCAN

2018-08-09 Thread Prathusha Jonnagaddla Subramanyam Naidu
Hi everyone, I'm trying to cluster 14000 samples using DBSCAN and want to know if there is a way to display the index of each data point along with its label. I'm only able to access labels in the form of a list. When I look at the graph and see outliers (black points), I'm not able to pin
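
A minimal sketch of one way to pair indices with labels, assuming the data is a NumPy array passed to `fit`: `labels_` is ordered like the rows of the input, so position `i` in the labels is the label of sample `i`. The toy data, `eps`, and `min_samples` below are illustrative assumptions, not values from the thread.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (an assumption): two tight groups plus one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)
labels = db.labels_  # labels[i] is the label of row i of X

# Row indices of the points DBSCAN marked as noise (label -1).
noise_idx = np.flatnonzero(labels == -1)

# Explicit (index, label) pairs, if a listing is wanted.
pairs = list(enumerate(labels.tolist()))

for i in noise_idx:
    print(f"sample {i} at {X[i]} is noise")
```

The indices in `noise_idx` can then be used to look the outliers up in the original table or to annotate them on the plot.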

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-28 Thread Mauricio Reis
I decreased the sampling interval to reduce the base size from 40,000 to 10,000 so that I could then use the DBScan routine. Now another problem has arisen: I want to analyze the "Noisy Samples" points and I need to calculate the distance to the nearest cluster, i.e., (a) the distance to the nearest
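
The message is cut off before it defines the distance precisely, so the following is only a sketch under one possible reading: for each noise point, find the nearest sample that belongs to any cluster and report that distance and that sample's cluster label. The toy data and parameters are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Toy data (an assumption): two tight groups plus one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])

labels = DBSCAN(eps=0.5, min_samples=3).fit(X).labels_

noise = labels == -1
clustered_idx = np.flatnonzero(~noise)

# Index only the clustered samples, then query with the noise samples.
nn = NearestNeighbors(n_neighbors=1).fit(X[clustered_idx])
dist, pos = nn.kneighbors(X[noise])

nearest_dist = dist[:, 0]                                # distance to the closest clustered sample
nearest_cluster = labels[clustered_idx[pos[:, 0]]]       # cluster that sample belongs to
```

This assumes at least one noise point and one clustered point exist. A distance to a cluster centroid or to the nearest core sample would be a different definition and would need a different reference set.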

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-18 Thread Shane Grigsby
Hi Mauricio, You can also use OPTICS in DBSCAN mode. The pull request is here if you'd like to clone it: https://github.com/scikit-learn/scikit-learn/pull/1984 Running ~40,000 points in three dimensions takes about a minute. See the example page here for how to do the DBSCAN extraction: htt

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-17 Thread Joel Nothman
There are two issues here: 1. We store all radius neighborhoods of all points in memory at once. This is a problem if each point has a large radius neighborhood. DBSCAN only requires that you store the radius neighbors of the point you are currently examining. We could provide a memory-efficient m

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-17 Thread Mauricio Reis
I'm not used to the terms used here. So I understood that the package had memory management, which was removed. But you could make the code available with memory management implementations. Is it?! :-) The problem is that I do not know what I would do with the code, because I only know how to work

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-16 Thread Joel Nothman
Implemented in a previous version of #10280, but removed for now to simplify reviews. If others would like to review #10280, I'm happy to follow up with the

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-16 Thread Gael Varoquaux
On Wed, May 16, 2018 at 01:44:17PM -0400, Andreas Mueller wrote: > Should we have "low memory"/batched version of k_neighbors_graph and > epsilon_neighbors_graph functions? I assume > those instantiate the dense matrix right now. +1! It shouldn't be too hard to do. G

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-16 Thread Andreas Mueller
Should we have "low memory"/batched version of k_neighbors_graph and epsilon_neighbors_graph functions? I assume those instantiate the dense matrix right now. On 05/13/2018 10:59 PM, Joel Nothman wrote: This is quite a common issue with our implementation of DBSCAN, and improvements to documen

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-16 Thread Andreas Mueller
You might also consider looking at hdbscan: https://github.com/scikit-learn-contrib/hdbscan On 05/13/2018 11:07 PM, Joel Nothman wrote: Note that this has long been documented under "Memory consumption for large sample sizes" at http://scikit-learn.org/stable/modules/clustering.html#dbscan

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-13 Thread Joel Nothman
Note that this has long been documented under "Memory consumption for large sample sizes" at http://scikit-learn.org/stable/modules/clustering.html#dbscan On 14 May 2018 at 12:59, Joel Nothman wrote: > This is quite a common issue with our implementation of DBSCAN, and > improvements to document

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-13 Thread Joel Nothman
This is quite a common issue with our implementation of DBSCAN, and improvements to documentation would be very, very welcome. The high memory cost comes from constructing the pairwise radius neighbors for all points. If using a distance metric that cannot be indexed with a KD-tree or Ball Tree, t
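
As a hedged illustration of the approach described in the "Memory consumption for large sample sizes" documentation section mentioned in this thread, the sketch below precomputes a sparse radius-neighbors graph and runs DBSCAN on it with `metric="precomputed"`. The toy data and parameter values are assumptions, and whether this helps on a given dataset depends on how many neighbors fall within `eps`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import radius_neighbors_graph

# Toy data (an assumption): two tight groups plus one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])
eps = 0.5

# Sparse matrix holding only the pairwise distances that are <= eps;
# pairs that are absent are treated as being farther apart than eps.
graph = radius_neighbors_graph(X, radius=eps, mode="distance")

labels_sparse = DBSCAN(eps=eps, min_samples=3,
                       metric="precomputed").fit(graph).labels_

# Reference run on the raw features, for comparison on this toy data.
labels_direct = DBSCAN(eps=eps, min_samples=3).fit(X).labels_
```

The sparse graph stores one entry per neighboring pair, so its size grows with the number of points inside each radius rather than with the square of the sample count.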

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-13 Thread Sebastian Raschka
> So I suggest that there is a test version that shows a proper message when an > error occurs. I think the freezing that happens in your case is operating system specific and it would require some weird workarounds to detect at which RAM usage the combination of machine and operating system mi

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-13 Thread José María Mateos
On Sun, May 13, 2018 at 08:23:15PM -0300, Mauricio Reis wrote: > To summarize: 1) How to check the memory of the computer during the > execution of the routine? 2) I suggest developing test versions of routines > that may have a memory error. If you are on Linux, can you just run "top" while your

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-13 Thread Mauricio Reis
I think the problem is due to the size of my database, which has 44,000 records. When I ran a database test with reduced sizes (10,000 and 20,000 first records), the routine ran normally. You ask me to check the memory while running the DBScan routine, but I do not know how to do that (if I did, I
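
A rough back-of-the-envelope calculation shows why the sample count matters. It assumes the worst case, where `eps` is large enough that nearly every point is a neighbor of every other point, and 8 bytes per stored entry; the real footprint depends on the data and on `eps`.

```python
def worst_case_bytes(n_samples, bytes_per_entry=8):
    """Upper bound: one stored entry for every ordered pair of samples."""
    return n_samples * n_samples * bytes_per_entry

for n in (10_000, 20_000, 44_000):
    print(f"{n:>6} samples: about {worst_case_bytes(n) / 1e9:.1f} GB")
```

Under these assumptions the bound grows quadratically: roughly 0.8 GB at 10,000 samples, 3.2 GB at 20,000, and 15.5 GB at 44,000.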

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-13 Thread Roman Yurchak
Could you please check memory usage while running DBSCAN to make sure freezing is due to running out of memory and not to something else? Which parameters do you run DBSCAN with? Changing algorithm, leaf_size parameters and ensuring n_jobs=1 could help. Assuming eps is reasonable, I think it sh

Re: [scikit-learn] DBScan freezes my computer !!!

2018-05-12 Thread Andrew Nystrom
If you’re L2-normalizing your data, you’re making it live on the surface of a hypersphere. That surface will have a high density of points and may not have areas of low density, in which case the entire surface could be recognized as a single cluster if epsilon is high enough and min neighbors is low en

[scikit-learn] DBScan freezes my computer !!!

2018-05-12 Thread Mauricio Reis
The DBScan "fit" method (in scikit-learn v0.19.1) is freezing my computer without any warning message! I am using WinPython 3.6.5 64-bit. The method works normally with the original data, but freezes when I use the normalized data (between 0 and 1). What should I do? Att., Mauricio Reis

Re: [scikit-learn] DBSCAN Border Points

2018-01-30 Thread Joel Nothman
It includes non-core points, but not points that are out of eps from any core point. You can modify eps and min_samples. But perhaps you should just choose a different clustering algorithm if this is behaviour you absolutely do not want. On 30 January 2018 at 23:24, AMIR SHANEHSAZZADEH < amir.p.sh
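
A small sketch that illustrates the behaviour described above, using an assumed one-dimensional toy dataset: the sample at 0.6 is not a core point, but it lies within `eps` of a core point and so receives that cluster's label, while the sample at 5.0 is out of reach of any core point and is labelled noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy 1-D data (an assumption): a dense group, one border point, one outlier.
X = np.array([[0.0], [0.1], [0.2], [0.6], [5.0]])

db = DBSCAN(eps=0.45, min_samples=3).fit(X)
labels = db.labels_

# Boolean mask of core samples, built from core_sample_indices_.
is_core = np.zeros(len(X), dtype=bool)
is_core[db.core_sample_indices_] = True

# Border points: assigned to a cluster, but not core themselves.
border_idx = np.flatnonzero((labels != -1) & ~is_core)
noise_idx = np.flatnonzero(labels == -1)
```

Here `border_idx` picks out the sample at 0.6 and `noise_idx` the sample at 5.0. To exclude border points from clusters, one option is to keep only the samples where `is_core` is true.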

[scikit-learn] DBSCAN Border Points

2018-01-30 Thread AMIR SHANEHSAZZADEH
Hello, I am working with the latest implementation of DBSCAN. I believe that scikit-learn's implementation does not include non-core points in clusters. This results in border points not being included in clusters. Is there any way to remedy this issue so that border points are included in their r