Re: [scikit-learn] Agglomerative clustering

2018-12-09 Thread Gael Varoquaux
> I want to impose an additional constraint. When 2 clusters are combined and the cost of combination is equal for multiple cluster pairs, I want to choose the pair for which the combined cluster has the least size.
> What is the cleanest and easiest way of achieving this?
I don't think th

[scikit-learn] Agglomerative clustering

2018-12-09 Thread Jitesh Khandelwal
Hi everyone, I am using agglomerative clustering with an L1 distance matrix as input and the "complete" linkage option. I want to impose an additional constraint. When 2 clusters are combined and the cost of combination is equal for multiple cluster pairs, I want to choose the pair for which the
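As far as I know, scikit-learn's AgglomerativeClustering does not expose a tie-breaking rule. For small inputs, one option is a naive agglomeration written directly against the precomputed distance matrix. The sketch below assumes a symmetric matrix D (for example L1 distances) and complete linkage. The function name `complete_linkage_smallest_tie` and the `tol` used to decide that two merge costs are "equal" are hypothetical choices, not part of any library.

```python
import numpy as np


def complete_linkage_smallest_tie(D, n_clusters, tol=1e-12):
    """Hypothetical helper: naive complete-linkage agglomeration.

    When several cluster pairs share the minimal merge cost (within `tol`),
    the pair whose merged cluster is smallest is chosen. Roughly O(n^4) in
    the worst case, so only suitable for small inputs.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None  # (cost, merged_size, a, b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the largest distance between the two clusters.
                cost = D[np.ix_(clusters[a], clusters[b])].max()
                size = len(clusters[a]) + len(clusters[b])
                strictly_cheaper = best is None or cost < best[0] - tol
                tie_but_smaller = (
                    best is not None and abs(cost - best[0]) <= tol and size < best[1]
                )
                if strictly_cheaper or tie_but_smaller:
                    best = (cost, size, a, b)
        _, _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # b > a, so index a stays valid
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Four points on a line; |x_i - x_j| is their L1 distance.
x = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(x[:, None] - x[None, :])
labels = complete_linkage_smallest_tie(D, n_clusters=2)
print(labels)  # [0 0 1 1]
```

After the first merge ({0, 1}), merging that pair with the point at 3 and merging the points at 3 and 6 both cost 3. The tie-break picks the smaller union, giving [0 0 1 1]; a plain "first minimum wins" scan would have produced [0 0 0 1].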

Re: [scikit-learn] Agglomerative clustering problem

2017-07-15 Thread Jacob Schreiber
Typically, when I think of limiting the number of points in a cluster, I think of KD trees. I suppose that wouldn't work?
On Tue, Jul 11, 2017 at 11:22 AM, Ariani A wrote:
> Dear Uri,
> Thanks. I just have a pairwise distance matrix and I want to implement it so that each cluster has at least 4
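The KD-tree idea needs coordinates, not only a distance matrix. Assuming feature vectors X are available, a minimal sketch is to split recursively at the median of the widest coordinate, as a KD tree does, and stop before either side would drop below `min_size`. Every part then has between `min_size` and `2 * min_size - 1` points (provided there are at least `min_size` points overall), but the parts are axis-aligned cells rather than agglomerative clusters. `kd_partition` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np


def kd_partition(X, min_size):
    """Hypothetical helper: KD-tree-style partition of the rows of X.

    Each part is split at the median of its widest coordinate until a split
    would leave a side with fewer than `min_size` points.
    """
    X = np.asarray(X, dtype=float)
    labels = np.empty(len(X), dtype=int)
    next_label = 0
    stack = [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        if len(idx) < 2 * min_size:
            # Too small to split into two parts of at least min_size: make a leaf.
            labels[idx] = next_label
            next_label += 1
            continue
        pts = X[idx]
        dim = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))  # widest spread
        order = idx[np.argsort(pts[:, dim], kind="stable")]
        half = len(order) // 2
        stack.extend([order[:half], order[half:]])
    return labels


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
labels = kd_partition(X, min_size=40)
print(np.bincount(labels))  # [50 50 50 50]
```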

Re: [scikit-learn] Agglomerative Clustering without knowing number of clusters

2017-07-13 Thread Ariani A
Dear Shane, Sorry to bother you! Are the "precomputed" option and the "distance matrix" you mentioned meant for DBSCAN? Thanks, Best.
On Thu, Jul 13, 2017 at 7:03 PM, Ariani A wrote:
> Dear Shane,
> Thanks for your prompt answer.
> Do you mean that for DBSCAN there is no need to feed other param

Re: [scikit-learn] Agglomerative Clustering without knowing number of clusters

2017-07-13 Thread Ariani A
Dear Shane, Thanks for your prompt answer. Do you mean that for DBSCAN there is no need to feed other parameters? Do I just call the function, or do I have to modify the code? P.S. I was not able to find the DBSCAN code on GitHub. Looking forward to hearing from you. Best, -Noushin
On Thu, Jul 13,

Re: [scikit-learn] Agglomerative Clustering without knowing number of clusters

2017-07-13 Thread Shane Grigsby
Hi Ariani, Yes, you can use a distance matrix. I think what you want is metric='precomputed'; X would then be your N by N distance matrix. Hope that helps, ~Shane
On 07/13, Ariani A wrote:
Dear Shane, Thanks for your answer. Does DBSCAN work with a distance matrix? I have a distance
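A minimal sketch of Shane's suggestion on toy data (the two grids and the eps/min_samples values are illustrative assumptions). Apart from the matrix itself, DBSCAN mainly needs `eps`, in the units of the distance matrix, and `min_samples`, which counts the point itself; the library code does not need to be modified. Note that `min_samples` decides which points are core points and is not, strictly speaking, a guaranteed minimum cluster size.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Toy data: two 7x7 grids (spacing 0.1) placed far apart, 98 points in total.
grid = np.array([(0.1 * i, 0.1 * j) for i in range(7) for j in range(7)])
M = np.vstack([grid, grid + 10.0])

# N x N symmetric distance matrix; any precomputed matrix is used the same way.
D = pairwise_distances(M, metric="manhattan")

db = DBSCAN(eps=0.5, min_samples=5, metric="precomputed")
labels = db.fit_predict(D)  # the label -1 would mark noise points
print(np.unique(labels))
```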

Re: [scikit-learn] Agglomerative Clustering without knowing number of clusters

2017-07-13 Thread Ariani A
Dear Shane, Thanks for your answer. Does DBSCAN work with a distance matrix? I have a distance matrix (a symmetric matrix containing the pairwise distances). Can you help me? I did not find the DBSCAN code at that link. Best, -Ariani
On Thu, Jul 6, 2017 at 12:32 PM, Shane Grigsby wrote:
> This sounds

Re: [scikit-learn] Agglomerative clustering problem

2017-07-11 Thread Ariani A
Dear Uri, Thanks. I just have a pairwise distance matrix and I want to implement it so that each cluster has at least 40 data points (with agglomerative clustering). Does that work? Thanks, -Ariani
On Tue, Jul 11, 2017 at 1:54 PM, Uri Goren wrote:
> Take a look at scipy's fcluster function.
> If M is a matri

Re: [scikit-learn] Agglomerative clustering problem

2017-07-11 Thread Uri Goren
Take a look at scipy's fcluster function. If M is a matrix of all of your feature vectors, this code snippet should work. You need to figure out which metric and algorithm work for you.
from sklearn.metrics import pairwise_distances
from scipy.cluster import hierarchy
X = pairwise_dista
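One way the fcluster approach can be filled out, under assumptions of my own (a toy feature matrix M, the L1 metric, complete linkage, two flat clusters). scipy's `linkage` treats a 2-D array as observation vectors, so a square distance matrix should first be converted to condensed form with `squareform`; `fcluster` then cuts the resulting tree.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from sklearn.metrics import pairwise_distances

# Toy feature vectors: two well-separated 7x7 grids.
grid = np.array([(0.1 * i, 0.1 * j) for i in range(7) for j in range(7)])
M = np.vstack([grid, grid + 10.0])

X = pairwise_distances(M, metric="manhattan")  # N x N distance matrix
# linkage() expects a condensed (1-D) distance vector, not the square matrix.
Z = hierarchy.linkage(squareform(X, checks=False), method="complete")
# Cut the dendrogram so that at most two flat clusters remain (labels start at 1).
labels = hierarchy.fcluster(Z, t=2, criterion="maxclust")
print(np.unique(labels, return_counts=True))
```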

[scikit-learn] Agglomerative clustering problem

2017-07-11 Thread Ariani A
Hi all, I want to perform agglomerative clustering, but I don't know the number of clusters beforehand. However, I want every cluster to have at least 40 data points. How can I apply this with sklearn.cluster.AgglomerativeClustering? Should I build a dendrogram and cut it somehow? I have no idea how to rela
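One way to relate the dendrogram to the 40-point requirement, sketched with scipy rather than scikit-learn (the helper name `cut_with_min_size` and the search strategy are assumptions, not an existing API): cut the tree into k flat clusters with `fcluster` and lower k until every cluster has at least `min_size` members. A single cut cannot always satisfy the constraint with many clusters; an outlier that merges only at the top of the tree, for example, forces k all the way down to 1.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform


def cut_with_min_size(D, min_size, method="average"):
    """Hypothetical helper: the cut with the most flat clusters in which every
    cluster has at least `min_size` points. D is a symmetric N x N matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    Z = hierarchy.linkage(squareform(D, checks=False), method=method)
    # At most n // min_size clusters can each hold min_size points.
    for k in range(n // min_size, 0, -1):
        labels = hierarchy.fcluster(Z, t=k, criterion="maxclust")
        _, counts = np.unique(labels, return_counts=True)
        if counts.min() >= min_size:
            return labels
    # Fewer than min_size points in total: return a single cluster.
    return np.ones(n, dtype=int)


grid = np.array([(0.1 * i, 0.1 * j) for i in range(7) for j in range(7)])
M = np.vstack([grid, grid + 10.0])
D = np.abs(M[:, None, :] - M[None, :, :]).sum(axis=-1)  # L1 distances
labels = cut_with_min_size(D, min_size=40)
print(np.unique(labels, return_counts=True))
```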

Re: [scikit-learn] Agglomerative Clustering without knowing number of clusters

2017-07-06 Thread Ariani A
Dear Shane, Thanks for your time. However, I have to implement it with agglomerative clustering and cut the tree so that each cluster has at least 40 data points, and I am not sure how to cut it. My guess is that it can be done by cutting the dendrogram. Is that correct? If so, I do not know how to apply it.

Re: [scikit-learn] Agglomerative Clustering without knowing number of clusters

2017-07-06 Thread Shane Grigsby
This sounds like a problem more amenable to either DBSCAN or OPTICS. Neither algorithm requires a priori knowledge of the number of clusters, and both let you specify a minimum number of points for cluster membership. The OPTICS algorithm will also produce a dendrogram th
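Later scikit-learn releases include `sklearn.cluster.OPTICS`, whose `min_cluster_size` parameter states the minimum cluster size directly. A sketch on toy data, assuming a precomputed distance matrix; the parameter values are illustrative, and points labelled -1 are treated as noise. The fitted `reachability_` and `ordering_` attributes hold the data behind the reachability plot, OPTICS's dendrogram-like output.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.metrics import pairwise_distances

# Toy data: two well-separated 7x7 grids, 98 points in total.
grid = np.array([(0.1 * i, 0.1 * j) for i in range(7) for j in range(7)])
M = np.vstack([grid, grid + 10.0])
D = pairwise_distances(M, metric="manhattan")

# Ask for clusters of at least 40 points; -1 in labels_ marks noise.
opt = OPTICS(min_samples=5, min_cluster_size=40, metric="precomputed").fit(D)
labels = opt.labels_
# Reachability distances in cluster order (what a reachability plot shows).
reach = opt.reachability_[opt.ordering_]
print(np.unique(labels, return_counts=True))
```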

[scikit-learn] Agglomerative Clustering without knowing number of clusters

2017-06-30 Thread Ariani A
I want to perform agglomerative clustering, but I don't know the number of clusters beforehand. However, I want every cluster to have at least 40 data points. How can I apply this with sklearn.cluster.AgglomerativeClustering? Should I build a dendrogram and cut it somehow? I have no idea how to relate dendr

Re: [scikit-learn] Agglomerative clustering

2017-06-30 Thread Olivier Grisel
You can have a look at the test named "test_agglomerative_clustering" in: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/tests/test_hierarchical.py -- Olivier ___ scikit-learn mailing list scikit-learn@python.org https://mail.

[scikit-learn] Agglomerative clustering

2017-06-29 Thread Ariani A
I have some data and also the pairwise distance matrix of these data points. I want to cluster them using agglomerative clustering. I read that in sklearn we can set the affinity to 'precomputed', and I expect that means passing the distance matrix. But I could not find any example which uses precomputed affinity a
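A minimal sketch of AgglomerativeClustering on a precomputed matrix, with toy data and `n_clusters=2` as assumptions. The parameter was called `affinity` in the releases current in 2017 and is called `metric` from scikit-learn 1.2 onward, so the sketch tries the newer name first. `linkage="ward"` accepts only Euclidean input, so a non-Ward linkage such as `"complete"` is used with precomputed distances.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

# Toy data: two well-separated 7x7 grids, 98 points in total.
grid = np.array([(0.1 * i, 0.1 * j) for i in range(7) for j in range(7)])
M = np.vstack([grid, grid + 10.0])
D = pairwise_distances(M, metric="manhattan")  # N x N distance matrix

try:
    # scikit-learn >= 1.2 names the parameter `metric`.
    model = AgglomerativeClustering(n_clusters=2, metric="precomputed", linkage="complete")
except TypeError:
    # Older releases only accept the `affinity` name.
    model = AgglomerativeClustering(n_clusters=2, affinity="precomputed", linkage="complete")

labels = model.fit_predict(D)  # D is used directly as the distance matrix
print(np.unique(labels, return_counts=True))
```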