Re: [scikit-learn] Should we standardize data before PCA?

2018-05-26 Thread Shiheng Duan
> are using])
>
> Does this help?
> Michael
>
> On Thu, May 24, 2018 at 4:39 PM, Shiheng Duan wrote:
>
>> Hello all,
>>
>> I wonder is it necessary or correct to do z score transformation before
>> PCA? I didn't see any preprocessing for face image in t

[scikit-learn] Should we standardize data before PCA?

2018-05-24 Thread Shiheng Duan
Hello all, I wonder whether it is necessary or correct to apply a z-score transformation before PCA. I didn't see any preprocessing of the face images in the example "Faces recognition example using eigenfaces and SVMs", link: http://scikit-learn.org/stable/auto_examples/applications/plot_face_recognition.html#sp
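A minimal sketch of why the question matters, not taken from the thread: scikit-learn's PCA centers the data but does not rescale it, so a feature with a much larger variance can dominate the first component. The synthetic data and the variable names below are assumptions chosen for illustration. When all features share one scale, as pixel intensities do, the effect is usually much weaker.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data (an assumption): two independent features on very different scales.
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1000, 500)])

# PCA on the raw data: only mean-centering is applied internally.
pca_raw = PCA(n_components=2).fit(X)

# PCA after z-scoring each feature to zero mean and unit variance.
pca_std = make_pipeline(StandardScaler(), PCA(n_components=2)).fit(X)

raw_ratio = pca_raw.explained_variance_ratio_
std_ratio = pca_std.named_steps["pca"].explained_variance_ratio_
print("raw:", raw_ratio)
print("standardized:", std_ratio)
```

On the raw data the large-scale feature takes almost all of the explained variance; after standardization the two components share it roughly equally.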

Re: [scikit-learn] KMeans cluster

2018-02-20 Thread Shiheng Duan
Yes, but what criterion is used to decide the best output? The documentation says it is the best output in terms of inertia. What does that mean? Thanks.
On Wed, Feb 14, 2018 at 7:46 PM, Joel Nothman wrote:
> you can repeatedly use n_init=1?
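The suggestion to call KMeans repeatedly with n_init=1 could look like the sketch below. The toy data, the choice of five seeds, and init="random" are assumptions for illustration. Inertia is the sum of squared distances from each sample to its assigned cluster center; the last lines recompute it by hand for one run.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data (an assumption); replace with your own array.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

inertias = []
for seed in range(5):
    # One initialization per fit, so each run's score can be inspected.
    km = KMeans(n_clusters=4, n_init=1, init="random", random_state=seed).fit(X)
    inertias.append(km.inertia_)

# Lower inertia is better; this is the criterion n_init > 1 uses to pick a run.
best_seed = int(np.argmin(inertias))
print(inertias, best_seed)

# Inertia recomputed by hand for the last run.
manual_inertia = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
```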

[scikit-learn] KMeans cluster

2018-02-14 Thread Shiheng Duan
Hello all, KMeans has a parameter n_init: the algorithm runs n_init times and returns the best result. I wonder how the outputs of the individual runs are compared. Can we get the score for each run? Thanks.

Re: [scikit-learn] clustering on big dataset

2018-01-04 Thread Shiheng Duan
ve on having another, approximating, parameter. You do
> not need to set n_clusters to a fixed value for BIRCH. You only need to
> provide another clusterer, which has its own parameters, although you
> should be able to experiment with different "global clusterers".
>
> On
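A rough sketch of the two BIRCH setups described above, under stated assumptions: the toy data, threshold=0.5, and the choice of AgglomerativeClustering as the "global clusterer" are illustrative and not from the thread. With n_clusters=None, Birch returns its subclusters directly; passing an estimator makes Birch cluster the subcluster centroids with it, so the expensive hierarchical step never sees the full dataset.

```python
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs

# Toy data (an assumption); a real use case would have far more samples.
X, _ = make_blobs(n_samples=2000, centers=5, cluster_std=0.5, random_state=0)

# No fixed number of clusters: the subclusters are the final result.
birch_raw = Birch(threshold=0.5, n_clusters=None).fit(X)
n_subclusters = len(birch_raw.subcluster_centers_)

# A "global clusterer" applied to the subcluster centroids only.
global_clusterer = AgglomerativeClustering(n_clusters=5, linkage="ward")
birch_global = Birch(threshold=0.5, n_clusters=global_clusterer).fit(X)
labels = birch_global.labels_
print(n_subclusters, len(set(labels)))
```

The threshold still has to be chosen; it controls how coarse the subclusters are and therefore how much the result approximates exact hierarchical clustering.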

Re: [scikit-learn] clustering on big dataset

2018-01-03 Thread Shiheng Duan
Yes, it is an efficient method, but we still need to specify the number of clusters or the threshold. Is there another way to run hierarchical clustering on a big dataset? The main problem is the distance matrix. Thanks.
On Tue, Jan 2, 2018 at 6:02 AM, Olivier Grisel wrote:
> Have you had a look at

[scikit-learn] clustering on big dataset

2018-01-01 Thread Shiheng Duan
Hi all, I wonder whether there is any method for exact (hierarchical) clustering on a huge dataset where it is impossible to use a distance matrix. I am considering a KD-tree, but it has to be rebuilt every time, which consumes a lot of time. Any ideas?

Re: [scikit-learn] Issue with Sihouette_samples

2017-11-16 Thread Shiheng Duan
s, so probably your RAM is small. How much RAM does your PC have?
> Let me know,
>
> Luigi
>
>
> On 16 Nov 2017, at 09:18, Shiheng Duan wrote:
>
> Hi all,
>
> I am doing cluster work and wanna use silhouette score to determine the
> number of

[scikit-learn] Issue with Sihouette_samples

2017-11-16 Thread Shiheng Duan
Hi all, I am doing clustering work and want to use the silhouette score to determine the number of clusters, but I get a MemoryError when executing silhouette_samples. I searched for it and found something related to numpy, but I cannot reproduce the numpy error. Is there any solution? The data is 621*1405*
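One possible workaround, offered as a sketch and not as the answer given in the thread: the silhouette needs pairwise distances between samples, so memory grows quickly with the number of samples. silhouette_score accepts a sample_size argument that evaluates the score on a random subset instead. The toy data, the subset size of 500, and the KMeans labels below are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data (an assumption) standing in for the large dataset.
X, _ = make_blobs(n_samples=5000, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Score a random subset of 500 samples instead of all 5000.
score = silhouette_score(X, labels, sample_size=500, random_state=0)
print(score)
```

The subsampled score is an estimate; repeating it with a few different random_state values gives a sense of its variability.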