Re: [Scikit-learn-general] Text document clustering: How can I access the actual clustered documents

2013-01-31 Thread Andreas Mueller
You might be interested in my comment here: https://github.com/scikit-learn/scikit-learn/pull/1471#issuecomment-12177094 where I visualized the cluster centers.
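One common way to turn cluster centers into the "list of cluster terms" asked about in this thread is to read off the highest-weighted vocabulary entries of each KMeans centroid. The sketch below assumes tf-idf features and a toy corpus; it is not necessarily what the linked PR comment does, and `top_terms_per_cluster` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def top_terms_per_cluster(km, vectorizer, n_terms=5):
    """Hypothetical helper: the n_terms largest-weight terms of each KMeans centroid."""
    # Invert vocabulary_ (term -> column index) into an index -> term array,
    # so the code does not depend on get_feature_names vs. get_feature_names_out.
    terms = np.empty(len(vectorizer.vocabulary_), dtype=object)
    for term, idx in vectorizer.vocabulary_.items():
        terms[idx] = term
    # Column indices of each centroid sorted by decreasing weight.
    order = np.argsort(km.cluster_centers_, axis=1)[:, ::-1]
    return [list(terms[row[:n_terms]]) for row in order]


# Toy corpus, purely for illustration.
documents = [
    "the cat sat on the mat",
    "cats and kittens purr",
    "stock markets fell sharply",
    "investors sold stock today",
    "the kitten chased the cat",
    "markets rallied as investors bought",
]
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

for k, words in enumerate(top_terms_per_cluster(km, vectorizer, n_terms=3)):
    print(k, words)
```

Because tf-idf centroids are averages of document vectors, their largest coordinates tend to be the terms most characteristic of the cluster's members.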

Re: [Scikit-learn-general] Text document clustering: How can I access the actual clustered documents

2013-01-31 Thread Fred Mailhot
Given a fitted KMeans named "km", and a numpy array of documents, to get a list of documents associated with cluster i: documents[np.where(km.labels_ == i)] Not sure what you mean by "a list of cluster terms", though (a list of all terms from all docs associated with a given cluster?)... On 31
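A self-contained version of the indexing above, plus one reading of "a list of cluster terms" (all terms occurring in any document of the cluster). The toy corpus and the helper names `documents_in_cluster` and `terms_in_cluster` are illustrative assumptions. Note that `documents` must be a NumPy array: indexing a plain Python list with the tuple returned by `np.where` raises `TypeError`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Fancy/boolean indexing needs a NumPy array, not a plain list.
documents = np.array([
    "the cat sat on the mat",
    "kittens and cats purr",
    "stock markets fell sharply",
    "investors sold stock today",
], dtype=object)

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)


def documents_in_cluster(km, documents, i):
    """Hypothetical helper: documents whose KMeans label is i."""
    return documents[np.where(km.labels_ == i)]


def terms_in_cluster(vectorizer, X, labels, i):
    """Hypothetical helper: union of the terms of every document in cluster i."""
    per_doc = vectorizer.inverse_transform(X[labels == i])
    return sorted({str(term) for doc_terms in per_doc for term in doc_terms})


for i in range(km.n_clusters):
    print(i, list(documents_in_cluster(km, documents, i)),
          terms_in_cluster(vectorizer, X, km.labels_, i))
```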

Re: [Scikit-learn-general] Text document clustering: How can I access the actual clustered documents

2013-01-31 Thread Lars Buitinck
2013/2/1 Robert Layton : > They are ordered, so you can get the cluster number of the ith document > using: > > model.labels_[i] ... and all indices of documents in the cluster k with np.where(model.labels_ == k) > And the corresponding document is simply: > > documents[i] ... where docume

Re: [Scikit-learn-general] Why does GaussianNB.fit() accept only sparse input?

2013-01-31 Thread Lars Buitinck
2013/1/31 Willi Richert : > the changeset > https://github.com/scikit-learn/scikit-learn/commit/a12bd7bc7cc9c02fec82a49376204393c45818a5 > made GaussianNB (and some other classifiers) accept only dense input. That's not true. Sparse matrices were never accepted. The commit just improved the er
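Since GaussianNB does not take sparse matrices, two usual ways around it with tf-idf features are densifying the matrix or switching to a naive Bayes variant that accepts sparse input. The sketch below shows both; the toy corpus and labels are assumptions, and the snippet is not the (truncated) advice from the reply above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Toy corpus and labels, purely for illustration.
docs = ["good movie", "great film", "bad movie", "awful film"]
y = ["pos", "pos", "neg", "neg"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # scipy.sparse matrix

# Option 1: GaussianNB needs dense input, so densify explicitly.
# Memory grows as n_docs * n_terms, so this only suits small corpora.
gnb = GaussianNB().fit(X.toarray(), y)

# Option 2: MultinomialNB accepts sparse tf-idf / count matrices directly.
mnb = MultinomialNB().fit(X, y)

X_new = vectorizer.transform(["great movie"])
print(gnb.predict(X_new.toarray()), mnb.predict(X_new))
```

For text, MultinomialNB is usually the more natural model anyway, since it is designed for count-like features.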

Re: [Scikit-learn-general] Text document clustering: How can I access the actual clustered documents

2013-01-31 Thread Robert Layton
On 1 February 2013 10:20, Vinay B, wrote: > Another newbie question. > > I'm not referring to a confusion matrix or similar summary. Rather, If > I had a number of documents clustered using (say KMeans) into 3 > clusters, .. how could I access > 1. each cluster and a list of cluster terms? > 2. a

[Scikit-learn-general] Text document clustering: How can I access the actual clustered documents

2013-01-31 Thread Vinay B,
Another newbie question. I'm not referring to a confusion matrix or similar summary. Rather, if I had a number of documents clustered (say, using KMeans) into 3 clusters, how could I access 1. each cluster and a list of cluster terms? 2. a list of documents associated with each cluster? Thanks

[Scikit-learn-general] Why does GaussianNB.fit() accept only sparse input?

2013-01-31 Thread Willi Richert
Hi, the changeset https://github.com/scikit-learn/scikit-learn/commit/a12bd7bc7cc9c02fec82a49376204393c45818a5 made GaussianNB (and some other classifiers) accept only dense input. That means that the following code results in an exception: tfidf = TfidfVectorizer() clf = GaussianN

[Scikit-learn-general] Text data training: UnicodeDecodeError

2013-01-31 Thread Vinay B,
Hi, I'm new to scikit-learn and Python (though not to programming) and am working my way through the examples. Aim: train a model on textual data and use the trained model to classify individual text files. Issue: I end up with Unicode errors: UnicodeDecodeError: 'utf8' codec can't decode by
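A common cause of this error is feeding raw bytes that are not valid UTF-8 (for example Latin-1 files) to a text vectorizer. The sketch below uses the `encoding` and `decode_error` parameters as they exist in recent scikit-learn releases (releases from the time of this thread may name them differently); the sample bytes are an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# The first document is Latin-1 encoded ("café"), which is not valid UTF-8.
raw_docs = [b"caf\xe9 au lait", b"plain ascii text"]

# Default settings (encoding="utf-8", decode_error="strict") fail on these bytes.
try:
    TfidfVectorizer().fit(raw_docs)
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# Option 1: declare the real encoding, if you know it.
latin = TfidfVectorizer(encoding="latin-1").fit(raw_docs)

# Option 2: keep UTF-8 but replace undecodable bytes with U+FFFD.
lenient = TfidfVectorizer(decode_error="replace").fit(raw_docs)

print(sorted(latin.vocabulary_), sorted(lenient.vocabulary_))
```

Declaring the correct encoding preserves the text; `decode_error="replace"` (or `"ignore"`) only hides the problem and can mangle tokens, as with "caf" above.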

Re: [Scikit-learn-general] K means on a sphere

2013-01-31 Thread Ariel Rokem
Hey Wei, On Mon, Jan 28, 2013 at 5:33 AM, Wei LI wrote: > Hi Ariel: > > There is one matlab implementation for spherical kmeans: > http://www.mathworks.com/matlabcentral/fileexchange/28902-spherical-k-means > for > spherical kmeans and you can have a look at it :) That seems quite simple > and
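For reference, spherical k-means fits in a few lines of NumPy: project the points onto the unit sphere, assign each point to the center with the highest cosine similarity, and set each center to the re-normalized mean of its members. This is a minimal sketch, not a port of the linked MATLAB code; the farthest-point initialization and the function name are assumptions, and rows of `X` are assumed non-zero.

```python
import numpy as np


def spherical_kmeans(X, k, n_iter=100, seed=0):
    """Minimal spherical k-means on the rows of X (assumed non-zero vectors)."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project onto the unit sphere

    # Farthest-point initialisation (an assumption): start from a random point,
    # then repeatedly add the point least similar to all centres chosen so far.
    chosen = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        best_sim = np.max(X @ np.array(chosen).T, axis=1)
        chosen.append(X[np.argmin(best_sim)])
    centers = np.array(chosen)

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centre = largest cosine similarity.
        new_labels = np.argmax(X @ centers.T, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: mean direction of each cluster, projected back onto the sphere.
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # keep the previous centre if the cluster is empty or degenerate
                centers[j] = total / norm
    return labels, centers
```

Since the points and centers are unit vectors, maximizing the dot product is equivalent to minimizing Euclidean distance on the sphere, which is why the update reduces to a normalized mean.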

Re: [Scikit-learn-general] K means on a sphere

2013-01-31 Thread Ariel Rokem
Hi Denis, On Thu, Jan 31, 2013 at 8:48 AM, denis wrote: > Ariel, > what's k, how many data points do you have ? > I have something between approximately 2 and 150 data points. What's k? I guess that's my next question, right? For now, I am trying to use an AIC criterion to determine how hig
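The snippet is cut off before Ariel's exact criterion, so the following is only one concrete way to pick k by AIC, swapped in for illustration: fit scikit-learn's `GaussianMixture` (which provides an `aic` method) for a range of component counts and keep the minimum. It models Euclidean Gaussian clusters, not clusters on a sphere, and `n_components` cannot exceed the number of samples, which matters with as few as 2 points. The helper name is hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def choose_k_by_aic(X, k_max, seed=0):
    """Hypothetical helper: fit Gaussian mixtures with 1..k_max components and
    return the k with the lowest AIC, plus the AIC of every candidate."""
    aics = []
    for k in range(1, k_max + 1):
        gm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        aics.append(gm.aic(X))
    return int(np.argmin(aics)) + 1, aics
```

AIC tends to favor more components than BIC (`GaussianMixture.bic`), so comparing both can be a useful sanity check.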

Re: [Scikit-learn-general] K means on a sphere

2013-01-31 Thread denis
Ariel, what's k, and how many data points do you have? There's a trivial k-means under http://stackoverflow.com/questions/5529625/is-it-possible-to-specify-your-own-distance-function-using-scikits-learn-k-means "verbose=1" gives you some idea of how clusters converge (or not). (If you have even a

Re: [Scikit-learn-general] spectral clustering with discretize option

2013-01-31 Thread Olivier Grisel
2013/1/31 Gael Varoquaux : > On Thu, Jan 31, 2013 at 08:17:02AM -0500, Satrajit Ghosh wrote: >> the problem that this causes is in comparing clusters from two similar >> datasets. are the clusters different because they reflect different >> properties or simply differences in local minima. > > Beca

Re: [Scikit-learn-general] spectral clustering with discretize option

2013-01-31 Thread Gael Varoquaux
On Thu, Jan 31, 2013 at 08:17:02AM -0500, Satrajit Ghosh wrote: > the problem that this causes is in comparing clusters from two similar > datasets. are the clusters different because they reflect different > properties or simply differences in local minima. Because clustering is so much of an ill

Re: [Scikit-learn-general] spectral clustering with discretize option

2013-01-31 Thread Satrajit Ghosh
On Thu, Jan 31, 2013 at 3:30 AM, Alexandre Gramfort < alexandre.gramf...@m4x.org> wrote: > > That's actually probably a very good idea. If it turns out to give good > results in practice, we should have an option for this. > > +1 > thanks guys. that's exactly what i'm trying to do now. the key is

[Scikit-learn-general] Idea: Text examples on the documentation

2013-01-31 Thread Gael Varoquaux
Hi list, Based on a couple of remarks on issues recently, it seems that we need more text-mining examples. I just had a crazy idea: we could use our own documentation as a dataset. We have 35k lines in rst files, and 120k words. That's a somewhat decent corpus. I don't know anything about text p
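As a rough sketch of what such an example could start from, the snippet below collects every `.rst` file under a directory and builds a tf-idf matrix with one row per file. The directory layout and helper names are assumptions; with a scikit-learn source checkout, something like `vectorize_rst_corpus("doc")` might index the documentation, but the `"doc"` path is hypothetical.

```python
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer


def load_rst_corpus(root):
    """Hypothetical helper: (paths, texts) for every .rst file below root, sorted."""
    paths = sorted(Path(root).rglob("*.rst"))
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the whole load.
    texts = [p.read_text(encoding="utf-8", errors="replace") for p in paths]
    return paths, texts


def vectorize_rst_corpus(root):
    """Hypothetical helper: tf-idf matrix with one row per .rst file (same order as paths)."""
    paths, texts = load_rst_corpus(root)
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(texts)
    return paths, vectorizer, X
```

Keeping `paths` alongside `X` means cluster labels or classifier predictions can later be mapped back to the documentation page they came from.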

Re: [Scikit-learn-general] spectral clustering with discretize option

2013-01-31 Thread Alexandre Gramfort
> That's actually probably a very good idea. If it turns out to give good > results in practice, we should have an option for this. +1

Re: [Scikit-learn-general] spectral clustering with discretize option

2013-01-31 Thread Olivier Grisel
2013/1/31 Satrajit Ghosh : > hi all, > > i have been playing with the discretize option in spectral clustering and it > seems to be quite sensitive to the random state. > > here is demo with the sklearn demo (just iterating over 4 random states): > > https://dl.dropbox.com/u/363467/test_discretize.
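The linked demo script is not reproduced here, but the same kind of check can be sketched on toy data: run `SpectralClustering` with `assign_labels="discretize"` under several random states and measure pairwise agreement with the adjusted Rand index (1.0 means identical partitions up to relabeling). The blob data and the nearest-neighbors affinity are assumptions chosen for illustration.

```python
from itertools import combinations

from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Toy data standing in for the demo in the thread.
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=2.0, random_state=0)

labelings = {}
for seed in range(4):
    model = SpectralClustering(
        n_clusters=4,
        affinity="nearest_neighbors",  # k-NN graph, so no rbf gamma to tune
        assign_labels="discretize",
        random_state=seed,
    )
    labelings[seed] = model.fit_predict(X)

# Pairwise agreement between runs with different random states.
for a, b in combinations(labelings, 2):
    print(f"seed {a} vs seed {b}: ARI = {adjusted_rand_score(labelings[a], labelings[b]):.3f}")
```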

Re: [Scikit-learn-general] spectral clustering with discretize option

2013-01-31 Thread Gael Varoquaux
- Original message - > I am not sure whether it can be randomly initialized many times and pick > the best just like in k-means? That's actually probably a very good idea. If it turns out to give good results in practice, we should have an option for this. Gael
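A sketch of the "initialize many times and pick the best" idea follows. For k-means, "best" means lowest inertia; as far as I know `SpectralClustering` does not expose the discretization objective, and its `n_init` parameter only applies when `assign_labels="kmeans"`. So this hypothetical helper swaps in a different criterion: it keeps the run whose labeling agrees best (mean adjusted Rand index) with all other runs, i.e. the most representative partition. The helper name, toy data, and criterion are assumptions, not an existing scikit-learn option.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def most_consistent_labeling(X, n_clusters, seeds):
    """Hypothetical helper: run discretize-based spectral clustering once per
    seed and return the labeling with the highest mean ARI against the others."""
    if len(seeds) < 2:
        raise ValueError("need at least two seeds to compare runs")
    runs = [
        SpectralClustering(
            n_clusters=n_clusters,
            affinity="nearest_neighbors",
            assign_labels="discretize",
            random_state=seed,
        ).fit_predict(X)
        for seed in seeds
    ]
    # agreement[i] = mean ARI between run i and every other run
    agreement = np.array([
        np.mean([adjusted_rand_score(runs[i], runs[j]) for j in range(len(runs)) if j != i])
        for i in range(len(runs))
    ])
    best = int(np.argmax(agreement))
    return runs[best], agreement


X, _ = make_blobs(n_samples=200, centers=4, cluster_std=2.0, random_state=0)
labels, agreement = most_consistent_labeling(X, n_clusters=4, seeds=range(8))
```

The helper recomputes the affinity graph on every run for simplicity; for larger data one could build it once and pass `affinity="precomputed"`.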