You might be interested in my comment here:
https://github.com/scikit-learn/scikit-learn/pull/1471#issuecomment-12177094
where I visualized the cluster centers
Given a fitted KMeans named "km", and a numpy array of documents, to get a
list of documents associated with cluster i:
documents[np.where(km.labels_ == i)]
Not sure what you mean by "a list of cluster terms", though (a list of all
terms from all docs associated with a given cluster?)...
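For completeness, a minimal self-contained sketch (toy corpus; all names are hypothetical) covering both questions, documents per cluster and the top-weighted terms of each cluster centre:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical toy corpus; replace with your own documents.
documents = np.array(["cats purr", "dogs bark", "cats meow", "dogs growl"])

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

i = 0
docs_in_cluster = documents[km.labels_ == i]      # documents in cluster i

# "Cluster terms": the highest-weighted features of that cluster's centre
# (get_feature_names_out() is get_feature_names() in older releases).
terms = np.asarray(vectorizer.get_feature_names_out())
top_terms = terms[km.cluster_centers_[i].argsort()[::-1][:5]]
print(docs_in_cluster, top_terms)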
2013/2/1 Robert Layton :
> They are ordered, so you can get the cluster number of the ith document
> using:
>
> model.labels_[i]
... and all indices of documents in cluster k with
np.where(model.labels_ == k)
> And the corresponding document is simply:
>
> documents[i]
... where documents is your list (or array) of documents.
2013/1/31 Willi Richert :
> the changeset
> https://github.com/scikit-learn/scikit-learn/commit/a12bd7bc7cc9c02fec82a49376204393c45818a5,
> made GaussianNB (and some other classifiers) accept only dense input.
That's not true. Sparse matrices were never accepted. The commit just
improved the error message.
On 1 February 2013 10:20, Vinay B, wrote:
> Another newbie question.
>
> I'm not referring to a confusion matrix or similar summary. Rather, if
> I had a number of documents clustered (using, say, KMeans) into 3
> clusters, how could I access
> 1. each cluster and a list of cluster terms?
> 2. a list of documents associated with each cluster?
Another newbie question.
I'm not referring to a confusion matrix or similar summary. Rather, if
I had a number of documents clustered (using, say, KMeans) into 3
clusters, how could I access
1. each cluster and a list of cluster terms?
2. a list of documents associated with each cluster?
Thanks
Hi,
the changeset
https://github.com/scikit-learn/scikit-learn/commit/a12bd7bc7cc9c02fec82a49376204393c45818a5,
made GaussianNB (and some other classifiers) accept only dense input.
That means that the following code results in an exception:
tfidf = TfidfVectorizer()
clf = GaussianNB()
...
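For reference, a minimal sketch of the usual workarounds (toy data, hypothetical names): densify the TF-IDF output before GaussianNB, or use an estimator that accepts sparse input, such as MultinomialNB:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Hypothetical toy data; replace with your own corpus and labels.
texts = ["good movie", "bad movie", "great film", "awful film"]
y = [1, 0, 1, 0]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(texts)            # X is a sparse matrix

# Option 1: densify (only viable for small corpora -- watch memory).
clf = GaussianNB().fit(X.toarray(), y)

# Option 2: use an estimator that accepts sparse input directly.
clf2 = MultinomialNB().fit(X, y)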
Hi,
I'm new to Scikit-learn and python (though not to programming) and am
working my way through the examples.
Aim: Train a model based on textual data and use the trained model to
classify individual text files.
Issue: I end up with Unicode errors: UnicodeDecodeError: 'utf8' codec
can't decode byte ...
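A common cause is that the files are not actually UTF-8. A minimal sketch of two workarounds (the file path and encoding are assumptions; the vectorizer parameters are encoding/decode_error in recent scikit-learn releases, charset/charset_error in older ones):

from sklearn.feature_extraction.text import TfidfVectorizer

# 1) Decode the file yourself with an explicit encoding and/or error handling.
with open("some_file.txt", "rb") as f:
    text = f.read().decode("latin-1")     # or .decode("utf-8", errors="replace")

# 2) Let the vectorizer handle decoding.
tfidf = TfidfVectorizer(encoding="latin-1", decode_error="replace")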
Hey Wei,
On Mon, Jan 28, 2013 at 5:33 AM, Wei LI wrote:
> Hi Ariel:
>
> There is one matlab implementation for spherical kmeans:
> http://www.mathworks.com/matlabcentral/fileexchange/28902-spherical-k-means
> for
> spherical kmeans and you can have a look at it :) That seems quite simple
> and
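As an aside, spherical k-means is often approximated in scikit-learn by L2-normalizing the rows and running ordinary KMeans on the unit sphere (only an approximation, since the centroids are not re-normalized after each update); a minimal sketch with made-up data:

import numpy as np
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

X = np.random.rand(100, 20)               # hypothetical data
X_unit = normalize(X)                      # project rows onto the unit sphere

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_unit)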
Hi Denis,
On Thu, Jan 31, 2013 at 8:48 AM, denis wrote:
> Ariel,
> what's k, how many data points do you have ?
>
I have something between approximately 2 and 150 data points.
What's k? I guess that's my next question, right? For now, I am trying to
use an AIC criterion to determine how high k should be.
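For what it's worth, one common way to get an AIC/BIC-style criterion is to fit a Gaussian mixture over a range of k and keep the minimum; a minimal sketch with made-up data (GaussianMixture is the current class name; older releases called it GMM):

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(150, 5)                 # hypothetical data

ks = range(1, 11)
aics = [GaussianMixture(n_components=k, random_state=0).fit(X).aic(X)
        for k in ks]
best_k = ks[int(np.argmin(aics))]          # k with the lowest AIC
print(best_k)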
Ariel,
what's k, how many data points do you have ?
There's a trivial k-means under
http://stackoverflow.com/questions/5529625/is-it-possible-to-specify-your-own-distance-function-using-scikits-learn-k-means
"verbose=1" gives you some idea of how clusters converge (or not).
(If you have even a
2013/1/31 Gael Varoquaux :
> On Thu, Jan 31, 2013 at 08:17:02AM -0500, Satrajit Ghosh wrote:
>> the problem that this causes is in comparing clusters from two similar
>> datasets. are the clusters different because they reflect different
>> properties or simply differences in local minima.
>
> Because clustering is so much of an ill-posed problem ...
On Thu, Jan 31, 2013 at 08:17:02AM -0500, Satrajit Ghosh wrote:
> the problem that this causes is in comparing clusters from two similar
> datasets. are the clusters different because they reflect different
> properties or simply differences in local minima.
Because clustering is so much of an ill-posed problem ...
On Thu, Jan 31, 2013 at 3:30 AM, Alexandre Gramfort <
alexandre.gramf...@m4x.org> wrote:
> > That's actually probably a very good idea. If it turns out to give good
> results in practice, we should have an option for this.
>
> +1
>
thanks guys. that's exactly what i'm trying to do now. the key is
Hi list,
Based on a couple of remarks on issues recently, it seems that we need
more text-mining examples.
I just had a crazy idea: we could use as a dataset our own documentation.
We have 35k lines in rst files and 120k words. That's a somewhat decent
corpus.
I don't know anything about text processing ...
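If someone wants to try it, a minimal sketch of turning the rst sources into a corpus (the glob pattern is an assumption about where the docs live):

import glob
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical path to the checked-out docs; adjust to your layout.
rst_files = glob.glob("doc/**/*.rst", recursive=True)
tfidf = TfidfVectorizer(input="filename", stop_words="english")
X = tfidf.fit_transform(rst_files)
print(X.shape)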
> That's actually probably a very good idea. If it turns out to give good
> results in practice, we should have an option for this.
+1
2013/1/31 Satrajit Ghosh :
> hi all,
>
> i have been playing with the discretize option in spectral clustering and it
> seems to be quite sensitive to the random state.
>
> here is a demo with the sklearn example (just iterating over 4 random states):
>
> https://dl.dropbox.com/u/363467/test_discretize.
- Original message -
> I am not sure whether it can be randomly initialized many times and pick
> the best just like in k-means?
That's actually probably a very good idea. If it turns out to give good results
in practice, we should have an option for this.
Gael
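For illustration only (not what the thread settled on): a minimal sketch of the "restart and keep the best" idea, here selecting among random states of SpectralClustering by silhouette score (the data, the number of restarts, and the selection criterion are all assumptions):

import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

X = np.random.rand(200, 2)                 # hypothetical data

best_labels, best_score = None, -np.inf
for seed in range(10):
    labels = SpectralClustering(n_clusters=3, assign_labels="discretize",
                                random_state=seed).fit_predict(X)
    score = silhouette_score(X, labels)    # one possible quality criterion
    if score > best_score:
        best_labels, best_score = labels, score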