Thanks, Manish! Exactly what I was looking for.
On Fri, Jul 12, 2013 at 4:52 PM, Manish Amde wrote:
> Hi Sergey,
>
> There is a sample_weights option (not very well documented) in the random
> forest classifier that might help. You might want to check out the SVC
> example to see the sample_weights format.
Peter,
I tried your suggestion, but my training error with sample weights is still
not the same as without sample weights. It seems like I am missing
something here; it doesn't seem to work for me.
Anne Dwyer
On Fri, Jul 12, 2013 at 5:19 PM, Peter Prettenhofer <
peter.prettenho...@gmail.com> wrote:
Hi Sergey,
There is a sample_weights option (not very well documented) in the random
forest classifier that might help. You might want to check out the SVC example
to see the sample_weights format.
http://scikit-learn.org/stable/auto_examples/svm/plot_weighted_samples.html
You can provide diffe
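A minimal sketch of passing per-sample weights at fit time, along the lines of the weighted-samples example linked above (the toy data and the 5x weight are made up for illustration, and this assumes a scikit-learn version whose fit() accepts sample_weight):

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = (X[:, 0] > 0).astype(int)

# up-weight class 1 samples 5x relative to class 0 (illustrative values)
w = np.where(y == 1, 5.0, 1.0)

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y, sample_weight=w)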
I'm dealing with a 50-class classification problem with extremely
unbalanced classes. The smallest class has about 1000 samples and the
largest has 500,000. The random forest I've trained is being heavily
skewed towards the big classes.
Is there a good way to deal with this kind of problem in scikit-learn?
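One common workaround (a sketch, not an official recipe: it assumes the forest's fit() accepts sample_weight, as mentioned in the reply above, and the toy data stands in for the real 50-class problem) is to weight each sample inversely to its class frequency:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def balanced_sample_weight(y):
    # weight = n_samples / (n_classes * count of that sample's class)
    classes, counts = np.unique(y, return_counts=True)
    per_class = len(y) / (len(classes) * counts.astype(float))
    lookup = dict(zip(classes, per_class))
    return np.array([lookup[label] for label in y])

# toy imbalanced data: 900 samples of class 0, 100 of class 1
rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = np.repeat([0, 1], [900, 100])

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y, sample_weight=balanced_sample_weight(y))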
2013/7/12 Lars Buitinck :
> 2013/7/12 Antonio Manuel Macías Ojeda :
>> I'm not sure how are you using it but something to take into account is that
>> the default NLTK tokenizer is meant to be used on sentences, not on whole
>> paragraphs or documents, so it should operate on the output of a sentence
>> tokenizer, not on the raw text.
try float(len(y_train)) - seems like C default is int...
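For context on why the cast matters (an illustration, not from the thread): under Python 2, dividing two ints truncates, so a scaled C can silently become 0 and trip the C <= 0 check.

n_samples = 208                  # e.g. the size of the sonar data set
print(1 / n_samples)             # 0 under Python 2 (integer division)
print(1 / float(n_samples))      # ~0.0048, a valid strictly positive C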
On 13.07.2013 00:10, "Anne Dwyer" wrote:
> Peter,
>
> Thanks for your answers. When I scale C by len(y_train), I get the
> following error:
>
> ValueError: C <= 0
>
> Anne Dwyer
>
>
> On Fri, Jul 12, 2013 at 3:34 PM, Peter Prettenhofer <
>
Peter,
Thanks for your answers. When I scale C by len(y_train), I get the
following error:
ValueError: C <= 0
Anne Dwyer
On Fri, Jul 12, 2013 at 3:34 PM, Peter Prettenhofer <
peter.prettenho...@gmail.com> wrote:
> Hi Anne,
>
> I would also expect that using uniform weights should result in th
Yeah, it's definitely not built with speed as its design goal. Good patch!
On Fri, Jul 12, 2013 at 1:45 PM, Lars Buitinck wrote:
> 2013/7/12 Antonio Manuel Macías Ojeda :
> > I'm not sure how are you using it but something to take into account is
> that
> > the default NLTK tokenizer is meant t
2013/7/12 Antonio Manuel Macías Ojeda :
> I'm not sure how are you using it but something to take into account is that
> the default NLTK tokenizer is meant to be used on sentences, not on whole
> paragraphs or documents, so it should operate on the output of a sentence
> tokenizer, not on the raw text.
2013/7/12 Peter Prettenhofer
> Hi Anne,
>
> I would also expect that using uniform weights should result in the same
> solution as no weights -- but maybe there is an interaction with the C
> parameter... for this we would need to know more about the internals of
> libsvm and how it handles sampl
Hi Anne,
I would also expect that using uniform weights should result in the same
solution as no weights -- but maybe there is an interaction with the C
parameter... for this we would need to know more about the internals of
libsvm and how it handles sample weights - try scaling C by
``len(y_train)``.
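A quick way to test that expectation on toy data (a sketch; comparing dual_coef_ is just one convenient check of whether the two fits agree):

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

plain = SVC(C=1.0).fit(X, y)
uniform = SVC(C=1.0).fit(X, y, sample_weight=np.ones(len(y)))
print(np.allclose(plain.dual_coef_, uniform.dual_coef_))  # expect True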
I have been using the sonar data set (I believe this is a sample data set
used in many demonstrations of machine learning). It is a two-class data
set with 60 features and 208 training examples.
I have a question about using sample weights when fitting the SVM model.
When I fit the model using sc
Hi!
> I found that about 75% of
> the time was spent in MiniBatchKMeans.fit, while the rest of it was
> spent inside nltk.word_tokenize (!)
>
I'm not sure how you are using it, but something to take into account is
that the default NLTK tokenizer is meant to be used on sentences, not on
whole paragraphs or documents, so it should operate on the output of a
sentence tokenizer, not on the raw text.
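A small sketch of that usage pattern (assumes NLTK's "punkt" sentence models are installed so sent_tokenize works):

import nltk

text = ("Scikit-learn pipelines often call the tokenizer per document. "
        "NLTK's word_tokenize, however, expects sentence-sized input.")

# split into sentences first, then tokenize each sentence into words
tokens = [tok
          for sent in nltk.sent_tokenize(text)
          for tok in nltk.word_tokenize(sent)]
print(tokens)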
On 12 July 2013 09:48, Lars Buitinck wrote:
> 2013/7/11 Tom Fawcett :
> [...]
>
> I guess because it's terribly slow. I recently tried to cluster a
> sample of Wikipedia text at the word level.
What kind of results did you get? I did some work recently clustering
short-form text and was general
2013/7/12 Lars Buitinck :
> 2013/7/11 Tom Fawcett :
>>> On Sun, Jul 7, 2013 at 6:58 AM, Joel Nothman
>>> wrote:
>>> (But I'm also not convinced that NLTK is the right tool for a lot of
>>> large-scale feature extraction jobs.)
>>
>> I’m curious – why?
>
> I guess because it's terribly slow. I re
2013/7/11 Tom Fawcett :
>> On Sun, Jul 7, 2013 at 6:58 AM, Joel Nothman
>> wrote:
>> (But I'm also not convinced that NLTK is the right tool for a lot of
>> large-scale feature extraction jobs.)
>
> I’m curious – why?
I guess because it's terribly slow. I recently tried to cluster a
sample of Wikipedia text at the word level.
On Fri, Jul 12, 2013 at 09:06:03AM +0200, Andreas Mueller wrote:
> > Structured prediction in sklearn was one of the outcomes from the survey.
> > Would it be a better idea to send people to pystruct, rather than
> > implement it here?
> I think so.
I think so too.
> We decided that structured p
2013/7/12 Hakan :
> Unfortunately it's not pretty straight forward as you
> said...
The error message was:
TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]
It is completely straightforward. It says that the object you are
dealing with is a sparse matrix, as written in the documentation.
On Fri, 12 Jul 2013 17:59:29 +0200
Olivier Grisel wrote:
>> X_train=X_in
>> y_train=y_in
>> X_test=X_in
>> y_test=y_in
>
> This is a methodological mistake: you should never use the same data
> for training and testing a model. Instead use:
>
> from sklearn.cross_validation import train_test_split
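A minimal sketch of that suggestion using the iris data from the thread (module path as in the 0.13-era releases; newer versions moved the function to sklearn.model_selection):

from sklearn import datasets
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer releases

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)
# fit on the training split, evaluate on the held-out test split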
Unfortunately it's not as straightforward as you
said... I have made the changes Mathieu and you mentioned,
but loading the feature set into an array with "X = X.toarray()"
doesn't immediately make the examples run with the libsvm
datasets.
Please have a look at the following code... decision boundary
2013/7/12 Hueseyin Hakan Pekmezci :
> Hi scikit-learn members,
>
> 0.13.1 documentation states that individual datasets can
> be loaded in svmlight / libsvm format. So I have fed in
> "iris.scale" libSVM dataset however some erroneous
> behaviour happens. I am just trying to reproduce
> "plot_iris
On 07/12/2013 05:14 PM, Hakan wrote:
> as you see initially I was loading the iris data exactly
> like example. But being able to work for individual
> datasets, I needed to give it a libSVM try. Is there any
> piece of code, example to point out its smooth integration
> with scikit-learn? I mean s
As you can see, initially I was loading the iris data exactly
like the example. But to be able to work with individual
datasets, I needed to give libSVM a try. Is there any
piece of code or example that shows its smooth integration
with scikit-learn? I mean some SVM classifier example with
svmlight_l
Initially I tried the one you mentioned, but I hit the
following barrier. Then I started to reconsider; maybe
there is a problem with the libSVM reading...
Traceback (most recent call last):
File "linsvm.py", line 48, in
pl.scatter(X[:, 0], X[:, 1], c=y, zorder=10,
cmap=pl.cm.Paired
Hi.
If you just want the iris dataset, you can get it using
"datasets.load_iris()" (and scale it with StandardScaler).
The problem in your code is that load_svmlight_file returns X as a
sparse matrix.
You need to convert it to an ndarray with X.toarray() if you want to use
the example.
(I thin
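Putting that together (a sketch; "iris.scale" is the libsvm-format file from the thread and must exist locally):

from sklearn.datasets import load_svmlight_file

X_sparse, y = load_svmlight_file("iris.scale")
X = X_sparse.toarray()   # densify: the plotting example expects an ndarray
print(X.shape, y.shape)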
Well the error message says it all: you cannot use len on a sparse matrix.
Instead of len(X), use X.shape[0].
Mathieu
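A tiny illustration of the difference, using scipy.sparse directly:

import scipy.sparse as sp

X = sp.csr_matrix([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
# len(X) raises "TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]"
print(X.shape[0])   # 3 -> number of rows (samples)
print(X.getnnz())   # 3 -> number of stored non-zero entries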
On Fri, Jul 12, 2013 at 11:35 PM, Hueseyin Hakan Pekmezci <
pekme...@rhrk.uni-kl.de> wrote:
> Hi scikit-learn members,
>
> 0.13.1 documentation states that individual datasets ca
Hi scikit-learn members,
0.13.1 documentation states that individual datasets can
be loaded in svmlight / libsvm format. So I have fed in
the "iris.scale" libSVM dataset; however, some erroneous
behaviour happens. I am just trying to reproduce
"plot_iris_exercise.py" with
iris.scale(http://www.csi
skstruct?
In French it translates to "c'est quoi ce truc?" ("what is this thing?") :)
--
Olivier
I'm coming at this from a market research point of view (that's my
background). There seem to be a number of opportunities there for
classification, clustering, and regression analysis tools, so I am building
- or rather attempting to build - tools with the aim that they will go on
the web, and peo
On Fri, Jul 12, 2013 at 4:06 PM, Andreas Mueller
wrote:
>
> About naming it scikit-struct: is there any requirement to become a scikit?
> Also: is there much benefit - pandas seems to be doing quite well
> without the brand ;)
>
My suggestion was half a joke :). But I find it a little bit disappo
Hi Nigel. I see you're in the UK, I'm based east of you in London. My
goal with the disambiguator is to provide a well documented pipeline
such that it can be easily retrained.
I have a notion that in the future I'll host a version of my code
production-ready under my http://annotate.io/ , ready f
Hi Harold. Are you using different models for the different types of
social media? I'd guess that the grammar/terms used in a tweet could
look quite different to what you see in e.g. a Google+ Comment
(different demographic->probably higher quality English, less space
restrictions->longer/clearer w
On Fri, Jul 12, 2013 at 7:01 PM, Gilles Louppe wrote:
> Otherwise, on my part, I plan to complete PR #2131 if it is not yet
> merged in by the time of the sprint, and then address tree-related
> issues/PRs that have been lying around for months now. Also, if
> someone has a special request for th
2013/7/12 Andreas Mueller
> On 07/12/2013 01:26 AM, Robert Layton wrote:
> > Structured prediction in sklearn was one of the outcomes from the survey.
> > Would it be a better idea to send people to pystruct, rather than
> > implement it here?
> >
> I think so. We decided that structured predicti
> - discuss with the tree grower guys how best to parallelize
> random forest training on multi-core without copying the training set
> in memory
>   - either with threads in joblib and "with nogil" statements in the
> inner loops of the (new) cython code
>   - or with shared memory
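For reference, a rough sketch of the threading variant (purely illustrative, not scikit-learn's actual tree code: fit_one_tree is a hypothetical stand-in, and sharing the arrays only pays off if the real inner loop releases the GIL, e.g. via Cython "with nogil" sections):

import numpy as np
from joblib import Parallel, delayed

def fit_one_tree(seed, X, y):
    # hypothetical stand-in for the Cython tree builder
    rng = np.random.RandomState(seed)
    bootstrap = rng.randint(0, X.shape[0], X.shape[0])
    return bootstrap  # a real builder would return a fitted tree

X = np.random.randn(10000, 20)
y = np.random.randint(0, 2, 10000)

# threads share X and y in memory; separate processes would each get a copy
trees = Parallel(n_jobs=4, backend="threading")(
    delayed(fit_one_tree)(seed, X, y) for seed in range(8))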
2013/7/12 Andreas Mueller :
> On 07/12/2013 09:23 AM, Vlad Niculae wrote:
>> The requirements are definitely the blocking thing here. Not just the
>> dependency on cvxopt but also the inference packages and the fact they
>> need to be built manually. The api is sklearn-ish enough even with
>> list
On 07/12/2013 09:23 AM, Vlad Niculae wrote:
> The requirements are definitely the blocking thing here. Not just the
> dependency on cvxopt but also the inference packages and the fact they
> need to be built manually. The api is sklearn-ish enough even with
> lists-of-lists.
>
The API, yes, but th
The requirements are definitely the blocking thing here. Not just the
dependency on cvxopt but also the inference packages and the fact they
need to be built manually. The api is sklearn-ish enough even with
lists-of-lists.
On Fri, Jul 12, 2013 at 10:06 AM, Andreas Mueller
wrote:
> On 07/12/201
On 07/12/2013 01:26 AM, Robert Layton wrote:
> Structured prediction in sklearn was one of the outcomes from the survey.
> Would it be a better idea to send people to pystruct, rather than
> implement it here?
>
I think so. We decided that structured prediction was out of scope for
sklearn, right