[Scikit-learn-general] issue with custom regressor in the pipeline

2015-05-19 Thread Pagliari, Roberto
I'm trying to add a custom regressor to a pipeline. For debugging purposes I commented everything out. class myRegressor(BaseEstimator, TransformerMixin): def __init__(self, k=0, njobs=1, cv=6, nestimators=50): pass def fit(self, X, y=None): return self def transform(

Re: [Scikit-learn-general] error when running random forest fit

2015-05-05 Thread Pagliari, Roberto
Sent: Tuesday, May 05, 2015 2:35 PM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] error when running random forest fit Can you provide your code? And what is n_jobs in this line? And mp.cpu_count()? On 05/05/2015 02:29 PM, Pagliari, Roberto wrote: I'm getti

[Scikit-learn-general] error when running random forest fit

2015-05-05 Thread Pagliari, Roberto
I'm getting this error (for the first time) with random forest. I'm not sure what that means. It seems to appear regardless of the number of jobs I'm setting. clf.fit(X=np.array(X.values, dtype=float), y=np.ravel(y.values)) File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/fo

Re: [Scikit-learn-general] class label hashing

2015-05-01 Thread Pagliari, Roberto
this information is entirely absent. Michael On Fri, May 1, 2015 at 5:07 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: Hi Sebastian, if classes/labels are the same for both training and test, that should not be a problem. I've done that and never seen any issues. As

Re: [Scikit-learn-general] class label hashing

2015-05-01 Thread Pagliari, Roberto
s. On Apr 30, 2015, at 11:02 PM, Pagliari, Roberto wrote: Suppose I train a classifier with dataset1, which contains labels 0, 3, 4, 6, 7 and then predict over dataset2 with labels 0, 3, 4, 8, 10 will the hashi

[Scikit-learn-general] class label hashing

2015-04-30 Thread Pagliari, Roberto
Suppose I train a classifier with dataset1, which contains labels 0, 3, 4, 6, 7, and then predict over dataset2 with labels 0, 3, 4, 8, 10. Will the hashing be the same for labels 0, 3 and 4? And will scikit-learn get confused by seeing new labels such as 8 and 10? Thank you,
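
A minimal sketch of the behaviour asked about, assuming a recent scikit-learn and made-up random features: a classifier fixes its label set at `fit` time in `classes_` (built from the sorted unique training labels, not a hash), so labels that only appear in dataset2 (8 and 10 here) can never be predicted, and those samples simply count as errors.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)

# made-up data: dataset1 has labels 0, 3, 4, 6, 7
X1 = rng.rand(50, 3)
y1 = np.array([0, 3, 4, 6, 7] * 10)

# dataset2 has labels 0, 3, 4, 8, 10; 8 and 10 never occur in training
X2 = rng.rand(20, 3)
y2 = np.array([0, 3, 4, 8, 10] * 4)

clf = DecisionTreeClassifier(random_state=0).fit(X1, y1)
print(clf.classes_)  # [0 3 4 6 7]

y_pred = clf.predict(X2)
# predictions always come from classes_, so samples labelled 8 or 10
# are counted as errors rather than confusing the classifier
print(np.isin(y_pred, [8, 10]).any())  # False
print("accuracy on dataset2:", np.mean(y_pred == y2))
```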

Re: [Scikit-learn-general] passing parameters to a transformer

2015-04-30 Thread Pagliari, Roberto
learn-general] passing parameters to a transformer You need to give more context. Can you give a minimal example that runs but breaks? On 04/29/2015 02:44 PM, Pagliari, Roberto wrote: I'm not sure why but when I do something like the below, nestimators becomes zero, despite the default val

[Scikit-learn-general] passing parameters to a transformer

2015-04-29 Thread Pagliari, Roberto
I'm not sure why but when I do something like the below, nestimators becomes zero, despite the default value of 2000. class myCustomTransformer(BaseEstimator, TransformerMixin): def __init__(self, k=5, nestimators=2000): self.k_ = k self.clf_ = RandomForestClassifier(n_estima
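
A likely cause, judging from the snippet: `__init__` stores its arguments under different names (`self.k_`, and no `self.nestimators` at all), while `BaseEstimator.get_params`, which `clone` and `GridSearchCV` rely on, reads them back by argument name. The sketch below is a hypothetical rewrite, assuming a recent scikit-learn; the feature-ranking logic is an assumption, since the original code is truncated. It keeps `__init__` trivial and builds the forest in `fit`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.ensemble import RandomForestClassifier


class MyCustomTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: keeps the k features a random forest ranks highest."""

    def __init__(self, k=5, nestimators=2000):
        # Store each argument unchanged under its own name: get_params(),
        # set_params() and clone() look the values up as self.k, self.nestimators.
        self.k = k
        self.nestimators = nestimators

    def fit(self, X, y=None):
        # Build the inner model here, not in __init__; fitted state ends in "_".
        self.clf_ = RandomForestClassifier(n_estimators=self.nestimators, random_state=0)
        self.clf_.fit(X, y)
        self.top_ = np.argsort(self.clf_.feature_importances_)[::-1][: self.k]
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.top_]


rng = np.random.RandomState(0)
X = rng.rand(40, 8)
y = (X[:, 0] > 0.5).astype(int)

t = MyCustomTransformer(k=3, nestimators=20)
print(clone(t).get_params())           # {'k': 3, 'nestimators': 20}
print(t.fit(X, y).transform(X).shape)  # (40, 3)
```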

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Pagliari, Roberto
On Tue, Apr 28, 2015 at 10:53 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote: Yes, PCA would work too, but then you'll get feature extraction instead of feature selection :) On Apr 28, 2015, at 4:45 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wro

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Pagliari, Roberto
: Re: [Scikit-learn-general] SVM for feature selection Yes, PCA would work too, but then you'll get feature extraction instead of feature selection :) On Apr 28, 2015, at 4:45 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: Hi Sebastian, thanks for the hint. I think ano

Re: [Scikit-learn-general] error with RFE and gridsearchCV

2015-04-28 Thread Pagliari, Roberto
that RFECV <http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html> only supports models that have coef_ attribute, and GradientBoostingClassifier does not. On Tue, Apr 28, 2015 at 8:44 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: I'm trying to

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Pagliari, Roberto
grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=1, cv=StratifiedKFold(y, n_folds=10), scoring='accuracy', n_jobs=1) grid_search.fit(X, y) print(grid_search.b

[Scikit-learn-general] error with RFE and gridsearchCV

2015-04-28 Thread Pagliari, Roberto
I'm trying to use recursive feature elimination with gradient boosting and grid search as shown below gbr = GradientBoostingClassifier() parameters = {'learning_rate': [0.1, 0.01, 0.001], 'max_depth': [1, 4, 6], 'min_samples_leaf': [3, 5, 9, 17],
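
A sketch of one way to combine the three, assuming a recent scikit-learn, where `RFE` falls back to `feature_importances_` when an estimator has no `coef_`, so gradient boosting can drive the elimination. Putting `RFE` inside a `Pipeline` and grid-searching the pipeline also sidesteps passing a `GridSearchCV` object to `RFE`. The grid values and the synthetic data are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

pipe = Pipeline([
    # RFE ranks features with the booster's feature_importances_ (no coef_ needed)
    ("rfe", RFE(GradientBoostingClassifier(n_estimators=30, random_state=0), step=2)),
    ("gb", GradientBoostingClassifier(n_estimators=30, random_state=0)),
])

# "<step>__<param>" addresses a parameter of a pipeline step
param_grid = {
    "rfe__n_features_to_select": [4, 6],
    "gb__learning_rate": [0.1, 0.01],
    "gb__max_depth": [1, 3],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)
print(search.best_estimator_.named_steps["rfe"].support_)
```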

[Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Pagliari, Roberto
From the documentation: "Feature selection is usually used as a pre-processing step before doing the actual learning. The recommended way to do this in scikit-learn is to use a sklearn.pipeline.Pipeline

[Scikit-learn-general] recursive feature elimination

2015-04-22 Thread Pagliari, Roberto
is it possible to pass a gridsearchCV object to RFE, as opposed to a simple estimator? Thank you,

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
scipy.stats as well, if more appropriate. Vlad On 20 Apr 2015, at 15:16, Pagliari, Roberto wrote: Yes, I agree. From the example, though, my understanding is that you can only pass arrays, not functions, isn't that true?

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
neral@lists.sourceforge.net Subject: Re: [Scikit-learn-general] randomized grid search If you have continuous parameter you should really really really use continuous distributions! On 04/20/2015 12:58 PM, Pagliari, Roberto wrote: Hi Vlad, when using randomized grid search, does sklearn look into i

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
Hi Vlad, when using randomized grid search, does sklearn look into intermediate values, or does it sample from the values provided in the parameter grid? Thank you, From: Vlad Niculae [zephy...@gmail.com] Sent: Monday, April 20, 2015 12:50 PM To: scikit

[Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
From the example in the documentation: # specify parameters and distributions to sample from param_dist = {"max_depth": [3, None], "max_features": sp_randint(1, 11), "min_samples_split": sp_randint(1, 11), "min_samples_leaf": sp_randint(1, 11),
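
A sketch of the distinction discussed in the replies, assuming a recent scikit-learn and SciPy: a list is sampled from as-is, while a `scipy.stats` distribution can yield any value in its range, which is what you want for continuous parameters. `randint(2, 11)` replaces the documentation's `sp_randint(1, 11)` because recent versions reject `min_samples_split=1`; the `max_features` range is an arbitrary example.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterSampler, RandomizedSearchCV

param_dist = {
    "max_depth": [3, None],               # list: only these two values are ever used
    "min_samples_split": randint(2, 11),  # discrete distribution over 2..10
    "max_features": uniform(0.1, 0.9),    # continuous over [0.1, 1.0): any fraction in between
}

# what the search will try: one independent draw per parameter, per iteration
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=0))
for params in samples:
    print(params)

X, y = make_classification(n_samples=200, random_state=0)
search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_distributions=param_dist, n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```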

Re: [Scikit-learn-general] gradient boost classifier - feature_importances_

2015-04-16 Thread Pagliari, Roberto
never mind my question. I forgot gridsearch was the actual object. Thanks, From: Pagliari, Roberto [rpagli...@appcomsci.com] Sent: Thursday, April 16, 2015 12:50 PM To: scikit-learn-general@lists.sourceforge.net Subject: [Scikit-learn-general] gradient boost

[Scikit-learn-general] gradient boost classifier - feature_importances_

2015-04-16 Thread Pagliari, Roberto
is feature_importances_ available from gradient boosting? it is mentioned in the documentation, but it doesn't exist when I try to access it (after fitting via grid search). I printed 'dir' of the object and can't see it. Thanks,
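
The attribute lives on the refitted model, not on the search object. A minimal sketch, assuming a recent scikit-learn, with synthetic data and a made-up grid:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
grid = GridSearchCV(GradientBoostingClassifier(n_estimators=50, random_state=0),
                    {"max_depth": [1, 3]}, cv=3)
grid.fit(X, y)

# GridSearchCV wraps the model; the model refitted on all the data with the
# best parameters is best_estimator_, and that is where the importances are
importances = grid.best_estimator_.feature_importances_
print(np.round(importances, 3))
```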

Re: [Scikit-learn-general] adaboost parameters

2015-04-14 Thread Pagliari, Roberto
5 03:45 AM, Pagliari, Roberto wrote: Right now I’m using the default values, which means decision tree as the estimator and learning rate 1.0. I should probably change the learning rate, at the very least, because I’m not getting good performance. Does it make sense to use random forest, instead o

[Scikit-learn-general] pydata

2015-04-14 Thread Pagliari, Roberto
Is there a pydata or sklearn workshop coming up in NYC or London? Thank you,

Re: [Scikit-learn-general] adaboost parameters

2015-04-12 Thread Pagliari, Roberto
small-ish learning rate in order to get the best results, with the limiting factor (as always) being your computational and time budgets, respectively. My 2 cents. :D -Jason From: Pagliari, Roberto [mailto:rpagli...@appcomsci.com] Sent: Friday, April 10, 2015 1:18 PM To: scikit-learn-general@l

[Scikit-learn-general] adaboost parameters

2015-04-10 Thread Pagliari, Roberto
When using adaboost, what is a range of values of n_estimators and learning rate that makes sense to optimize over? Thank you,

[Scikit-learn-general] parallellize RFECV

2015-04-07 Thread Pagliari, Roberto
is there a simple way to parallelize recursive feature elimination? With GridSearchCV you can set n_jobs, but there is no such parameter in RFECV. Thank you,
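
Later scikit-learn releases added an `n_jobs` parameter to `RFECV`, which fits the cross-validation folds in parallel. A sketch assuming such a release; the estimator, `n_jobs=2` and the data are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=15, n_informative=4, random_state=0)

# n_jobs spreads the per-fold elimination runs over worker processes
selector = RFECV(LinearSVC(dual=False), step=1, cv=5, n_jobs=2)
selector.fit(X, y)
print(selector.n_features_, selector.support_)
```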

Re: [Scikit-learn-general] CV with SVM

2015-04-07 Thread Pagliari, Roberto
you want that? On 04/07/2015 12:24 PM, Pagliari, Roberto wrote: not all combinations of cost/loss functions and dual are possible with SVM. when performing grid search with CV, does sklearn skip invalid combinati

[Scikit-learn-general] CV with SVM

2015-04-07 Thread Pagliari, Roberto
not all combinations of cost/loss functions and dual are possible with SVM. when performing grid search with CV, does sklearn skip invalid combinations? Thank you,
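
A sketch assuming a recent scikit-learn: grid search does not know which combinations are valid, so an unsupported one fails and is recorded with `error_score` (NaN by default) plus a warning. Passing a list of dicts avoids this, because each dict is expanded on its own. The combinations below are the ones `LinearSVC` accepts (`penalty="l1"` needs `loss="squared_hinge"` with `dual=False`; `loss="hinge"` needs `penalty="l2"` with `dual=True`); the `C` values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

# each dict is expanded separately, so only supported combinations are tried
param_grid = [
    {"penalty": ["l2"], "loss": ["hinge"], "dual": [True], "C": [0.1, 1.0]},
    {"penalty": ["l2"], "loss": ["squared_hinge"], "dual": [True, False], "C": [0.1, 1.0]},
    {"penalty": ["l1"], "loss": ["squared_hinge"], "dual": [False], "C": [0.1, 1.0]},
]
search = GridSearchCV(LinearSVC(max_iter=10000), param_grid, cv=3)
search.fit(X, y)

print(len(search.cv_results_["params"]))                       # 8 candidates
print(np.isfinite(search.cv_results_["mean_test_score"]).all())  # no failed fits
print(search.best_params_)
```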

[Scikit-learn-general] random forests - number of samples

2015-03-11 Thread Pagliari, Roberto
How many samples does a single tree of a random forest use? Or does it use all samples?
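
By default (`bootstrap=True`), each tree is trained on a bootstrap sample: as many draws as there are training samples, taken with replacement (recent versions also offer `max_samples` to draw fewer). Because of repeats, a tree ends up seeing roughly 63% of the distinct samples. The NumPy sketch below simulates one such draw; it illustrates the sampling scheme, not scikit-learn's internal code.

```python
import numpy as np

n_samples = 100_000
rng = np.random.default_rng(0)

# one bootstrap sample: n_samples indices drawn with replacement
indices = rng.integers(0, n_samples, size=n_samples)
unique_fraction = np.unique(indices).size / n_samples

print(round(unique_fraction, 3))  # close to 0.632
print(round(1 - np.exp(-1), 3))   # 0.632, the limit 1 - 1/e
```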

Re: [Scikit-learn-general] grid search random state

2015-03-09 Thread Pagliari, Roberto
Hi, I'm not sure how to provide the StratifiedKFold parameter to gridsearchCV. Should it be part of the pipeline? Thank you, From: Pagliari, Roberto [mailto:rpagli...@appcomsci.com] Sent: Wednesday, February 25, 2015 8:17 PM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [S

Re: [Scikit-learn-general] SVM: Matlab vs skleanr

2015-03-03 Thread Pagliari, Roberto
vs skleanr Can't say about Matlab, but sklearn does SVM (unless it's LinearSVC) using libsvm internally (with minor tweaks on top), so you should expect the same results. On Tue, Mar 3, 2015 at 11:53 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: Has anybody ever

[Scikit-learn-general] SVM: Matlab vs skleanr

2015-03-03 Thread Pagliari, Roberto
Has anybody ever compared Matlab SVM vs sklearn or libsvm? It'd be interesting to know about the difference in accuracy between them (using the same dataset and similar settings). Thank you,

[Scikit-learn-general] tolerance and cache_size

2015-03-03 Thread Pagliari, Roberto
I'm using datasets from UCI, hence, fairly small. 1. Should I change cache_size? I'm not sure how to relate this to the dataset size. 2. How should I set the tolerance of SVM? With rbf SVM I notice that sometimes the error increases if the tolerance is too small, for example, 1e-7,

Re: [Scikit-learn-general] random forests with njobs>1

2015-02-27 Thread Pagliari, Roberto
lib installed? Does n_jobs > 1 work with other algorithms? On Sat, Feb 28, 2015 at 12:55 AM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: When using random forests with njobs > 1, I see one python process only. Does RF support using multip

[Scikit-learn-general] random forests with njobs>1

2015-02-27 Thread Pagliari, Roberto
When using random forests with njobs > 1, I see one python process only. Does RF support using multiprocessor module?

Re: [Scikit-learn-general] how does sklearn apply pipelines

2015-02-26 Thread Pagliari, Roberto
, Sebastian Raschka <se.rasc...@gmail.com> wrote: It's actually quite simple: It invokes fit_transform on all elements in a pipeline but the last. On the last element in the pipeline (the estimator) only fit is invoked. Best, Sebastian On Feb 26, 2015, at 9:01 PM, Pag

[Scikit-learn-general] how does sklearn apply pipelines

2015-02-26 Thread Pagliari, Roberto
Given a pipeline with a certain number of transformers and a classifier, how does sklearn know which method should be invoked from one step to another? Does it list the available methods for each object?
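
The rule in Sebastian's reply can be made visible with a hypothetical pass-through step that logs its calls: during `fit`, intermediate steps get `fit_transform` and the last step gets `fit`; during `predict`, intermediate steps get `transform` and the last step gets `predict`. A sketch assuming a recent scikit-learn:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class CallRecorder(TransformerMixin, BaseEstimator):
    """Hypothetical pass-through step that logs which methods Pipeline calls."""

    def _log(self, name):
        if not hasattr(self, "calls_"):
            self.calls_ = []
        self.calls_.append(name)

    def fit(self, X, y=None):
        self._log("fit")
        return self

    def transform(self, X):
        self._log("transform")
        return X

    def fit_transform(self, X, y=None):
        self._log("fit_transform")
        return X


X = np.random.RandomState(0).rand(30, 4)
y = (X[:, 0] > 0.5).astype(int)

pipe = Pipeline([("recorder", CallRecorder()), ("clf", LogisticRegression())])
pipe.fit(X, y)      # recorder: fit_transform; clf: fit
pipe.predict(X)     # recorder: transform;     clf: predict
print(pipe.named_steps["recorder"].calls_)  # ['fit_transform', 'transform']
```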

Re: [Scikit-learn-general] grid search random state

2015-02-25 Thread Pagliari, Roberto
Thank you! From: Andy [mailto:t3k...@gmail.com] Sent: Wednesday, February 25, 2015 3:24 PM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] grid search random state On 02/24/2015 08:26 PM, Pagliari, Roberto wrote: I have two questions about gridsearchcv 1

[Scikit-learn-general] grid search random state

2015-02-25 Thread Pagliari, Roberto
I have two questions about gridsearchcv 1. Is it possible to fix the random state of the underlying kfold, for testing purposes? 2. When passing parameters, such as C and gamma for svm, does grid search go through them in order? Thank you,
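
A sketch for both questions, assuming a recent scikit-learn (`sklearn.model_selection`, where `StratifiedKFold` takes `n_splits` rather than `y`): the splitter is passed through `cv=` rather than put in the pipeline, and `shuffle=True` with a fixed `random_state` gives reproducible folds. Candidates are enumerated by `ParameterGrid`, with keys sorted and the last key varying fastest. The data and grid are made up.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid, StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# 1. The splitter goes to GridSearchCV via cv=, not into the pipeline.
#    shuffle=True with a fixed random_state gives the same folds on every run.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

param_grid = {"C": [1, 10], "gamma": [0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv)
search.fit(X, y)

# 2. Candidates come from ParameterGrid: keys sorted, last key varying fastest.
#    cv_results_ lists them in this order even when n_jobs > 1.
print(list(ParameterGrid(param_grid)))
# [{'C': 1, 'gamma': 0.01}, {'C': 1, 'gamma': 0.1}, {'C': 10, 'gamma': 0.01}, {'C': 10, 'gamma': 0.1}]
print(search.cv_results_["params"] == list(ParameterGrid(param_grid)))  # True
```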

Re: [Scikit-learn-general] custom scorer with parameters

2015-02-19 Thread Pagliari, Roberto
ant a tradeoff of complexity and accuracy. You do have access to the estimator in the scoring. Or you could do it manually after doing the grid-search (with refit=False). Really, ties shouldn't happen in practice, though. How large is your dataset? On 02/19/2015 02:23 PM, Pagliari, Rob

Re: [Scikit-learn-general] custom scorer with parameters

2015-02-19 Thread Pagliari, Roberto
eters Not really. The scorer should not really know that it is inside a grid-search. What is it you want to do? On 02/19/2015 10:18 AM, Pagliari, Roberto wrote: I'd like to implement a custom score function for grid search CV. would it be possible to get the current best parameters from within

[Scikit-learn-general] custom scorer with parameters

2015-02-19 Thread Pagliari, Roberto
I'd like to implement a custom score function for grid search CV. would it be possible to get the current best parameters from within the score function?
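
A sketch of the alternative raised in the replies, assuming a recent scikit-learn: a scorer is any callable `scorer(estimator, X, y)`, so it sees the fitted candidate but not the search's running best. The penalty on the fraction of support vectors is a hypothetical accuracy/complexity trade-off, and its 0.1 weight is an arbitrary assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def accuracy_minus_complexity(estimator, X, y):
    """Hypothetical scorer: validation accuracy minus a support-vector penalty.

    GridSearchCV calls it as scorer(fitted_estimator, X_val, y_val).
    """
    accuracy = estimator.score(X, y)
    # fraction of training samples that became support vectors
    complexity = estimator.n_support_.sum() / estimator.shape_fit_[0]
    return accuracy - 0.1 * complexity  # 0.1 is an arbitrary weight


X, y = make_classification(n_samples=200, random_state=0)
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, scoring=accuracy_minus_complexity, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```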

Re: [Scikit-learn-general] same cross validation score with different parameter configurations

2015-02-18 Thread Pagliari, Roberto
with different parameter configurations Use the source, Luke https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/grid_search.py#L540 M. On Thu, Feb 19, 2015 at 7:24 AM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: When different parameter configurations produce th

[Scikit-learn-general] same cross validation score with different parameter configurations

2015-02-18 Thread Pagliari, Roberto
When different parameter configurations produce the same CV score, how does sklearn select the best parameters (I'm mostly interested in rbf SVM)?
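
In recent scikit-learn versions (the linked grid_search.py has since been replaced by `sklearn.model_selection`), tied candidates share the same rank and `best_index_` is the first candidate with rank 1. The tie therefore goes to whichever configuration `ParameterGrid` enumerates first. A sketch that forces an exact tie with `DummyClassifier`, whose `constant` parameter is ignored under `strategy="most_frequent"`:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=100, random_state=0)

# constant is ignored by the most_frequent strategy, so both candidates get
# exactly the same CV score: a deliberate tie
search = GridSearchCV(DummyClassifier(strategy="most_frequent"), {"constant": [1, 0]}, cv=5)
search.fit(X, y)

print(search.cv_results_["rank_test_score"])  # [1 1]
print(search.best_params_)                    # {'constant': 1}, the first listed
```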

[Scikit-learn-general] intermediate results in pipeline

2015-02-17 Thread Pagliari, Roberto
When using the predict function clf.predict(x_test) where clf is obtained by gridsearchcv with a pipeline, is it possible to print intermediate results for debugging purposes? For example, if the pipeline is [scaling, transformer1, transformer2, classifier] I would like to see the output of
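
A sketch assuming a recent scikit-learn (pipeline slicing needs 0.21 or later): take the refitted pipeline from `best_estimator_` and call `transform` on a slice of it, or inspect a single step through `named_steps`. The step names and the StandardScaler/PCA/SVC choices are placeholders for the scaling, transformer and classifier steps in the question.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train = X[:150], X[150:], y[:150]

pipe = Pipeline([("scaling", StandardScaler()), ("pca", PCA(n_components=3)), ("svc", SVC())])
clf = GridSearchCV(pipe, {"svc__C": [0.1, 1.0]}, cv=3).fit(X_train, y_train)

best = clf.best_estimator_                       # the refitted Pipeline
scaled = best[:1].transform(X_test)              # output of the first step only
before_classifier = best[:-1].transform(X_test)  # output of every step but the last
print(scaled.shape, before_classifier.shape)     # (50, 10) (50, 3)
print(best.named_steps["pca"].explained_variance_ratio_)
```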

Re: [Scikit-learn-general] custom regressor keeps failing

2015-02-17 Thread Pagliari, Roberto
, 2015 7:08 PM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] custom regressor keeps failing On 02/17/2015 03:50 PM, Pagliari, Roberto wrote: I see. But in my case I have things like for i in xrange(0, num_experiments): estimators = # s

Re: [Scikit-learn-general] custom regressor keeps failing

2015-02-17 Thread Pagliari, Roberto
I see. But in my case I have things like for i in xrange(0, num_experiments): estimators = # some pipeline etc... so estimators should be "re-created" every time. In any case, I recently changed my code and I am no longer using lists. With scalar numbers I think it is working and I se

Re: [Scikit-learn-general] random splitting and random seed

2015-02-17 Thread Pagliari, Roberto
cross-validation, for example. Also, if you want a deterministic but varying way to set the random seed, why not just use a range? On 02/16/2015 10:25 PM, Pagliari, Roberto wrote: I'm comparing a few algorithms, and trying to have them run using the same random datasets. Each algorithm is

[Scikit-learn-general] random splitting and random seed

2015-02-16 Thread Pagliari, Roberto
I'm comparing a few algorithms, and trying to have them run using the same random datasets. Each algorithm is a separate python process and I provide a file with a list of integers, generated using numpy.random.randint. It is a small sequence of random integers between 0 and 10,000,000. Every

Re: [Scikit-learn-general] which methods do I need to implement for a regressor?

2015-02-16 Thread Pagliari, Roberto
able/developers/index.html#parameters-and-init On 16 Feb 2015, at 12:52, Pagliari, Roberto wrote: I looked into some examples I found online but I’m a bit confused. Suppose I want to implement my own transformer, something similar to the standard scaler. Would t

Re: [Scikit-learn-general] which methods do I need to implement for a regressor?

2015-02-16 Thread Pagliari, Roberto
spelling On Feb 16, 2015, at 20:02, "Pagliari, Roberto" <rpagli...@appcomsci.com> wrote: Hi Vlad/All, Thanks for the pointers. The reason I return a copy of X is because I don't want to modify the dataset during grid search with cross validation (I'm not sure if

[Scikit-learn-general] custom regressor keeps failing

2015-02-16 Thread Pagliari, Roberto
I keep failing with custom transformer implementation. I posted a question on stackoverflow, and deleted it as I think it should be more appropriate here. I followed the suggestions by other people. The code right now is this: from sklearn.base import TransformerMixin, BaseEstimator clas

Re: [Scikit-learn-general] which methods do I need to implement for a regressor?

2015-02-16 Thread Pagliari, Roberto
s! Yours, Vlad [1] http://scikit-learn.org/stable/developers/index.html#rolling-your-own-estimator [2] http://scikit-learn.org/stable/developers/index.html#estimated-attributes [3] http://scikit-learn.org/stable/developers/index.html#parameters-and-init > On 16 Feb 2015, at 12:52, Pagliari, Roberto w

Re: [Scikit-learn-general] which methods do I need to implement for a regressor?

2015-02-16 Thread Pagliari, Roberto
I looked into some examples I found online but I’m a bit confused. Suppose I want to implement my own transformer, something similar to the standard scaler. Would this be sufficient to be used in a pipeline, or should it be done differently? class ModelTransformer(TransformerMixin): def

Re: [Scikit-learn-general] which methods do I need to implement for a regressor?

2015-02-16 Thread Pagliari, Roberto
set_params. But set_params you can get by inheriting sklearn.base.BaseEstimator G On Mon, Feb 16, 2015 at 05:50:24AM +, Pagliari, Roberto wrote: I'd like to implement my own regressor/classifier and possibly use it in a pipeline. Do I need to implement all methods below or ca

[Scikit-learn-general] which methods do I need to implement for a regressor?

2015-02-15 Thread Pagliari, Roberto
I'd like to implement my own regressor/classifier and possibly use it in a pipeline. Do I need to implement all methods below or can some of them be missing? decision_function
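
Roughly what the replies say: `fit` (returning `self`) and `predict` are the essentials, `get_params`/`set_params` come from `BaseEstimator`, and `RegressorMixin` adds a default `score`. A sketch with a deliberately trivial, hypothetical regressor, assuming a recent scikit-learn (mixin listed before `BaseEstimator`, as recent developer docs suggest):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class ShrunkMeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical regressor: predicts the training mean of y, shrunk toward 0."""

    def __init__(self, shrinkage=0.0):
        self.shrinkage = shrinkage  # stored unchanged; BaseEstimator provides get/set_params

    def fit(self, X, y):
        self.mean_ = (1.0 - self.shrinkage) * np.mean(y)
        return self  # fit must return self

    def predict(self, X):
        return np.full(len(X), self.mean_)  # one prediction per row of X


X = np.random.RandomState(0).rand(60, 3)
y = 5.0 + X[:, 0]

# RegressorMixin supplies score() (R^2), which GridSearchCV uses by default;
# make_pipeline names the step after the lowercased class name
search = GridSearchCV(make_pipeline(StandardScaler(), ShrunkMeanRegressor()),
                      {"shrunkmeanregressor__shrinkage": [0.0, 0.5]}, cv=3)
search.fit(X, y)
print(search.best_params_)  # {'shrunkmeanregressor__shrinkage': 0.0}
```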

Re: [Scikit-learn-general] regression with one independent variable

2015-02-11 Thread Pagliari, Roberto
dex 1). On Wed, Feb 11, 2015 at 9:26 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: I’m trying to make a linear regression with one independent variable and one or more dependent variables. That does not seem to work. Is that a limitation of regression function? regr.fit(x

[Scikit-learn-general] regression with one independent variable

2015-02-11 Thread Pagliari, Roberto
I'm trying to make a linear regression with one independent variable and one or more dependent variables. That does not seem to work. Is that a limitation of regression function? regr.fit(x_train[:, 0], x_train[:, 1]) File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/base.py",
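
The underlying issue is the shape: `fit` expects a 2-D `X` of shape `(n_samples, n_features)`, and `x_train[:, 0]` is 1-D. `y` may be 1-D or, for several dependent variables, 2-D. A sketch assuming a recent scikit-learn, with made-up, noise-free data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
x = rng.rand(50)
x_train = np.column_stack([x, 2.0 * x + 1.0])  # made-up data: column 1 = 2 * column 0 + 1

regr = LinearRegression()
# x_train[:, [0]] keeps a (50, 1) column; x_train[:, 0].reshape(-1, 1) is equivalent
regr.fit(x_train[:, [0]], x_train[:, 1])
print(regr.coef_, regr.intercept_)  # approximately [2.] and 1.0
```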

[Scikit-learn-general] meaning of copy_X in linear regression

2015-02-06 Thread Pagliari, Roberto
What is the meaning of copy_X=True in LinearRegression?

Re: [Scikit-learn-general] select number of features to keep

2015-01-29 Thread Pagliari, Roberto
bject: Re: [Scikit-learn-general] select number of features to keep For LinearSVC, see the docs: http://scikit-learn.org/dev/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC.transform I don't understand the second part of your question. On 01/29/2015 11:55 AM, Pagliari, Rob

Re: [Scikit-learn-general] random forest fit

2015-01-29 Thread Pagliari, Roberto
identical. The motivation is that you can write clf = svm.SVC().fit(X, y) On 01/28/2015 03:00 PM, Pagliari, Roberto wrote: Perhaps, this is a dumb question, but I saw both the alternatives below for using a classifier. I guess that regardless of whether you have clf = clf.fit or just clf.fit, the result

[Scikit-learn-general] select number of features to keep

2015-01-29 Thread Pagliari, Roberto
When using a feature selection algorithm in a pipeline, for example clf = Pipeline([ ('feature_selection', LinearSVC(penalty="l1")), ('classification', RandomForestClassifier()) ]) clf.fit(X, y) or even a random forest, for that matter, how does sklearn know how many features to keep? Thank
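
In the version discussed, `LinearSVC` had its own `transform`. In recent scikit-learn the equivalent is `SelectFromModel`, which by default keeps features above a threshold (1e-5 for L1-penalised models) and can be capped with `max_features`. A sketch assuming a recent version and synthetic data; the value 5 is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

clf = Pipeline([
    # keep the 5 features with the largest |coef_|; threshold=-np.inf disables the
    # default cut-off so that max_features alone decides how many survive
    ("feature_selection", SelectFromModel(LinearSVC(penalty="l1", dual=False, C=0.1),
                                          threshold=-np.inf, max_features=5)),
    ("classification", RandomForestClassifier(n_estimators=50, random_state=0)),
])
clf.fit(X, y)
print(clf.named_steps["feature_selection"].get_support().sum())  # 5
```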

[Scikit-learn-general] random forest fit

2015-01-28 Thread Pagliari, Roberto
Perhaps, this is a dumb question, but I saw both the alternatives below for using a classifier. I guess that regardless of whether you have clf = clf.fit or just clf.fit, the result does not change and you can invoke clf._attribute_ with any of the alternatives below. Thanks, >>> from sklearn
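
The two forms are equivalent: `fit` returns the estimator itself, which is what makes `clf = svm.SVC().fit(X, y)` work. A quick check, assuming a recent scikit-learn:

```python
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

clf = svm.SVC()
returned = clf.fit(X, y)
assert returned is clf  # fit() returns the estimator itself ...

clf2 = svm.SVC().fit(X, y)  # ... which is what makes the one-liner work
# fitted attributes end with a trailing underscore, e.g. n_support_
print(clf.n_support_, clf2.n_support_)
```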

[Scikit-learn-general] hadoop support wth sklearn

2015-01-05 Thread Pagliari, Roberto
Are there projects about supporting HDFS/Hadoop-based systems? I've seen similar things with R, and I was wondering if the same might happen with this project as well. Thanks

[Scikit-learn-general] onehotencoder and data load

2014-12-15 Thread Pagliari, Roberto
When using OneHotEncoder, is it possible to have one integer per feature as the output, as opposed to binary representation? Also, when using OneHotEncoder, what would be the method to load data (.csv) with mixed type (number and categorical)? Thanks,
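
A sketch assuming a recent scikit-learn with pandas: `OrdinalEncoder` gives one integer per categorical feature, and `ColumnTransformer` lets you one-hot only the categorical columns of a mixed frame. The tiny inline frame stands in for `pandas.read_csv` on your file; the column names are made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# stands in for pd.read_csv(...) on a CSV with mixed numeric and categorical columns
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0],
    "sex": ["male", "female", "female"],
    "embarked": ["S", "C", "S"],
})
categorical = ["sex", "embarked"]

# one integer per feature; categories are sorted, e.g. female -> 0, male -> 1
codes = OrdinalEncoder().fit_transform(df[categorical])
print(codes)

# one-hot the categorical columns and pass the numeric column through unchanged
ct = ColumnTransformer([("onehot", OneHotEncoder(), categorical)],
                       remainder="passthrough", sparse_threshold=0)
encoded = ct.fit_transform(df)
print(encoded.shape)  # (3, 5): 2 sex columns + 2 embarked columns + age
```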

Re: [Scikit-learn-general] swap error with SVM rbf kernel

2014-12-02 Thread Pagliari, Roberto
l instabilities. Cheers, Andy On 12/02/2014 10:45 AM, Pagliari, Roberto wrote: I'm using SVM with a dataset from a Kaggle competition (Titanic). When running SVM I sometimes get this error 731.52user 18.36system 2:03.66elapsed 606%CPU (0avgtext+0avgdata 67152maxresident)k 0inputs+16outputs (0major+3

[Scikit-learn-general] gradientboosting

2014-12-02 Thread Pagliari, Roberto
When using gradient boosting, my understanding is that samples that were misclassified are more emphasized. Which particular algorithm is used for classification? Thank you,
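
Reweighting misclassified samples is how AdaBoost works. Gradient boosting instead fits each new regression tree to the negative gradient of the loss; for classification, scikit-learn's `GradientBoostingClassifier` fits regression trees to the gradient of the log-loss. A simplified sketch of the idea for squared error on made-up data, not scikit-learn's implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 1)
y = np.sin(6 * X[:, 0]) + 0.1 * rng.randn(200)

learning_rate, n_stages = 0.1, 100
prediction = np.full_like(y, y.mean())  # stage 0: a constant model
for _ in range(n_stages):
    residuals = y - prediction  # negative gradient of 0.5 * (y - f)^2
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # small step toward the residuals

baseline_mse = np.mean((y - y.mean()) ** 2)
boosted_mse = np.mean((y - prediction) ** 2)
print(round(baseline_mse, 3), round(boosted_mse, 3))
```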

[Scikit-learn-general] swap error with SVM rbf kernel

2014-12-02 Thread Pagliari, Roberto
I'm using SVM with a dataset from a Kaggle competition (Titanic). When running SVM I sometimes get this error 731.52user 18.36system 2:03.66elapsed 606%CPU (0avgtext+0avgdata 67152maxresident)k 0inputs+16outputs (0major+38276minor)pagefaults 0swaps Is there any way to debug this? Thank you

Re: [Scikit-learn-general] issues/nice to have in sklearn

2014-11-12 Thread Pagliari, Roberto
or https://github.com/scikit-learn/scikit-learn/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+Feature%22 what you're looking for? On 12 November 2014 15:07, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: Is there a list of "nice to have" or current

[Scikit-learn-general] issues/nice to have in sklearn

2014-11-11 Thread Pagliari, Roberto
Is there a list of "nice to have" or current issues I can look at?

Re: [Scikit-learn-general] k-means with unbalanced clusters

2014-11-05 Thread Pagliari, Roberto
al] k-means with unbalanced clusters On 11/05/2014 03:15 PM, Pagliari, Roberto wrote: I agree with you. However, for clarification purposes, do you know why in this extreme case, false positive rate (where class 0 is much bigger than class 1) might be pretty high if not 1? I

Re: [Scikit-learn-general] k-means with unbalanced clusters

2014-11-05 Thread Pagliari, Roberto
, November 05, 2014 1:58 PM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] k-means with unbalanced clusters On 11/05/2014 01:10 AM, Sturla Molden wrote: "Pagliari, Roberto" wrote: If that's the case, why is that the underlying imp

Re: [Scikit-learn-general] k-means with unbalanced clusters

2014-11-05 Thread Pagliari, Roberto
lass sizes unbalanced). Thank you, -Original Message- From: Sturla Molden [mailto:sturla.mol...@gmail.com] Sent: Wednesday, November 05, 2014 1:21 AM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] k-means with unbalanced clusters "Pagliari,

Re: [Scikit-learn-general] k-means with unbalanced clusters

2014-11-04 Thread Pagliari, Roberto
hould always be small regardless. If that's the case, why is that the underlying implementation of k-means does not take this into account? Thanks, ____ From: Pagliari, Roberto [rpagli...@appcomsci.com] Sent: Wednesday, November 05, 2014 12:04 AM To: scikit-lea

[Scikit-learn-general] k-means with unbalanced clusters

2014-11-04 Thread Pagliari, Roberto
Suppose you have a two-class problem and, for instance, class 0 is much bigger than class 1. Is it possible that the centroid initially chosen for class 0 overlaps the one chosen for class 1 so that in the end the false negative rate is very high? I found situations when this phenomenon occurs,

[Scikit-learn-general] confidence interval

2014-10-30 Thread Pagliari, Roberto
Is there a utility to compute the confidence interval?
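
As far as I know there is no dedicated function for this in scikit-learn. One common approach is a percentile bootstrap over the test set; a sketch with a hypothetical helper and made-up predictions:

```python
import numpy as np


def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile-bootstrap confidence interval for accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test cases with replacement
        scores[b] = np.mean(y_true[idx] == y_pred[idx])
    low, high = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high


# made-up test labels and predictions with 80% accuracy
y_true = np.array([0, 1] * 50)
y_pred = y_true.copy()
y_pred[:20] = 1 - y_pred[:20]

acc = np.mean(y_true == y_pred)
low, high = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy {acc:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

This only captures the uncertainty from the finite test set, not the variability from retraining the model on different data.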

Re: [Scikit-learn-general] Regarding TPR and FPR

2014-10-24 Thread Pagliari, Roberto
Cross-validated FPR/TPR are not provided by default. You need to provide your own function to gridSearchCV if you want to know their values . From: shalu jhanwar [mailto:shalu.jhanwa...@gmail.com] Sent: Friday, October 24, 2014 11:36 AM To: Pagliari, Roberto Subject: Fwd: [Scikit-learn-general

Re: [Scikit-learn-general] Regarding TPR and FPR

2014-10-24 Thread Pagliari, Roberto
It depends on what you are doing. If all you need to do is to compute TPR and FPR, and you don’t need ROC, you can use the confusion_matrix module. Assuming you have two classes, from sklearn.metrics import confusion_matrix # some code cm = confusion_matrix(y_test, y_pred) TPR = cm[1][1] / (cm[0
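
A complete version of the computation, assuming binary labels 0/1 and a recent scikit-learn. With `labels=[0, 1]`, `ravel()` on the 2×2 confusion matrix yields `tn, fp, fn, tp`. Under Python 2.7, `/` between integers floors unless `from __future__ import division` is used. For grid search, TPR is already available as `scoring="recall"`; FPR needs a custom scorer.

```python
from sklearn.metrics import confusion_matrix, make_scorer


def tpr_fpr(y_true, y_pred):
    # labels=[0, 1] fixes the row/column order, so ravel() gives tn, fp, fn, tp
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    tpr = tp / (tp + fn)  # true positive rate (recall)
    fpr = fp / (fp + tn)  # false positive rate
    return float(tpr), float(fpr)


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(tpr_fpr(y_true, y_pred))  # (0.6666666666666666, 0.5)

# lower FPR is better, so greater_is_better=False makes GridSearchCV minimise it
fpr_scorer = make_scorer(lambda yt, yp: tpr_fpr(yt, yp)[1], greater_is_better=False)
```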

Re: [Scikit-learn-general] SVM with rbf kernel

2014-10-22 Thread Pagliari, Roberto
-general] SVM with rbf kernel * not necessarily memory - also calculation complexity is O(n_samples x n_samples) On Tue, Oct 21, 2014 at 5:15 PM, Michael Eickenberg <michael.eickenb...@gmail.com> wrote: Dear Roberto, On Tue, Oct 21, 2014 at 4:27 PM, Pagliari, Roberto mailto:

Re: [Scikit-learn-general] SVM with rbf kernel

2014-10-21 Thread Pagliari, Roberto
that many samples (for example, numerical issues)? Thank you, From: Pagliari, Roberto [mailto:rpagli...@appcomsci.com] Sent: Tuesday, October 21, 2014 9:39 AM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] SVM with rbf kernel Hi, I was asking if having lot of

Re: [Scikit-learn-general] SVM with rbf kernel

2014-10-21 Thread Pagliari, Roberto
...@gmail.com] Sent: Tuesday, October 21, 2014 9:32 AM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] SVM with rbf kernel Dear Roberto, On Tue, Oct 21, 2014 at 2:58 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: I sometimes get weird results wi

[Scikit-learn-general] SVM with rbf kernel

2014-10-21 Thread Pagliari, Roberto
I sometimes get weird results with SVM and rbf kernel in terms of false positive/negative rates. I suspect there may be numerical issues going on, because I'm not seeing the same issues with linearSVC. Does anyone know if rbf is constrained in terms of number of dimensions? Unfortunately I can

[Scikit-learn-general] feature selection

2014-10-20 Thread Pagliari, Roberto
I'm not sure if I correctly understood the feature selection algorithms. Basically, accuracy, or any other scoring function is used to determine whether to keep a specific feature or not? If so, how is the optimal subset of features determined? Brute force would be exponential in complexity. Th

Re: [Scikit-learn-general] feature union

2014-10-07 Thread Pagliari, Roberto
04:48, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: I read this page on the documentation http://scikit-learn.org/stable/auto_examples/feature_stacker.html why is svm.fit needed before gridsearch? Thanks,

[Scikit-learn-general] feature union

2014-10-07 Thread Pagliari, Roberto
I read this page on the documentation http://scikit-learn.org/stable/auto_examples/feature_stacker.html why is svm.fit needed before gridsearch? Thanks,

Re: [Scikit-learn-general] error when using linear SVM with AdaBoost

2014-10-06 Thread Pagliari, Roberto
Hi Matthieu, Which dataset are you referring to? Thanks From: Mathieu Blondel [mailto:math...@mblondel.org] Sent: Saturday, October 04, 2014 10:13 AM To: scikit-learn-general Subject: Re: [Scikit-learn-general] error when using linear SVM with AdaBoost On Sat, Oct 4, 2014 at 1:09 AM, Andy ma

[Scikit-learn-general] gridSearchCV parallel processing

2014-10-06 Thread Pagliari, Roberto
I'd like to use multiprocessing module to run different tasks at the same time (each of which may run grid search). Are there any known issues when using this module with gridSearchCV (and njobs >1 ), or anything I should consider when doing this? Thank you,

Re: [Scikit-learn-general] mutual information

2014-09-30 Thread Pagliari, Roberto
, statistically speaking, you compute the MI score to see to which extent is your observed frequency of cooccurrence different from what you would expect, so labels_true and labels_predict. On Tue, Sep 30, 2014 at 7:13 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote: I’m a

[Scikit-learn-general] mutual information

2014-09-30 Thread Pagliari, Roberto
I'm a little confused by the description of mutual information score. What is the meaning of clustering, and why are the inputs called labels_true and labels_predict. Shouldn't mutual info be computed between two generic vectors X and Y? Thanks,
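
As the reply explains, `mutual_info_score` works on any two discrete label vectors of equal length (the argument names come from its use as a clustering metric), and it is symmetric. A sketch with made-up labels, assuming a recent scikit-learn; for continuous features, later versions also offer `mutual_info_classif` and `mutual_info_regression`.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# any two discrete label vectors of the same length work
x = [0, 0, 1, 1]
y = [1, 1, 0, 0]  # same partition as x with the label names swapped
z = [0, 1, 0, 1]  # independent of x on these four samples

print(mutual_info_score(x, y))  # ≈ 0.693, i.e. ln 2 (natural logarithm)
print(mutual_info_score(x, z))  # 0 for independent labels
print(np.isclose(mutual_info_score(x, y), mutual_info_score(y, x)))  # True: symmetric
```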

[Scikit-learn-general] error when using linear SVM with AdaBoost

2014-09-26 Thread Pagliari, Roberto
I'm trying to run AdaBoost with linear SVM and got this error: TypeError: fit() got an unexpected keyword argument 'sample_weight' The code looks like this: clf = AdaBoostClassifier(svm.LinearSVC(), n_estimators=args.ada_estimators, algorithm='SAMME')

Re: [Scikit-learn-general] adaboost weak lerners

2014-09-26 Thread Pagliari, Roberto
Never mind my question :) From: Pagliari, Roberto [mailto:rpagli...@appcomsci.com] Sent: Friday, September 26, 2014 4:14 PM To: scikit-learn-general@lists.sourceforge.net Subject: [Scikit-learn-general] adaboost weak lerners In sklearn version of Adaboost, which learners are used

[Scikit-learn-general] adaboost weak lerners

2014-09-26 Thread Pagliari, Roberto
In sklearn version of Adaboost, which learners are used?

Re: [Scikit-learn-general] sklearn on CentOS

2014-09-25 Thread Pagliari, Roberto
@lists.sourceforge.net Subject: Re: [Scikit-learn-general] sklearn on CentOS 2014-09-25 18:01 GMT+02:00 Pagliari, Roberto <rpagli...@appcomsci.com>: Here it is: numpy-1.4.1-9.el6.x86_64, package scipy is not installed. Strangely, it is saying scipy is not installed, but I did install it and

Re: [Scikit-learn-general] sklearn on CentOS

2014-09-25 Thread Pagliari, Roberto
@lists.sourceforge.net Subject: Re: [Scikit-learn-general] sklearn on CentOS 2014-09-25 17:17 GMT+02:00 Pagliari, Roberto <rpagli...@appcomsci.com>: Sorry, I've got 0.14 and in fact the following warning UserWarning: Numpy 1.5.1 or above is recommended for this version of scipy

Re: [Scikit-learn-general] feature_importances_ from gridsearchCV

2014-09-25 Thread Pagliari, Roberto
bject: Re: [Scikit-learn-general] feature_importances_ from gridsearchCV On 09/25/2014 05:30 PM, Pagliari, Roberto wrote: I just printed both best_estimator and best_parameters, but I'm not getting the feature importance.. Can you elaborate? As Gael said, you are looking for

Re: [Scikit-learn-general] sklearn on CentOS

2014-09-25 Thread Pagliari, Roberto
against API version 9 but this version of numpy is 4 -Original Message- From: Andy [mailto:t3k...@gmail.com] Sent: Thursday, September 25, 2014 11:36 AM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] sklearn on CentOS On 09/25/2014 05:17 PM, Pagliari

Re: [Scikit-learn-general] feature_importances_ from gridsearchCV

2014-09-25 Thread Pagliari, Roberto
I just printed both best_estimator and best_parameters, but I'm not getting the feature importance.. -Original Message- From: Gael Varoquaux [mailto:gael.varoqu...@normalesup.org] Sent: Thursday, September 25, 2014 11:25 AM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Sci

Re: [Scikit-learn-general] feature_importances_ from gridsearchCV

2014-09-25 Thread Pagliari, Roberto
-general] feature_importances_ from gridsearchCV On Thu, Sep 25, 2014 at 03:06:15PM +, Pagliari, Roberto wrote: the object clf will not have feature_importances_. Is that embedded in best_estimator? Yes: best_estimator_.feature_importanc

Re: [Scikit-learn-general] sklearn on CentOS

2014-09-25 Thread Pagliari, Roberto
neral@lists.sourceforge.net Subject: Re: [Scikit-learn-general] sklearn on CentOS On 09/25/2014 05:02 PM, Pagliari, Roberto wrote: Hi, Via yum I got 1.4. for both libraries. Scipy is at 0.14.0 currently. Thanks, -Original Message- From: Andy
