[scikit-learn] VotingRegressor with pretrained estimators from CV as input

2022-11-10 Thread Fernando Marcos Wittmann
Hello, I'm dealing with a problem without much data. As a solution, I'm training 10 estimators using a 10-fold CV scheme. Now I want to persist those models. To avoid having to save 10 estimators, I was thinking about saving a single VotingRegressor with those pre-trained models or
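
A minimal sketch of the idea in this message, assuming the goal is simply one object that averages the predictions of the fold models. Since `VotingRegressor.fit` refits its estimators, a thin wrapper that never refits is one possible alternative. `AveragingEnsemble` and `ConstantOffsetModel` are hypothetical names used only for illustration; the latter is a stand-in for any already-fitted model exposing `predict`.

```python
from statistics import fmean


class AveragingEnsemble:
    """Hypothetical wrapper: averages predictions of already-fitted models."""

    def __init__(self, fitted_models):
        self.fitted_models = list(fitted_models)

    def predict(self, X):
        # One list of predictions per model, then average row by row.
        per_model = [model.predict(X) for model in self.fitted_models]
        return [fmean(row_predictions) for row_predictions in zip(*per_model)]


class ConstantOffsetModel:
    """Stand-in for a fitted fold model: predicts x + offset for each input."""

    def __init__(self, offset):
        self.offset = offset

    def predict(self, X):
        return [x + self.offset for x in X]


# Three stand-in "fold models" instead of ten, to keep the example short.
fold_models = [ConstantOffsetModel(offset) for offset in (-1.0, 0.0, 1.0)]
ensemble = AveragingEnsemble(fold_models)
predictions = ensemble.predict([10.0, 20.0])
```

The wrapper holds all fold models, so persisting it means persisting one object rather than ten.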

Re: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

2021-08-14 Thread Fernando Marcos Wittmann
Hi Samir, the following visualization might be useful for gaining intuition on the meaning of a negative r2: https://gist.github.com/WittmannF/02060b45ce3ec9239898a5b91df2564e A negative r2 reflects a model predicting the opposite trend of the data. On Sat, Aug 14, 2021, 03:17 Samir K
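
A small worked example may help here. It assumes the usual definition R² = 1 - SS_res / SS_tot, so R² goes negative whenever the model's squared error exceeds that of always predicting the mean; a reversed trend is one way to get there. The helper name `r2` and the toy data are illustrative only.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_mean_baseline = [3.0] * 5               # always predict the mean
y_reversed = [5.0, 4.0, 3.0, 2.0, 1.0]    # opposite trend

r2_baseline = r2(y_true, y_mean_baseline)  # SS_res == SS_tot, so 0.0
r2_reversed = r2(y_true, y_reversed)       # SS_res = 40, SS_tot = 10, so -3.0
```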

Re: [scikit-learn] Opinion on reference mentioning that RF uses weak learners

2020-08-16 Thread Fernando Marcos Wittmann
In my opinion the reference is distorting a concept that has a consolidated definition in the community. I am also familiar with the definition of WL as "an estimator slightly better than guessing", mostly decision stumps (https://en.m.wikipedia.org/wiki/Decision_stump), which is not a component

[scikit-learn] Opinion on reference mentioning that RF uses weak learners

2020-08-16 Thread Fernando Marcos Wittmann
Hello guys, The following reference states that Random Forests use weak learners: -

Re: [scikit-learn] Why the default max_samples of Random Forest is X.shape[0]?

2020-05-10 Thread Fernando Marcos Wittmann
ment would be repeated in a given tree). My apologies. Everything makes sense again. On Sun, May 10, 2020, 19:42 Fernando Marcos Wittmann < fernando.wittm...@gmail.com> wrote: > Okay, so it's sampling with replacement with the same size as the original > dataset. That means that som
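
The point reached in this thread can be checked with a quick simulation. It is a sketch using only the standard library, not scikit-learn's own sampling code: drawing `n_samples` indices with replacement from `n_samples` rows repeats some rows and leaves others out, so each tree still sees a different subset (roughly 1 - 1/e ≈ 63% of the distinct rows for large n).

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 10_000

# A bootstrap draw: n_samples indices sampled with replacement.
bootstrap_indices = [rng.randrange(n_samples) for _ in range(n_samples)]

# Fraction of distinct original rows that appear at least once in the draw.
unique_fraction = len(set(bootstrap_indices)) / n_samples
print(f"distinct rows in bootstrap sample: {unique_fraction:.1%}")
```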

Re: [scikit-learn] Why the default max_samples of Random Forest is X.shape[0]?

2020-05-10 Thread Fernando Marcos Wittmann
My question is why the full dataset is being used by default when building each tree. That's not random forest. The main point of RF is to build each tree with a subsample of the full dataset. On Sun, May 10, 2020, 09:50 Joel Nothman wrote: > A bootstrap is very commonly a random draw with

[scikit-learn] Why the default max_samples of Random Forest is X.shape[0]?

2020-05-08 Thread Fernando Marcos Wittmann
When reading the documentation of Random Forest, I got the following: ``` max_samples : int or float, default=None If bootstrap is True, the number of samples to draw from X to train each base estimator. - *If None (default), then draw `X.shape[0]` samples.* - If int, then draw `max_samples`

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-05-06 Thread Fernando Marcos Wittmann
That's an excellent discussion! I've always wondered whether other tools like R handle categorical variables natively. LightGBM has a scikit-learn API which handles categorical features by passing their column names (or indices): ``` import lightgbm lgb=lightgbm.LGBMClassifier()

Re: [scikit-learn] question

2019-10-20 Thread Fernando Marcos Wittmann
What about converting into two columns? One with the real part and the other with the imaginary part? On Sat, Oct 19, 2019, 3:44 PM ahmad qassemi wrote: > Dear Mr/Mrs, > > I'm a PhD student in DS. I'm trying to use your provided code on *Spectral > CoClustering *and *Spectral
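
The suggestion above can be sketched as follows, assuming the data is a sequence of complex numbers; the variable names and values are made up for illustration.

```python
# Hypothetical complex-valued feature (one value per sample).
complex_features = [3 + 4j, 1 - 2j, -0.5 + 0j]

# Split each value into two real-valued columns: [real part, imaginary part].
real_imag_columns = [[z.real, z.imag] for z in complex_features]
```

Each sample then carries two real-valued columns, which estimators expecting real input can consume.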

Re: [scikit-learn] One-hot encoding

2018-08-03 Thread Fernando Marcos Wittmann
Hi Sarah, I have some reflection questions. You don't need to answer all of them :) How many categories (approximately) do you have in each of those 20M categorical variables? How many samples do you have? Maybe you should consider different encoding strategies such as binary encoding. Also, this
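
For readers unfamiliar with binary encoding, here is a minimal sketch of one common variant: each category gets an integer code, and the code is written out in bits, so k categories need about log2(k) columns instead of k. The function name `binary_encode` and the sample values are assumptions made for this example; this is not a scikit-learn API.

```python
def binary_encode(values):
    """Map each category to the bits of its integer code (most significant first)."""
    categories = sorted(set(values))
    codes = {category: index for index, category in enumerate(categories)}
    # Bits needed to represent the largest code; at least one column.
    width = max(1, (len(categories) - 1).bit_length())
    rows = [
        [(codes[value] >> bit) & 1 for bit in reversed(range(width))]
        for value in values
    ]
    return rows, width


colors = ["red", "green", "blue", "green", "yellow", "black"]
encoded_rows, n_columns = binary_encode(colors)
# 5 distinct categories fit in 3 binary columns (one-hot would need 5).
```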

Re: [scikit-learn] Scikit Multi learn error.

2018-06-26 Thread Fernando Marcos Wittmann

Re: [scikit-learn] Error while using GridSearchCV.

2017-03-07 Thread Fernando Marcos Wittmann
_() got multiple values for keyword argument 'n_splits'