Hi Trevor,
This is an interesting question, and I don't have a clear-cut opinion.
What you are talking about is, in essence, a trademark issue: the brand
"scikit-learn" carries implications about quality and API. We enforce
this on the scikit-learn package and would indeed love it if the users
assoc
Hi Trevor,
I am only speaking for myself, not on behalf of the scikit-learn project,
but I would be +1 for your project and use of the -learn suffix. The pros
you cite are in my opinion more important than the cons.
Cheers,
Gilles
On 28 April 2015 at 05:33, Trevor Stephens wrote:
> Hi All,
>
>
Hi All,
I've been working for the past month or so on a third-party add-on/plug-in
package `gplearn` that uses the scikit-learn API to implement genetic
programming for symbolic regression tasks in Python and maintains
compatibility with the sklearn pipeline and gridsearch modules, etc. The
reason
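Since the snippet above is about API compatibility, here is a minimal sketch of what that compatibility buys: any estimator implementing the scikit-learn interface (`fit`/`predict`/`get_params`) drops straight into `Pipeline` and `GridSearchCV`. This is not gplearn's actual code; a stock `Ridge` regressor stands in for where a gplearn symbolic regressor would sit.

```python
# Sketch of scikit-learn API compatibility (not gplearn itself):
# an API-conforming estimator works inside Pipeline and GridSearchCV.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=5, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("reg", Ridge()),  # a gplearn estimator could sit in this slot
])
grid = GridSearchCV(pipe, {"reg__alpha": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

The `estimator__parameter` naming convention is what lets grid search reach inside the pipeline.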
I suspect this method goes unreported under any particular name, as it's a
straightforward greedy search. It is also very close to what I think many
researchers do in system development or report in system analysis, albeit
with more automation.
In the case of KNN, I would think metric learning could
Maybe we would want mRMR first?
http://penglab.janelia.org/proj/mRMR/
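The page linked above is Peng et al.'s reference mRMR tool. As a rough illustration only (not that implementation), the greedy criterion — pick the feature maximizing relevance to the target minus mean redundancy with the already-selected set — can be sketched with scikit-learn's mutual information estimators (available from sklearn 0.18 on):

```python
# Rough greedy mRMR sketch: relevance (MI with y) minus mean
# redundancy (MI with features already selected). Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, n_selected, random_state=0):
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    while len(selected) < n_selected:
        best_j, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # redundancy: mean MI between candidate j and the selected features
            red = np.mean([
                mutual_info_regression(X[:, [j]], X[:, k],
                                       random_state=random_state)[0]
                for k in selected
            ])
            if relevance[j] - red > best_score:
                best_j, best_score = j, relevance[j] - red
        selected.append(best_j)
    return selected

X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=0)
print(mrmr(X, y, n_selected=3))
```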
On 04/27/2015 06:46 PM, Sebastian Raschka wrote:
>> I guess that could be done, but has a much higher complexity than RFE.
> Oh yes, I agree, the sequential feature algorithms are definitely
> computationally more costly.
>
>
> I guess that could be done, but has a much higher complexity than RFE.
Oh yes, I agree, the sequential feature algorithms are definitely
computationally more costly.
> It seems interesting. Is that really used in practice and is there any
> literature evaluating it?
I am not sure how often
I think you can find something more rigorous here:
http://orbi.ulg.ac.be/handle/2268/170309
On Mon, Apr 27, 2015 at 11:20 PM, Daniel Homola <
daniel.homol...@imperial.ac.uk> wrote:
> Hi Luca,
>
> The reason I asked is because I'm interested in the second problem. Thanks
> a lot for the pap
Hi Luca,
The reason I asked is because I'm interested in the second problem.
Thanks a lot for the paper and the suggested params, I'll read it and
try them!
Has anyone tested these assumptions/parameters rigorously on simulated
data, or is this more of a feeling?
Thanks again for the quick
That is like a one-step look-ahead feature selection?
I guess that could be done, but has a much higher complexity than RFE.
RFE works for anything that returns "importances", not just linear models.
It doesn't really work for KNN, as you say. [I wouldn't say
non-parametric models. Trees are prett
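To make the point above concrete, here is a small example of RFE wrapping a tree ensemble rather than a linear model — RFE only requires that the fitted estimator expose `coef_` or `feature_importances_`:

```python
# RFE works with any estimator exposing feature_importances_ after
# fitting, e.g. a random forest, not just linear models with coef_.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)
rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=3)
rfe.fit(X, y)
print(rfe.support_)  # boolean mask of the 3 retained features
```

KNN, by contrast, has no such attribute, which is why it cannot be used as the inner estimator here.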
I assume you have checked that combine_train_test_dataset produces data of
the correct dimensions in both X and y.
I would be very surprised if the problem were not in PAA, so check it
again: make sure that you test that PAA().fit(X1).transform(X2) gives the
transformation of X2. The error seems t
You changed the labels only once, and have a test-set size of 4? I would
imagine that is where that comes from.
If you repeat over different assignments, you will get 50/50.
On 04/27/2015 11:33 AM, Fabrizio Fasano wrote:
> Dear Andy,
>
> Yes, the classes have the same size, 8 and 8
>
> this is on
Hey,
I spent quite some time on this problem.
1) if you are interested only in prediction, this is not a big problem. You
can preprocess the data with PCA
2) if you want to understand which variables are important
I suggest you read the paper "Understanding variable importances in
forests of r
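Point 1) above can be sketched in a few lines: decorrelate with PCA before the forest when only accuracy matters (the trade-off being that component importances no longer map back to the original variables). The dataset here is synthetic, purely for illustration.

```python
# Sketch of point 1): PCA in front of a random forest to decorrelate
# features when only predictive performance is of interest.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)
pipe = Pipeline([
    ("pca", PCA(n_components=10)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```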
Hi Andreas,
Thanks for your response.
No, PAA does not change the number of samples. It just reduces the number
of features.
For example if the input matrix is X and X.shape = (100, 100) and the
n_components = 10 in PAA, then the resultant X.shape = (100, 10).
Yes, I did try using PAA in the ip
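For reference, a minimal PAA-style transformer matching the shapes described above might look like the following. This is a sketch, not the poster's actual code: piecewise aggregate approximation averages each row over `n_components` equal segments, so it reduces `n_features` but never changes `n_samples`.

```python
# Minimal PAA-style transformer sketch: averages each row over
# n_components equal segments, e.g. (100, 100) -> (100, 10).
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class PAA(BaseEstimator, TransformerMixin):
    def __init__(self, n_components=10):
        self.n_components = n_components

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the training data

    def transform(self, X):
        X = np.asarray(X)
        # split the feature axis into n_components segments, average each
        segments = np.array_split(np.arange(X.shape[1]), self.n_components)
        return np.column_stack([X[:, idx].mean(axis=1) for idx in segments])

X = np.random.RandomState(0).rand(100, 100)
print(PAA(n_components=10).fit(X).transform(X).shape)  # (100, 10)
```

Deriving from `BaseEstimator` and `TransformerMixin` is what makes such a transformer usable inside `Pipeline` and `GridSearchCV`.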
Does PAA by any chance change the number of samples?
The error is:
ValueError: Found array with dim 37. Expected 19
Interestingly, that happens only in the scoring.
Does it work without the grid-search?
On 04/27/2015 07:14 AM, Jitesh Khandelwal wrote:
Hi all,
I am trying to use grid search to
Hi, I was wondering if sequential feature selection algorithms are currently
implemented in scikit-learn. The closest that I could find was recursive
feature elimination (RFE);
http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html.
However, unless the application r
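The greedy "one-step look-ahead" forward selection discussed in this thread was not a scikit-learn built-in at the time, but it can be rolled by hand with `cross_val_score`. A sketch under those assumptions:

```python
# Greedy sequential forward selection sketch: at each step, add the
# feature whose inclusion gives the best cross-validated score.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_select(X, y, estimator, n_selected):
    selected = []
    while len(selected) < n_selected:
        scores = {
            j: cross_val_score(estimator, X[:, selected + [j]], y, cv=3).mean()
            for j in range(X.shape[1]) if j not in selected
        }
        selected.append(max(scores, key=scores.get))
    return selected

X, y = make_classification(n_samples=150, n_features=8, n_informative=3,
                           random_state=0)
print(forward_select(X, y, KNeighborsClassifier(), n_selected=3))
```

Unlike RFE, this works with KNN, since it never asks the estimator for coefficients or importances — at the cost of one cross-validation run per candidate feature per step.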
Dear all,
I've found several articles expressing concerns about using Random
Forest with highly correlated features (e.g.
http://www.biomedcentral.com/1471-2105/9/307).
I was wondering if this drawback of the RF algorithm could be somehow
remedied using scikit-learn methods? The above linked p
Dear Andy,
Yes, the classes have the same size, 8 and 8
this is one example of the code I used to cross-validate classification (I used
StratifiedShuffleSplit here, but I also used other methods such as leave-one-out
or simple 4-fold cross-validation, and the results didn't change much)
from sklearn.
On Mon, Apr 27, 2015 at 4:44 PM, Jitesh Khandelwal
wrote:
> Hi all,
>
> I am trying to use grid search to evaluate some decomposition techniques
> of my own. I have implemented some custom transformers such as PAA, DFT,
> DWT as shown in the code below.
>
> I am getting a strange "ValueError" whe
Hi all,
I am trying to use grid search to evaluate some decomposition techniques of
my own. I have implemented some custom transformers such as PAA, DFT, DWT
as shown in the code below.
I am getting a strange "ValueError" when I run the code below and I am unable
to figure out the origin of the pro