[Scikit-learn-general] Using GraphLASSO with robust correlation estimates

2016-03-11 Thread Daniel Homola
Hi all, I'm using GraphLASSO to estimate the graphical model and precision matrix of my variables. It is well known that GraphLASSO and related methods are very sensitive to contaminated data and their estimates have low break-down points: http://arxiv.org/abs/1501.01219 As suggested by the au
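One way to combine a robust correlation estimate with the graphical lasso, as the thread discusses, is to compute a high-breakdown covariance first and pass it to the solver. A minimal sketch, using names from current scikit-learn (the class was called GraphLasso when this thread was written; it is GraphicalLasso now, and the function form graphical_lasso accepts a precomputed covariance):

```python
import numpy as np
from sklearn.covariance import MinCovDet, graphical_lasso

# Toy data with a few outlier rows to contaminate the sample covariance.
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
X[:10] += 10  # contamination

# Robust (high-breakdown) covariance estimate instead of the empirical one.
robust_cov = MinCovDet(random_state=0).fit(X).covariance_

# Sparse precision matrix from the robust covariance via the graphical lasso.
covariance, precision = graphical_lasso(robust_cov, alpha=0.2)
print(precision.shape)  # (5, 5)
```

This is only an illustration of the plumbing, not the specific robust estimator proposed in the linked paper.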

Re: [Scikit-learn-general] Mutual Info based on nearest neighbors

2016-02-11 Thread Daniel Homola
andey <shishir...@gmail.com> wrote: Thanks. -- sp On Thu, Feb 11, 2016 at 6:41 AM, Daniel Homola <daniel.homol...@imperial.ac.uk> wrote: Hi, Mr Mayorov has done a great job and coded this up already:

Re: [Scikit-learn-general] Mutual Info based on nearest neighbors

2016-02-10 Thread Daniel Homola
Hi, Mr Mayorov has done a great job and coded this up already: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/mutual_info_.py If you want to do feature selection based on MI, check out the JMI method: https://github.com/danielhomola/mifs Cheers, d On 02/11/2
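The nearest-neighbor MI estimators referenced in the linked file are now exposed in scikit-learn as mutual_info_classif and mutual_info_regression. A small sketch of the classification variant (module path and parameter names per current scikit-learn, not the 2016 source file linked above):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Data with two informative features followed by pure-noise features.
X, y = make_classification(n_samples=400, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)

# k-NN (Kraskov-style) mutual information estimate for each feature vs. y.
mi = mutual_info_classif(X, y, n_neighbors=3, random_state=0)
print(mi.round(3))  # informative columns should score highest
```

For the JMI-style multivariate selection mentioned in the message, the mifs repository linked above is the relevant code; the sketch here only ranks features marginally.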

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2016-01-31 Thread Daniel Homola
off some plots in the PR, that is always very welcome. On 05/08/2015 03:15 PM, Daniel Homola wrote: Hi Andy, Thanks! Will definitely do a github pull request once Miron confirmed he benchmarked my implementation by running it on the datasets the method was published with. I wrote a blog post

[Scikit-learn-general] Conditional Inference Trees?

2015-07-31 Thread Daniel Homola
Hi all, I was checking the archive of the mailing list to see if there were any attempts in the past to incorporate Conditional Inference Trees into the Ensemble module. I've found a mail from Theo Strinopoulos (07-07-2013) asking if this would be welcomed as a contribution of his. Gilles Lo

Re: [Scikit-learn-general] how to know which feature is informative or redundant in make_classification()?

2015-05-28 Thread Daniel Homola
e first n_informative columns as the primary informative features, etc. HTH On 28 May 2015 at 19:18, Daniel Homola <daniel.homol...@imperial.ac.uk> wrote: Hi everyone, I'm benchmarking various feature selection methods, and for

Re: [Scikit-learn-general] how to know which feature is informative or redundant in make_classification()?

2015-05-28 Thread Daniel Homola
d features, and arbitrary noise for any remaining features. If you set shuffle=False, then you can extract the first n_informative columns as the primary informative features, etc. HTH On 28 May 2015 at 19:18, Daniel Homola <daniel.homol...@imperial.ac.uk> wrote:
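The column layout described in the reply can be verified directly: with shuffle=False the columns come out in order (informative, then redundant, then repeated, then noise), and the redundant columns are exact linear combinations of the informative block. A small sketch, assuming current make_classification defaults:

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, n_informative=3,
                           n_redundant=2, n_repeated=0, shuffle=False,
                           random_state=0)

# With shuffle=False the feature order is deterministic:
informative = X[:, :3]   # columns 0-2
redundant = X[:, 3:5]    # columns 3-4, linear combos of the informative ones
noise = X[:, 5:]         # remaining columns are random noise

# Since redundant columns are exact linear combinations of the informative
# block, a least-squares fit leaves essentially zero residual.
_, residuals, _, _ = np.linalg.lstsq(informative, redundant, rcond=None)
print(residuals)  # both entries are numerically ~0
```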

[Scikit-learn-general] how to know which feature is informative or redundant in make_classification()?

2015-05-28 Thread Daniel Homola
Hi everyone, I'm benchmarking various feature selection methods, and for that I use the make_classification helper function, which is really great. However, is there a way to retrieve a list of the informative and redundant features after generating the fake data? It would be really interesting to see

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-05-08 Thread Daniel Homola
dreas Mueller wrote: Btw, an example that compares this against existing feature selection methods that explains differences and advantages would help users and convince us to merge ;) On 05/08/2015 02:34 PM, Daniel Homola wrote: Hi all, I wrote a couple of weeks ago about implementing the Boruta

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-05-08 Thread Daniel Homola
omola/boruta_py Let me know what you think. If anyone thinks this might be worthy of adding to the feature selection module, the original author Miron is happy to give his blessing, and I'm happy to work on it further. Cheers, Daniel On 15/04/15 11:03, Daniel Homola wrote: Hi all,
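The core trick in Boruta, the method under discussion, can be sketched with plain scikit-learn: append permuted "shadow" copies of every feature and keep only the real features whose importance beats the best shadow. This is just an illustration of the idea, not the full iterative algorithm in the boruta_py repository linked above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

# Shadow features: column-wise permutations of X that carry no real signal.
shadow = np.apply_along_axis(rng.permutation, 0, X)

rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(np.hstack([X, shadow]), y)

real_imp = rf.feature_importances_[:8]
shadow_imp = rf.feature_importances_[8:]

# Keep a real feature only if it outscores the strongest shadow feature.
selected = np.where(real_imp > shadow_imp.max())[0]
print(selected)
```

The real algorithm repeats this over many iterations with statistical testing of hits; a single pass like this is only a rough filter.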

Re: [Scikit-learn-general] Random forest with correlated features?

2015-04-27 Thread Daniel Homola
o for the latter one). I went through some problems with the R package that you are suggesting, so I would not use that. I hope this can help. Best, Luca On Mon, Apr 27, 2015 at 4:48 PM, Daniel Homola <daniel.homol...@imperial.ac.uk> wrote: Dear all, I've found se

[Scikit-learn-general] Random forest with correlated features?

2015-04-27 Thread Daniel Homola
Dear all, I've found several articles expressing concerns about using Random Forest with highly correlated features (e.g. http://www.biomedcentral.com/1471-2105/9/307). I was wondering if this drawback of the RF algorithm could be somehow remedied using scikit-learn methods? The above linked p
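The effect the linked paper describes, impurity-based importances being split among correlated features, is easy to reproduce with stock scikit-learn. A small sketch (the near-duplicate column and all parameters are illustrative choices, not from the thread):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)

# Append a near-duplicate of feature 0 to create a highly correlated pair.
X_dup = np.hstack([X, X[:, [0]] + 0.01 * rng.randn(300, 1)])

imp_orig = RandomForestClassifier(n_estimators=300, random_state=0)\
    .fit(X, y).feature_importances_
imp_dup = RandomForestClassifier(n_estimators=300, random_state=0)\
    .fit(X_dup, y).feature_importances_

# Feature 0's importance is now shared with its near-copy (column 5),
# so neither column alone reflects the feature's full contribution.
print(imp_orig[0], imp_dup[0], imp_dup[5])
```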

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-04-15 Thread Daniel Homola
control. The question is whether the feature importance that is used is different from ours. Gilles? If not, this could be hard to add. If it is the same, I think a meta-estimator would be a nice addition to the feature selection module. Cheers, Andy On 04/15/2015 11:32 AM, Daniel Homola

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-04-15 Thread Daniel Homola
gged as spam as your link is broken and links to some Imperial College internal page. Cheers, Andy On 04/15/2015 05:03 AM, Daniel Homola wrote: Hi all, I needed a multivariate feature selection method for my work. As I'm working with biological/medical data, where n < p or even n <<

[Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-04-15 Thread Daniel Homola
uld consider incorporating into the feature selection module of scikit-learn? If yes, do you have a tutorial or some sort of guidance about how I should prepare the code, what conventions I should follow, etc.? Cheers, Daniel Homola STRA