Re: [Scikit-learn-general] Comparisons of classifiers

2016-03-26 Thread Raphael C
etter) then that is already very interesting.

Raphael

> On Mar 22, 2016, at 7:52 AM, Raphael C wrote:
>
>> - In tree-based models, not handling categorical variables as such hurts us a lot
>> There's a PR to fix that, it still needs a b

Re: [Scikit-learn-general] Comparisons of classifiers

2016-03-22 Thread Raphael C
> - In tree-based models, not handling categorical variables as such hurts us a lot
> There's a PR to fix that, it still needs a bit of love:
> https://github.com/scikit-learn/scikit-learn/pull/4899

This is a conversation moved from https://github.com/scikit-learn/scikit-learn/pull/4899 . In the

[Scikit-learn-general] A new paper about xgboost

2016-03-18 Thread Raphael C
This paper about xgboost came out recently which I thought might be of interest. http://arxiv.org/pdf/1603.02754v1.pdf

Raphael

Re: [Scikit-learn-general] Restrictions on feature names when drawing decision tree

2016-03-12 Thread Raphael C
, filled=True, rounded=True, special_characters=True)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

Raphael

On 12 March 2016 at 13:56, Raphael C wrote:
> I am attempting to draw a decision tree using:
>

[Scikit-learn-general] Restrictions on feature names when drawing decision tree

2016-03-12 Thread Raphael C
I am attempting to draw a decision tree using:

reg = DecisionTreeRegressor(max_depth=None, min_samples_split=1)
reg.fit(X, Y)
dot_data = StringIO()
tree.export_graphviz(reg, out_file=dot_data, feature_names=feature_names,
                     filled=True, rounded=True,
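A self-contained sketch of the export pipeline from this thread, using toy data (the diabetes dataset) in place of the poster's `X`, `Y`, and `feature_names`, which are not shown. Passing `out_file=None` makes `export_graphviz` return the DOT source as a string, which sidesteps the `StringIO` round trip; `pydot`/Graphviz are only needed if you then want a PNG.

```python
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor, export_graphviz

data = load_diabetes()

# A small depth keeps the rendered tree legible; the thread used max_depth=None.
reg = DecisionTreeRegressor(max_depth=3, random_state=0)
reg.fit(data.data, data.target)

# out_file=None returns the DOT source directly as a string
dot_data = export_graphviz(
    reg,
    out_file=None,
    feature_names=data.feature_names,
    filled=True,
    rounded=True,
    special_characters=True,
)

print(dot_data[:40])
```

From here, `pydot.graph_from_dot_data(dot_data)` (or the `graphviz` package) can render the string to an image, as in the follow-up message.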

Re: [Scikit-learn-general] Comparisons of classifiers

2015-11-08 Thread Raphael C
On 8 November 2015 at 20:42, Sebastian Raschka wrote:
> Hm, I have to think about this more. But another case where I think that the
> handling of categorical features could be useful is in non-binary trees; not
> necessarily while learning but in making predictions more efficiently. E.g., as

Re: [Scikit-learn-general] Comparisons of classifiers

2015-11-08 Thread Raphael C
On 8 November 2015 at 17:50, Sebastian Raschka wrote:
>
>> On Nov 8, 2015, at 11:32 AM, Raphael C wrote:
>>
>> In terms of computational efficiency, one-hot encoding combined with
>> the support for sparse feature vectors seems to work well, at least
>> for me.
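A minimal sketch of the workaround described above, with an illustrative toy column of integer-coded categories: `OneHotEncoder` emits a scipy sparse matrix by default, which scikit-learn's tree and linear models accept directly, so the expanded representation stays cheap even with many categories.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy categorical column: three distinct categories coded as integers.
X = np.array([[0], [1], [2], [1]])

enc = OneHotEncoder()            # sparse output by default
X_onehot = enc.fit_transform(X)

print(X_onehot.shape)   # one binary column per category: (4, 3)
print(X_onehot.nnz)     # one non-zero entry per row: 4
```

The sparse matrix stores only the non-zero entries, which is why this scales to high-cardinality categorical features where a dense one-hot matrix would not.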

Re: [Scikit-learn-general] Comparisons of classifiers

2015-11-08 Thread Raphael C
On 5 November 2015 at 13:38, Gael Varoquaux wrote:
> On Thu, Nov 05, 2015 at 07:05:11AM +0000, Raphael C wrote:
>> https://github.com/szilard/benchm-ml
>>
>> The upshot is that in some cases it seems that the scikit-learn
>> versions have room for improvement.
>
> T

[Scikit-learn-general] Comparisons of classifiers

2015-11-04 Thread Raphael C
I don't know if this has been widely seen, but there is an interesting comparison of classifiers from different machine learning libraries at: https://github.com/szilard/benchm-ml The upshot is that in some cases it seems that the scikit-learn versions have room for improvement. I don't know how

Re: [Scikit-learn-general] Ranking algorithms

2015-10-25 Thread Raphael C
On 25 October 2015 at 19:44, olologin wrote:
> On 10/25/2015 08:12 PM, Raphael C wrote:
>>
>> From my quick reading of the thread it seems that people aren't
>> convinced LambdaMART is very good in practice. Is that right/wrong?
>>
>> Raphael
>

Re: [Scikit-learn-general] Ranking algorithms

2015-10-25 Thread Raphael C
https://github.com/scikit-learn/scikit-learn/pull/2580 is the PR but it seems to have reached an unfortunate impasse. From my quick reading of the thread it seems that people aren't convinced LambdaMART is very good in practice. Is that right/wrong?

Raphael

On 25 Oct 2015 16:46, "olologin" wro

[Scikit-learn-general] How to optimize a random forest for out of sample prediction

2015-10-07 Thread Raphael C
I have a training set, a validation set and a test set. I build a random forest using RandomForestClassifier on the training set. However, I would like to tune it by scoring on the validation set. I find that the cross-validation score on the training set is a lot better than the score on the
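A sketch of the setup described above, tuning on a held-out validation set rather than on training-set score. The dataset, split sizes, and parameter grid here are illustrative stand-ins, not the poster's actual data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic data in place of the poster's dataset.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.33, random_state=0)

best_score, best_params = -1.0, None
for max_depth in (3, 5, None):   # illustrative grid over one hyperparameter
    clf = RandomForestClassifier(
        n_estimators=100, max_depth=max_depth, random_state=0)
    clf.fit(X_train, y_train)
    # Score on the validation set, not the (optimistic) training set.
    score = clf.score(X_val, y_val)
    if score > best_score:
        best_score, best_params = score, {"max_depth": max_depth}

print(best_params, best_score)
```

Because an unconstrained forest can fit the training data almost perfectly, the training-set score is expected to be much higher than the validation score; selecting hyperparameters by validation score, and reserving the test set for a single final evaluation, is the standard guard against that optimism.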