Re: [scikit-learn] purpose of test: check_classifiers_train

2017-10-13 Thread Andreas Mueller
Sorry for the misinformation. Yes, actually I'd argue you should raise an error on data that contains negative values, if that's not valid input. Right now there is no way to tell the testing suite that your model requires positive data; that's what the PR is about (among other things) that I r
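A minimal sketch of the "raise an error on invalid input" suggestion above. The class name and the majority-vote logic are hypothetical placeholders, not the classifier discussed in this thread; only NumPy is used here, and a real contribution would more likely subclass scikit-learn's BaseEstimator and ClassifierMixin.

```python
import numpy as np


class NonNegativeMajorityClassifier:
    """Hypothetical toy classifier that only accepts non-negative features."""

    def _validate(self, X):
        # Convert to a float array and reject anything that is not 2-D.
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError("X must be a 2-dimensional array")
        # Fail loudly instead of silently producing poor predictions.
        if (X < 0).any():
            raise ValueError("Negative values in data: this model requires "
                             "non-negative input")
        return X

    def fit(self, X, y):
        X = self._validate(X)
        y = np.asarray(y)
        # Placeholder "model": remember the most frequent class label.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.majority_ = self.classes_[np.argmax(counts)]
        return self

    def predict(self, X):
        X = self._validate(X)
        return np.full(X.shape[0], self.majority_)
```

With this check in place, fitting on data with negative entries fails with a ValueError rather than training a model that performs at chance level.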

Re: [scikit-learn] purpose of test: check_classifiers_train

2017-10-12 Thread Michael Capizzi
So it appears that the test check_classifiers_train() (https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/utils/estimator_checks.py#L1079) does *not* use the iris dataset after all:
    X_m, y_m = make_blobs(n_samples=300, random_state=0)
    X_m, y_m = shuffle(X_m, y_m, random_state=7)
    X
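For anyone wanting to recreate the failure locally, here is a rough sketch. Only the two data-generation lines and the 83% figure come from this thread; the use of LogisticRegression as a stand-in estimator and the comparison at the end are assumptions, and the real check may apply further preprocessing that is not shown in the snippet above.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.utils import shuffle

# The two lines quoted in the message above.
X_m, y_m = make_blobs(n_samples=300, random_state=0)
X_m, y_m = shuffle(X_m, y_m, random_state=7)

# Stand-in estimator (assumption): swap in your own classifier here.
clf = LogisticRegression()
clf.fit(X_m, y_m)

# The thread describes the check as a training-set sanity test,
# so accuracy is measured on the same data used for fitting.
train_accuracy = float(accuracy_score(y_m, clf.predict(X_m)))

# 0.83 is the empirical threshold mentioned earlier in the thread.
THRESHOLD = 0.83
passes = bool(train_accuracy > THRESHOLD)
print("training accuracy: %.3f, passes: %s" % (train_accuracy, passes))
```

If your classifier lands near 1/3 on this three-blob problem, it is performing at chance, which usually points at an input-handling issue rather than at the threshold itself.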

Re: [scikit-learn] purpose of test: check_classifiers_train

2017-10-12 Thread Michael Capizzi
Thanks @andreas, for your comments, especially the info that it's the `iris` dataset. I have to dig a bit deeper to see what's going on with the performance there. But now that I know it's `iris`, I can try to recreate it.
-M
On Thu, Oct 12, 2017 at 12:01 AM, Andreas Mueller wrote:
> Yes, it's p

Re: [scikit-learn] purpose of test: check_classifiers_train

2017-10-12 Thread Andreas Mueller
Yes, it's pretty empirical, and with the estimator tags PR (https://github.com/scikit-learn/scikit-learn/pull/8022) we will be able to relax it if there's a good reason you're not passing. But the dataset is pretty trivial (iris), and you're getting chance performance (it's a balanced three clas

Re: [scikit-learn] purpose of test: check_classifiers_train

2017-10-11 Thread Guillaume Lemaître
Not 100% sure, but this is an integration/sanity check, since all classifiers are supposed to predict quite well on the data used to train them. It is true that 83% is empirical, but it makes it possible to spot any changes made to the algorithms even if the unit tests are passing for some reason. On 11 October 2017

[scikit-learn] purpose of test: check_classifiers_train

2017-10-11 Thread Michael Capizzi
I’m wondering if anyone can identify the purpose of this test: check_classifiers_train(), specifically this line: https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/utils/estimator_checks.py#L1106 My custom classifier (which I’m hoping to submit to scikit-learn-contrib) is failing