Re: [Scikit-learn-general] Dynamic Time Warping

2015-01-20 Thread Andy
Hi Diego. I think the conclusion of that discussion still holds. We don't do any time-series-specific stuff in scikit-learn. I think it would be good to ask over at pandas what they think about it. Otherwise, just publish your code on GitHub, I'm sure people will find it useful. We'd be happy t

[Scikit-learn-general] Dynamic Time Warping

2015-01-20 Thread Diego Ardila
There was a thread a while ago regarding this. http://sourceforge.net/p/scikit-learn/mailman/scikit-learn-general/thread/d04b0e2e49ab5e40b298765dbbbc8f5e10937...@mailsvr001.fleet.dns/ Which gave some suggestions on where it would fit in. The result of that discussion seemed to be that Gael Varoqu

Re: [Scikit-learn-general] Different results when repeating SVC

2015-01-20 Thread Andy
The code below uses predictions in roc_auc_score, which is not correct. I don't see where you isolate the splits. Maybe go on the Stack Overflow code review site with your full code. If you find the results not reproducible, it is very likely that your two code paths don't match. From the code y
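
To illustrate the point about roc_auc_score, here is a minimal sketch, not the original poster's code. It assumes a synthetic dataset from make_classification, a linear SVC, and the modern sklearn.model_selection module (in 2015 the equivalent helpers lived in sklearn.cross_validation). It contrasts scoring with continuous decision values against scoring with hard class labels.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data standing in for the poster's dataset (an assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# Continuous scores: signed distance to the separating hyperplane.
scores = clf.decision_function(X_test)
# Hard labels: only 0 or 1, so the ROC curve collapses to a single threshold.
labels = clf.predict(X_test)

auc_from_scores = roc_auc_score(y_test, scores)  # the intended usage
auc_from_labels = roc_auc_score(y_test, labels)  # what the reply warns against

print("AUC from decision_function:", auc_from_scores)
print("AUC from predict:", auc_from_labels)
```

The two numbers generally differ, because hard labels throw away the ranking information that the ROC curve is built from.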

Re: [Scikit-learn-general] Different results in R and sklearn

2015-01-20 Thread Andy
I think this library might apply zero mean, unit variance scaling before using libsvm. Try applying the ``StandardScaler`` to your data before using SVC. On 01/20/2015 11:29 AM, Timothy Vivian-Griffiths wrote: > Hi Andy, > > Firstly, the dimensions that I gave were wrong, you're right. The inputs
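
The scaling suggestion above could be tried along these lines. This is a sketch under assumptions: the data is synthetic, the kernel and C value are placeholders, and whether the R library really scales its inputs is the reply's guess, not something confirmed here. Putting the scaler in a pipeline keeps the scaling statistics tied to the training data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data (an assumption); columns are stretched to very different scales.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X = X * np.linspace(1, 1000, X.shape[1])

# StandardScaler gives each feature zero mean and unit variance before SVC sees it.
scaled_svc = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scaled_svc.fit(X, y)

# Inspect what the SVC actually received during fitting.
X_scaled = scaled_svc.named_steps["standardscaler"].transform(X)
print("column means close to 0:", np.allclose(X_scaled.mean(axis=0), 0.0))
print("column stds close to 1:", np.allclose(X_scaled.std(axis=0), 1.0))
```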

[Scikit-learn-general] Different results in R and sklearn

2015-01-20 Thread Timothy Vivian-Griffiths
Hi Andy, Firstly, the dimensions that I gave were wrong, you're right. The inputs are correct but the target vector shape is (7763,) so there are that many samples with 125 features (that is in the smaller dataset I am using, the other has over 30,000 features but I haven't tried that one in R

[Scikit-learn-general] Different results when repeating SVC

2015-01-20 Thread Timothy Vivian-Griffiths
Hi Andy, Yes, when I run the cross_val_score using the same seed then the outcomes are the same. So that is all working as expected. However, the problems arise when using the permutation in the code below. When mapping the function across the 500 seeds, I am getting out some very high scores f
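
The same-seed check described above can be sketched as follows. This is not the poster's code: the dataset is synthetic, the helper name cv_scores is hypothetical, and the splitter comes from the modern sklearn.model_selection module. The idea is that fixing the splitter's random_state fixes the folds, so repeated runs should return identical scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic data standing in for the real dataset (an assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def cv_scores(seed):
    """Hypothetical helper: 5-fold ROC AUC with folds determined by `seed`."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # scoring="roc_auc" uses continuous decision values, not hard labels.
    return cross_val_score(SVC(kernel="linear"), X, y, cv=cv, scoring="roc_auc")


first_run = cv_scores(0)
second_run = cv_scores(0)
print("same seed, same scores:", np.array_equal(first_run, second_run))
```

If a second code path gives different numbers for the same seed, comparing the fold indices produced by each path is a reasonable first step.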