On Sat, Mar 26, 2016 at 05:31:36PM -0400, Sebastian Raschka wrote:
> I wouldn’t fundamentally change the random forest algorithm in scikit-learn
> using ideas from xgboost, since it wouldn’t be a random forest anymore, then.
> Please don’t get me wrong, I’d also like to see a more efficient (pred…
I don't think we can deny this is strange, certainly for real-world, IID
data!
On 13 April 2016 at 10:31, Juan Nunez-Iglesias wrote:
Yes but would you expect sampling 280K / 3M to be qualitatively different
from sampling 70K / 3M?
At any rate I'll attempt a more rigorous test later this week and report
back. Thanks!
Juan.
On Wed, Apr 13, 2016 at 10:21 AM, Joel Nothman wrote:
> It's hard to believe this is a software problem rather than a data problem.
It's hard to believe this is a software problem rather than a data problem.
If your test data accidentally duplicated samples from the training set, you
could certainly get 100% accuracy.
On 13 April 2016 at 10:10, Juan Nunez-Iglesias wrote:
> Hallelujah! I'd given up on this thread. Thanks for resurrecting it, Andy!
>
Hallelujah! I'd given up on this thread. Thanks for resurrecting it, Andy!
=)
However, I don't think data distribution can explain the result, since
GridSearchCV gives the expected result (~0.8 accuracy) with 3K and 70K
random samples but changes to perfect classification for 280K samples. I don't…
Have you tried to "score" the grid-search on the non-training set?
The cross-validation is using stratified k-fold while your confirmation
used the beginning of the dataset vs the rest.
Your data is probably not IID.
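The suggestion above, scoring the grid search on data it never saw, can be sketched on synthetic data (a minimal illustration, not the original poster's setup; it assumes a scikit-learn version with the model_selection module, i.e. 0.18+, which is newer than this thread; the estimator and parameter grid are placeholders):

```python
# Sketch: compare the cross-validated score of a grid search with its score
# on a held-out split. A large gap between the two is a sign of leakage or
# of ordered (non-IID) data interacting badly with the evaluation split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy stand-in data; the real thread used a much larger dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    {"n_estimators": [10, 50]}, cv=5)
grid.fit(X_tr, y_tr)

print(grid.best_score_)        # cross-validated score within the training portion
print(grid.score(X_te, y_te))  # score on data the search never touched
```

If the data is ordered rather than shuffled, a "beginning of the dataset vs the rest" split and stratified k-fold can legitimately disagree, which is the point being made above.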
On 03/10/2016 01:08 AM, Juan Nunez-Iglesias wrote:
Hi all,
TL;DR: when I ru…
Another possibility is to threshold the predict_proba differently, such
that the decision maximizes whatever metric you have defined.
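That thresholding idea can be sketched as follows (a hypothetical example on synthetic imbalanced data; F1 is only a stand-in for whatever metric the poster has defined, and in practice the threshold should be chosen on a validation split, not the final test set):

```python
# Sketch: instead of the default 0.5 cut-off on predict_proba, scan a grid
# of thresholds and keep the one that maximizes the chosen metric.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced toy problem: ~90% negatives, ~10% positives.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0, stratify=y)

clf = LogisticRegression().fit(X_tr, y_tr)
proba = clf.predict_proba(X_val)[:, 1]   # probability of the positive class

thresholds = np.linspace(0.05, 0.95, 19)
scores = [f1_score(y_val, proba >= t) for t in thresholds]
best = thresholds[int(np.argmax(scores))]
print(best, max(scores))
```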
On 03/15/2016 07:44 AM, Mamun Rashid wrote:
Hi All,
I asked this question a couple of weeks ago on the list. I have a
two-class problem where my positive cl…
I would definitely join the sprint; anything after June 17 works for
me. I was thinking of coming to hang around during ICML, even if I might
not be able to afford the conference.
Cheers,
Vlad
On Tue, Apr 12, 2016 at 11:39 AM, Andreas Mueller wrote:
> So should we pick another or possibly an additional date?
So should we pick another or possibly an additional date?
Will anyone be in NYC for ICML / UAI / COLT?
On 04/12/2016 03:56 AM, Alexandre Gramfort wrote:
>> Sorry, ICML is at the same dates as the big brain imaging conference, so
>> I will not be able to attend (neither the conference, nor a sprint).
Hi Manjush,
Yes, this issue has been reported.
You can use the data from the following link. Its train and test data sets
do not have spaces between the commas, so I was able to load them with the
svmlight loader.
Link :
http://research.microsoft.com/en-us/um/people/manik/downloads/XC/XMLRepository.html
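For reference, a minimal round-trip through scikit-learn's svmlight/libsvm loader looks like this (a sketch on toy data, not the Kaggle file from the thread; the format is `<label> <index>:<value> <index>:<value> ...` per line, so comma-separated fields or stray spaces will fail to parse):

```python
# Sketch: write a tiny well-formed svmlight file, then read it back.
import os
import tempfile
import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

X = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
y = np.array([0, 1])

path = os.path.join(tempfile.mkdtemp(), "tiny.svmlight")
dump_svmlight_file(X, y, path, zero_based=True)

# load_svmlight_file returns a scipy sparse matrix and a label array.
X2, y2 = load_svmlight_file(path)
print(X2.shape, y2)
```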
It depends on your problem statement and the data set you are using to train
your model. Can you be more specific?
Regards,
Manjush
On Wed, Feb 17, 2016 at 8:26 AM Shishir Pandey wrote:
> Hi
>
> What properties of data should I look at to justify that mutual
> information is a good feature selection…
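As a starting point, mutual-information feature selection itself can be run like this (a sketch on synthetic data; `mutual_info_classif` was added in scikit-learn 0.18, which is newer than this thread, so this assumes a recent version):

```python
# Sketch: score features by estimated mutual information with the target,
# then keep the k highest-scoring ones.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Toy data with 3 genuinely informative features out of 10.
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           random_state=0)

mi = mutual_info_classif(X, y, random_state=0)  # one score per feature
selector = SelectKBest(score_func=mutual_info_classif, k=3).fit(X, y)
X_new = selector.transform(X)

print(mi.round(3))
print(X_new.shape)
```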
Is this issue already reported? I am getting the same error while trying to
load kaggle train.csv (same file) with load_svmlight_file.
Regards,
Manjush
On Sat, Feb 13, 2016 at 9:56 AM Gunjan Dewan wrote:
> I'll do that.
>
> Thanks a lot.
>
> Gunjan
>
> On Sat, Feb 13, 2016 at 6:04 AM, Mathieu Blondel…
> Sorry, ICML is at the same dates as the big brain imaging conference, so
> I will not be able to attend (neither the conference, nor a sprint).
same for me. Surprisingly...
Alex