Re: [Scikit-learn-general] boosting: false-positives versus false-negatives

2015-08-04 Thread Simon Burton
Hi Andy, thank you for your comments. I am still a bit confused about this, so let me try again to explain what I am thinking. Here is a little table showing how we would normally score classifier performance on a single sample:

            A     B
    c(A)   +1    -1
    c(B)   -1    +1

I.e. if the true class is
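The scoring table can be written as a lookup keyed by (predicted class, true class). A minimal sketch: `SYMMETRIC` reproduces the table as given, while `ASYMMETRIC` is a hypothetical variant (the 5x penalty is an illustrative assumption, not from the thread) in which wrongly predicting A when the truth is B costs more than the reverse mistake. `total_score` is likewise a hypothetical helper.

```python
# Per-sample score keyed by (predicted class, true class), as in the table:
# +1 for a correct prediction, -1 for a mistake.
SYMMETRIC = {
    ("A", "A"): +1, ("A", "B"): -1,
    ("B", "A"): -1, ("B", "B"): +1,
}

# Hypothetical asymmetric variant: treating A as the "positive" class,
# predicting A when the truth is B (a false positive) costs 5x a false negative.
ASYMMETRIC = {
    ("A", "A"): +1, ("A", "B"): -5,
    ("B", "A"): -1, ("B", "B"): +1,
}


def total_score(predicted, true, table):
    """Sum the per-sample scores looked up in a (predicted, true) table."""
    return sum(table[(p, t)] for p, t in zip(predicted, true))


predicted = ["A", "A", "B", "B"]
true = ["A", "B", "A", "B"]
print(total_score(predicted, true, SYMMETRIC))   # 1 - 1 - 1 + 1 = 0
print(total_score(predicted, true, ASYMMETRIC))  # 1 - 5 - 1 + 1 = -4
```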

Re: [Scikit-learn-general] contributing

2015-08-04 Thread Andy
Hi Jaret. It totally depends on what you're interested in and familiar with. The issue tracker has lots of issues to fix; look at the "easy" issues, the "need contributor" ones, or the "bug" ones. And definitely look at the contributing section: http://scikit-learn.org/dev/developers/contribut

[Scikit-learn-general] contributing

2015-08-04 Thread Jaret Flores
Since I've come into a good amount of free time lately, I wanted to find a way to contribute. As it says in the documentation, I wanted to check here to see what would be a good use of my time. If anyone has suggestions as to where I should devote some time, please let me know. Thanks. -Jaret --

Re: [Scikit-learn-general] Multiple normal scenario for one-class SVM

2015-08-04 Thread Ady Wahyudi Paundu
I haven't tried tuning the parameters on the combined normal data, though (don't know why I forgot that...). I'll do that. Thanks Andy. Regards, Ady On Wednesday, August 5, 2015, Andreas Mueller wrote: > Do you have ground-truth? Then you should tune the parameters. > Have you done that? > > > On 08/04

Re: [Scikit-learn-general] Weird memory error

2015-08-04 Thread Andreas Mueller
On 08/04/2015 01:49 PM, Maria Gorinova wrote: On 4 August 2015 at 18:25, Ronnie Ghose wrote: are you able to make a standalone np.ones of that size? Yes, I can create a np.ones array of approximately 100 000 000 elements. On 4 August 2015 at 18:26, Andreas

Re: [Scikit-learn-general] Multiple normal scenario for one-class SVM

2015-08-04 Thread Andreas Mueller
Do you have ground-truth? Then you should tune the parameters. Have you done that? On 08/04/2015 01:45 PM, Ady Wahyudi Paundu wrote: > Hi Andy, thank you for the swift reply. > > No, for both cases I was using the same set of parameters (nu and gamma = 0.01, kernel=rbf) > > Thank you for your su
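If some labeled data is available, tuning can be as simple as fitting on normal data only and scoring each (nu, gamma) pair on a labeled validation set. A minimal sketch, assuming synthetic data, an illustrative parameter grid, and F1 on the anomaly class as the selection criterion (all assumptions; any metric that reflects your costs would do). It uses the OneClassSVM convention of +1 for normal and -1 for anomaly.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import f1_score

rng = np.random.RandomState(0)
X_train = rng.normal(0, 1, size=(200, 2))               # training set: normal data only
X_val = np.vstack([rng.normal(0, 1, size=(50, 2)),      # labeled normal samples
                   rng.uniform(-6, 6, size=(20, 2))])   # labeled anomalies
y_val = np.r_[np.ones(50, dtype=int), -np.ones(20, dtype=int)]  # +1 normal, -1 anomaly

best = None
for nu in [0.01, 0.05, 0.1]:            # illustrative grid
    for gamma in [0.01, 0.1, 1.0]:
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_train)
        # Score how well the model flags the anomalies (label -1).
        score = f1_score(y_val, model.predict(X_val), pos_label=-1)
        if best is None or score > best[0]:
            best = (score, nu, gamma)

print("best F1 %.3f with nu=%s, gamma=%s" % best)
```

The same loop applies to the combined A+B data and to each separate model, so both setups are compared with tuned rather than fixed parameters.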

Re: [Scikit-learn-general] Weird memory error

2015-08-04 Thread Maria Gorinova
On 4 August 2015 at 18:25, Ronnie Ghose wrote: > are you able to make a standalone np.ones of that size? > Yes, I can create a np.ones array of approximately 100 000 000 elements. On 4 August 2015 at 18:26, Andreas Mueller wrote: > That array would take about 700 MB of RAM. Do you have that much
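Memory estimates like this can be checked without allocating anything: a dense NumPy array needs its number of elements times the item size of its dtype. A minimal sketch; `array_nbytes` is a hypothetical helper, and the int32 line is only an illustration of how the figure changes with the dtype.

```python
import numpy as np


def array_nbytes(n_elements, dtype=np.float64):
    """Bytes needed for a dense 1-D array of n_elements items of the given dtype."""
    return n_elements * np.dtype(dtype).itemsize


n = 100_000_000
print(array_nbytes(n) / 2**20)            # float64 (np.ones default): about 763 MiB
print(array_nbytes(n, np.int32) / 2**20)  # the same length as int32: about 381 MiB
```

For 100 000 000 float64 values that is 800 000 000 bytes, so whether a request of that size fits depends on how much contiguous memory the process can still get, not only on total RAM.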

Re: [Scikit-learn-general] Multiple normal scenario for one-class SVM

2015-08-04 Thread Ady Wahyudi Paundu
Hi Andy, thank you for the swift reply. No, for both cases I was using the same set of parameters (nu and gamma = 0.01, kernel=rbf). Thank you for your suggestion, I will look into it. Regards, Ady On 8/5/15, Andreas Mueller wrote: > Hi Ady. > Are you selecting parameters separately for the two

Re: [Scikit-learn-general] Weird memory error

2015-08-04 Thread Andreas Mueller
That array would take about 700 MB of RAM. Do you have that much available? Btw, you could probably work around this issue by using HashingVectorizer instead of CountVectorizer. On 08/04/2015 01:20 PM, Maria Gorinova wrote: Hi Andy, Thanks, I updated to 0.16.1, but the problem persists. len(j_
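The difference between the two vectorizers: CountVectorizer learns and stores a vocabulary in `fit`, while HashingVectorizer is stateless and hashes tokens into a fixed number of columns, so documents can be transformed in batches and stacked afterwards. A minimal sketch on toy documents; the batch split and `n_features=2**18` are illustrative assumptions, and whether this avoids this particular MemoryError is not something the sketch demonstrates.

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

docs = ["the cat sat", "the dog sat", "the cat ran"]

# CountVectorizer builds a vocabulary dict during fit; it grows with the corpus.
counts = CountVectorizer().fit_transform(docs)

# HashingVectorizer keeps no vocabulary: each token is hashed to one of n_features columns.
hasher = HashingVectorizer(n_features=2**18)
hashed = hasher.transform(docs)

# Because it is stateless, batches can be transformed independently and stacked.
batches = [docs[:2], docs[2:]]
stacked = sp.vstack([hasher.transform(batch) for batch in batches])

print(counts.shape, hashed.shape, stacked.shape)
```

The trade-off is that hashed columns cannot be mapped back to words, and unrelated tokens may occasionally collide in the same column.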

Re: [Scikit-learn-general] Weird memory error

2015-08-04 Thread Ronnie Ghose
are you able to make a standalone np.ones of that size? On Tue, Aug 4, 2015 at 10:20 AM, Maria Gorinova wrote: > Hi Andy, > > Thanks, I updated to 0.16.1, but the problem persists. > len(j_indices) is 68 356 000 when running for range(0,2000) and exactly > half of that when running for range(0,

Re: [Scikit-learn-general] Weird memory error

2015-08-04 Thread Maria Gorinova
Hi Andy, Thanks, I updated to 0.16.1, but the problem persists. len(j_indices) is 68 356 000 when running for range(0,2000) and exactly half of that when running for range(0,1000). Sebastian, thank you for the suggestion, but again, the issue doesn't seem to be that the process is using too much

Re: [Scikit-learn-general] Multiple normal scenario for one-class SVM

2015-08-04 Thread Andreas Mueller
Hi Ady. Are you selecting parameters separately for the two models in the separate case? Btw, if you are modelling a single normal, maybe EllipticEnvelope would work better. Best, Andy On 08/04/2015 01:07 PM, Ady Wahyudi Paundu wrote: > Hi all, > > How am I supposed to work with multiple sets of
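EllipticEnvelope fits a robust Gaussian (via the Minimum Covariance Determinant) to the training data and flags points far from it in Mahalanobis distance. A minimal sketch on synthetic correlated data; `contamination=0.01` is an assumed outlier fraction, not a value from the thread.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(0)
# Synthetic "normal" data: a single correlated 2-D Gaussian.
X_normal = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=300)

# contamination is the assumed fraction of outliers in the training data;
# it sets the distance threshold used by predict.
env = EllipticEnvelope(contamination=0.01).fit(X_normal)

X_new = np.array([[0.1, -0.2],    # near the centre of the data
                  [6.0, -6.0]])   # far outside, against the correlation
pred = env.predict(X_new)         # +1 = inlier, -1 = outlier
print(pred)
```

This fits one elliptical region, so it suits a single normal scenario; two well-separated scenarios would need one envelope each.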

[Scikit-learn-general] Multiple normal scenario for one-class SVM

2015-08-04 Thread Ady Wahyudi Paundu
Hi all, How am I supposed to work with multiple sets of normal data for a one-class SVM? If I have two normal-scenario data sets, A and B, for the learning phase, should I create a separate predictor model for each (M(A) + M(B)), or can I combine A and B to create just a single predictor model (M(A+B))? I have try
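The two options can be set up side by side. A minimal sketch on synthetic data: for M(A) + M(B), a sample is treated as normal if either model accepts it (this OR rule is an assumption; the thread does not say how the separate models should be combined), while M(A+B) is one model fitted on the stacked data. The parameters are the ones mentioned in the thread; `predict_separate` is a hypothetical helper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
A = rng.normal(loc=[0, 0], scale=0.5, size=(200, 2))   # synthetic normal scenario A
B = rng.normal(loc=[5, 5], scale=0.5, size=(200, 2))   # synthetic normal scenario B

params = dict(kernel="rbf", nu=0.01, gamma=0.01)  # settings mentioned in the thread

# Option 1, M(A) + M(B): one model per scenario.
m_a = OneClassSVM(**params).fit(A)
m_b = OneClassSVM(**params).fit(B)


def predict_separate(X):
    """Hypothetical combination rule: normal (+1) if any per-scenario model accepts X."""
    accepted = (m_a.predict(X) == 1) | (m_b.predict(X) == 1)
    return np.where(accepted, 1, -1)


# Option 2, M(A+B): a single model on the stacked data.
m_ab = OneClassSVM(**params).fit(np.vstack([A, B]))

X_test = np.array([[0.0, 0.0], [5.0, 5.0], [2.5, 2.5]])  # A centre, B centre, in between
sep_pred = predict_separate(X_test)
comb_pred = m_ab.predict(X_test)
print(sep_pred, comb_pred)
```

The point midway between the scenarios is where the two setups are most likely to disagree, since a single model may cover the gap between A and B.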

Re: [Scikit-learn-general] Weird memory error

2015-08-04 Thread Andreas Mueller
Thanks Maria. What I meant was that you could use the debugger to see what len(j_indices) is when it crashes. I'm not sure if there were improvements to this code since 0.15.2, but I'd encourage you to upgrade to 0.16.1 anyhow. Cheers, Andy On 08/04/2015 11:56 AM, Maria Gorinova wrote:
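Interactively, `import pdb; pdb.post_mortem()` inside an `except` block drops into the frame that raised. A non-interactive alternative is to walk the exception's traceback and read a local variable from the frames. A minimal sketch; `find_local` and `build_indices` are hypothetical, and the `MemoryError` here is simulated rather than coming from scikit-learn.

```python
def find_local(exc, name):
    """Search the traceback of `exc`, innermost frame first, for a local called `name`."""
    frames = []
    tb = exc.__traceback__
    while tb is not None:
        frames.append(tb.tb_frame)
        tb = tb.tb_next
    for frame in reversed(frames):
        if name in frame.f_locals:
            return frame.f_locals[name]
    raise KeyError(name)


def build_indices(n):
    # Hypothetical stand-in for the library code that fails.
    j_indices = list(range(n))
    raise MemoryError("simulated failure")


captured = None
try:
    build_indices(1000)
except MemoryError as exc:
    captured = len(find_local(exc, "j_indices"))
print(captured)
```

Searching every frame, rather than only the innermost one, matters because the innermost Python frame is not necessarily the one that holds the variable of interest.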

Re: [Scikit-learn-general] Weird memory error

2015-08-04 Thread Sebastian Raschka
Hm, I have never used Python on Windows, but I have heard from many people that it is way buggier than the POSIX equivalent; maybe it's just a quirk of the garbage collector? Maybe you could try adding the following lines, gc.collect() and len(gc.get_objects()), inside your for-loop and give it anot
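The suggestion amounts to forcing a full collection on each iteration and recording how many objects the garbage collector still tracks; a count that keeps climbing over iterations points to something holding references. A minimal sketch; the temporary "a.txt" and `process` are hypothetical stand-ins for the real per-file work.

```python
import gc
import os
import tempfile

# Create a small stand-in for "a.txt" so the sketch runs on its own.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "a.txt")
with open(path, "w") as f:
    f.write("some example text\n" * 1000)


def process(path):
    """Hypothetical per-file work; replace with the real loading/vectorizing code."""
    with open(path) as f:
        return len(f.read().split())


object_counts = []
for _ in range(5):                                # load the SAME file repeatedly
    process(path)
    gc.collect()                                  # force a full collection
    object_counts.append(len(gc.get_objects()))   # objects currently tracked by the GC
print(object_counts)
```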

Re: [Scikit-learn-general] Weird memory error

2015-08-04 Thread Maria Gorinova
Hi Andreas, Thank you for the reply. Yes, the error also happens if I load different files, but here I am actually loading the SAME file "a.txt", which I did just to demonstrate how awkward the error is... I don't know what len(j_indices) is; that's in sklearn\feature_extraction\text.py as shown

Re: [Scikit-learn-general] AUC realy low

2015-08-04 Thread Andreas Mueller
You should select the other column from predict_proba for the AUC. On 08/04/2015 10:54 AM, Herbert Schulz wrote: Thanks for the answer! Hmm, it's possible. I just made a little example: auc is [0.952710670069, 0.01890450385597026, 0.0059624156214325846, 0.05391726570661811] expected is [0.0,
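`predict_proba` returns one column per class, ordered as in `clf.classes_`, and `roc_auc_score` expects the score of the positive class. A minimal sketch on synthetic data with LogisticRegression as a stand-in classifier (an assumption; any classifier with `predict_proba` behaves the same way), showing that using the class-0 column gives the mirrored AUC.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)  # synthetic 0/1 target

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)            # shape (n_samples, 2), columns follow clf.classes_
pos_col = list(clf.classes_).index(1)   # column holding P(class == 1)

auc_right = roc_auc_score(y, proba[:, pos_col])       # positive-class scores
auc_wrong = roc_auc_score(y, proba[:, 1 - pos_col])   # class-0 scores: mirrored ranking
print(auc_right, auc_wrong)             # the two sum to 1 (up to ties)
```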

Re: [Scikit-learn-general] boosting: false-positives versus false-negatives

2015-08-04 Thread Andreas Mueller
Hi Simon. In general, in scikit-learn you could use class weights to make one class more important than the other. Unfortunately that is not implemented for AdaBoost yet. You can, however, use the sample_weight parameter of the fit method, and create sample weights either by hand based on the class
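Building the weights by hand from the class labels is a one-liner with NumPy. A minimal sketch on synthetic data: `FP_COST = 5.0` is a hypothetical cost ratio, and up-weighting the negative class is one way to make false positives (predicting 1 when the truth is 0) more expensive during boosting. The confusion counts are on the training data only, for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=500, random_state=0)  # synthetic 0/1 data

# Hypothetical cost ratio: a false positive is 5x worse than a false negative,
# so every class-0 sample gets 5x the weight during training.
FP_COST = 5.0
sample_weight = np.where(y == 0, FP_COST, 1.0)

results = {}
for name, weights in [("plain", None), ("weighted", sample_weight)]:
    model = AdaBoostClassifier(random_state=0).fit(X, y, sample_weight=weights)
    tn, fp, fn, tp = confusion_matrix(y, model.predict(X)).ravel()
    results[name] = (fp, fn)
print(results)  # (false positives, false negatives) per model
```

The weighting shifts the trade-off rather than guaranteeing fewer false positives; the effect should be checked on held-out data.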

Re: [Scikit-learn-general] Weird memory error

2015-08-04 Thread Andreas Mueller
Just to make sure, you are actually loading different files, not the same file over and over again, right? It seems an odd place for a memory error. Which version of scikit-learn are you using? What is ``len(j_indices)``? On 08/04/2015 10:18 AM, Maria Gorinova wrote: Hello, (I think I might

Re: [Scikit-learn-general] AUC realy low

2015-08-04 Thread Herbert Schulz
Thanks for the answer! Hmm, it's possible. I just made a little example: auc is [0.952710670069, 0.01890450385597026, 0.0059624156214325846, 0.05391726570661811] expected is [0.0, 1.0, 1.0, 1.0] but this is already with changed values; in the test set I set every value 0 to 1 and 1 to 0. SO th

Re: [Scikit-learn-general] AUC realy low

2015-08-04 Thread Artem
Hi Herbert, the worst value for AUC is actually 0.5. Having values close to 0 means that you can get a value just as close to 1 by flipping your predictions (predict class 1 when you think it's 0 and vice versa). Are you sure you didn't confuse classes somewhere along the line? (You might have cho
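AUC is the fraction of (positive, negative) pairs in which the positive sample gets the higher score, so reversing the scores turns an AUC of a into 1 - a, and uninformative scores land around 0.5. A minimal pure-Python sketch (`auc` is a hypothetical helper, with ties counted as one half) applied to the four values from Herbert's message.

```python
def auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Values from Herbert's message.
scores = [0.952710670069, 0.01890450385597026, 0.0059624156214325846, 0.05391726570661811]
expected = [0, 1, 1, 1]

print(auc(expected, scores))                   # 0.0: every positive ranked below the negative
print(auc(expected, [1 - s for s in scores]))  # 1.0 after reversing the ranking
```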

[Scikit-learn-general] boosting: false-positives versus false-negatives

2015-08-04 Thread Simon Burton
Hi, I am attempting to build some classification models where false-positives are much worse than false-negatives. Normally these two outcomes are treated equally (equal loss) in the training procedure, but I would like to be able to customize this. I've been using the AdaBoost classifier, which

[Scikit-learn-general] Weird memory error

2015-08-04 Thread Maria Gorinova
Hello, (I think I might have sent this to the wrong address the first time, so I'm sending it again.) I have been trying to find my way around a weird memory error for days now. If I'm doing something wrong and this question is completely dumb, I'm sorry for spamming the mailing list. But I'm desperat

[Scikit-learn-general] AUC realy low

2015-08-04 Thread Herbert Schulz
Hey, I'm computing the AUC for some data... The classification target is 1 or 0, and I have a lot of 0's (5600) and just 700 1's as targets. My AUC is about 0.097... where y_test is a vector containing 1's and 0's and auc contains the predict_proba values: roc= metrics.roc_auc_score(y_