The class _ThresholdScorer in sklearn.metrics.scorer needs to be patched
to accept multi-label input.
A pull request is welcome!
Best regards,
Arnaud
On 14 Aug 2013, at 17:35, Josh Wasserstein wrote:
> Say I define the following scoring function:
>
> def multi_label_macro_auc(y_gt, y_pred):
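The quoted function is cut off in the archive; a minimal sketch of what a macro-averaged multi-label AUC scorer could look like (assuming `y_gt` and `y_pred` are `(n_samples, n_labels)` arrays of binary labels and scores; this is a reconstruction, not the poster's actual code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def multi_label_macro_auc(y_gt, y_pred):
    """Average the per-label ROC AUC over the label columns."""
    n_labels = y_gt.shape[1]
    aucs = [roc_auc_score(y_gt[:, j], y_pred[:, j]) for j in range(n_labels)]
    return float(np.mean(aucs))
```

Recent scikit-learn versions also accept 2-D inputs directly via `roc_auc_score(y_gt, y_pred, average="macro")`, which makes the per-column loop unnecessary.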
By removing the label from the training set and then rerunning the process
(fit, predict, etc.), the result looks reasonable.
Thank you very much.
- Original Message -
From: Andreas Mueller
To: Jason Williams ;
scikit-learn-general@lists.sourceforge.net
Cc:
Sent: Thursday, 15 August 2013,
On 08/15/2013 01:08 PM, Jason Williams wrote:
> I follow the sample at
> http://blog.yhathq.com/posts/random-forests-in-python.html where it randomly
> assigns true, false to the dataset
>
> np.random.uniform(0, 1, len(df)) <= .75
>
> then partition dataset into train set and test set. I use
Thanks a lot, Olivier, for suggesting Alex's blog.
My apologies! Let me rephrase my problem.
I have two datasets of brain MR images; let's call them A and B. A was acquired
in one country
and B in another. Dataset A contains both patients with pathology
and healthy volunteers, whereas dataset B contains
I followed the sample at
http://blog.yhathq.com/posts/random-forests-in-python.html, where it randomly
assigns True/False to the dataset,
np.random.uniform(0, 1, len(df)) <= .75
and then partitions the dataset into a train set and a test set. I use the
same approach for creating the model:
rfc = RandomForestClassifier(...)
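An end-to-end sketch of that mask-based split (make_classification stands in for the real dataframe, and the classifier settings are assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

rng = np.random.RandomState(0)
mask = rng.uniform(0, 1, len(X)) <= .75   # ~75% of rows marked True for training

X_train, y_train = X[mask], y[mask]
X_test, y_test = X[~mask], y[~mask]       # held-out rows kept for evaluation

rfc = RandomForestClassifier(random_state=0).fit(X_train, y_train)
test_acc = rfc.score(X_test, y_test)      # accuracy on rows never seen in fit
```

The key point is that the score is computed on `X_test`, the rows excluded by the mask, not on the rows the forest was fit on.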
LinearSVC does not predict probabilities, but the linear decision
function is made available as the decision_function method.
It should be possible to train a calibration model to turn those raw
decision values into probabilities using an IsotonicRegression model [1]
and cross-validation.
There is n
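A sketch of that calibration idea with current scikit-learn (the cross_val_predict wiring and LinearSVC settings here are assumptions, not the author's code; scikit-learn's CalibratedClassifierCV now packages the same recipe):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, random_state=0)

# Out-of-fold decision values, so the calibrator never sees in-sample scores
svc = LinearSVC(dual=False)
scores = cross_val_predict(svc, X, y, cv=5, method="decision_function")

# Map raw margins monotonically onto [0, 1]
iso = IsotonicRegression(out_of_bounds="clip").fit(scores, y)
proba = iso.predict(svc.fit(X, y).decision_function(X))  # calibrated P(y=1)
```

Because isotonic regression only enforces a monotone mapping, the ranking of the decision values is preserved; only their scale changes.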
I don't really understand what the samples, the labels, and the
features are in your case, how much unlabeled data you have, or what
you mean by "I have completed the classification task on 1st
database": if you have labeled datasets, what does "completion of the
classification task" mean?
Hello folks!
I have two different brain MR image databases acquired
in two different countries. I need to perform a patch-based supervised
binary classification task (+ pathology and - normal). The 1st database
contains both +pathology patients and -normal subjects, whereas the second
You can also try Nearest Neighbors; they have accepted an output
matrix as well since 0.14.
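For instance, KNeighborsClassifier will fit a 2-D label matrix directly (synthetic data with assumed shapes, one column per label):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.rand(30, 4)
Y = (rng.rand(30, 3) > 0.5).astype(int)    # (n_samples, n_labels) binary matrix

clf = KNeighborsClassifier(n_neighbors=3).fit(X, Y)
pred = clf.predict(X)                      # also a (30, 3) matrix of 0/1 labels
```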
On 15 August 2013 09:52, Gilles Louppe wrote:
> If I understand you correctly, you're trying to do multilabel classification
> by converting the problem to a multitask binary classification problem.
> Unfortunately, no classifier in scikit-learn can accept an output matrix.
> You need to solve each task independently by fitting a classifier wi
Hi Jason,
It looks like you are evaluating your error on your training data,
aren't you? That will give you a (very) poor estimate of the
generalization error of your model. Instead, try your model on an
independent part of your dataset (in particular, one which has not
been used to fit your fo
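The gap described above is easy to reproduce: a forest scores near-perfectly on the rows it was fit on, while a held-out split gives the honest number (synthetic data and classifier settings are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

rfc = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
train_acc = rfc.score(X_tr, y_tr)   # optimistic: the trees have seen these rows
test_acc = rfc.score(X_te, y_te)    # estimate of generalization error
```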
Or perhaps, since it's a bug in multiprocessing's queuing protocol, joblib
could handle it by writing oversized objects to disk, assuming there's enough
free space in $TMPDIR.
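Newer joblib versions do something along these lines: Parallel's max_nbytes threshold memmaps large numpy input arrays to a temp folder instead of pickling them through the multiprocessing queue. A minimal sketch (the threshold value here is arbitrary):

```python
import numpy as np
from joblib import Parallel, delayed

big = np.zeros((1000, 1000))   # ~8 MB, well above the threshold below

# Arrays above max_nbytes are dumped to a temp folder and handed to the
# workers as memory-maps rather than serialized through the queue
results = Parallel(n_jobs=2, max_nbytes="1M")(
    delayed(np.mean)(big) for _ in range(4)
)
```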
On Thu, Aug 15, 2013 at 5:32 PM, Joel Nothman
wrote:
I've been getting "SystemError: NULL result without error in PyObject_Call"
when trying to perform a parallel grid search (with logistic regression,
n_jobs>=2) with a very large matrix.
So it comes down to http://bugs.python.org/issue17560. It would be good if
we could make the error a little less
2013/8/15 Josh Wasserstein :
> It looks like it doesn't, but I just wanted to make sure.
No. You can use LogisticRegression, which uses the same training
algorithm (Liblinear) but a different objective function (log-loss).
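A minimal contrast between the two on the same (synthetic, assumed) data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

svc = LinearSVC(dual=False).fit(X, y)
scores = svc.decision_function(X)     # raw signed margins, not probabilities

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)          # (200, 2) matrix; each row sums to 1
```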
--
The first thing I'd do is publish the result (just kidding!).
Try it with another data set first, especially one that has an example in
the docs.
If you are still getting top marks, it may be your "framework" around the
code (are you doing proper test/train splits, etc.?).
If it drops, consider that
I ran a few tests based on the Random Forest Classifier
(http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
with default settings. The classification (I repeated the classification
procedure several times) is nearly 100% correct. That seems to be overfitting.