Brian,
It is great to have query rules from a decision tree.
Back to the original question: is there any native way to make the decision
tree split following the "gini" or "entropy" criterion while satisfying the two
fraud detection conditions, with leaf nodes having fraud_usd_leaf >= 0.05
and frau
+1
On 28 August 2015 at 04:23, Andreas Mueller wrote:
> I think it would be fine to enable it now without support in all solvers.
>
>
> On 8/27/2015 11:29 AM, Valentin Stolbunov wrote:
>
> Joel, I see you've done some work in that PR. Is an additional review all
> that's needed there? Looks like changes in Logistic Regression CV broke the
> original contribution and it has since stalled (over 1 year ago).
Rex,
For extracting decision rules as a Pandas query, here is some sample code
with a test case that should work. No promises though.
```
import pandas as pd
from sklearn.datasets import load_iris
from sklearn import tree

def get_queries(clf, feature_names):
    t = clf.tree_
    def recurse(node_id, conds):
        if t.children_left[node_id] == -1:      # -1 marks a leaf node
            return [" and ".join(conds)]
        name, thr = feature_names[t.feature[node_id]], t.threshold[node_id]
        return (recurse(t.children_left[node_id],  conds + ["%s <= %.6f" % (name, thr)]) +
                recurse(t.children_right[node_id], conds + ["%s > %.6f" % (name, thr)]))
    return recurse(0, [])
```
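Continuing the snippet above, a quick sanity check on the iris data could look
like the sketch below (renaming the columns is only so the names are valid in a
pandas query; the assertion just checks that the leaf queries partition the data):
```
iris = load_iris()
names = [n.replace(" (cm)", "").replace(" ", "_") for n in iris.feature_names]
df = pd.DataFrame(iris.data, columns=names)

clf = tree.DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
queries = get_queries(clf, names)
assert sum(len(df.query(q)) for q in queries) == len(df)   # every row falls in exactly one leaf
print(queries[0])
```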
Hi Jacob,
That is cool! Very helpful.
Further, based on your idea, I can run a loop with random splits and
automatically find the leaf nodes satisfying the two fraud detection
conditions.
That raises one question: how do I extract the decision rules associated
with a selected leaf node?
Usuall
Hi Rex,
I would set up the problem in the same way.
Look at http://scikit-learn.org/stable/modules/tree.html. The visualization
should be of use to you: you can manually inspect good_usd_leaf and
fraud_usd_leaf there.
If you want to do this automatically, you should look at clf.tree_.value,
which holds the weighted number of samples of each class at every node.
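A rough sketch of that automatic check might look like the following. The data,
the dollar amounts passed as sample_weight, and reading fraud_usd_leaf as the
fraud share of a leaf's dollar volume are all my assumptions, not necessarily
the exact setup from the earlier emails:
```
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 3)                      # placeholder order features
y = (rng.rand(200) < 0.1).astype(int)     # ~10% of orders labelled fraud (0 = good, 1 = fraud)
usd = rng.uniform(10, 100, size=200)      # placeholder order amounts in dollars

clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
clf.fit(X, y, sample_weight=usd)          # weight each order by its dollar amount

t = clf.tree_
for leaf in np.where(t.children_left == -1)[0]:
    good_usd_leaf, fraud_usd_leaf = t.value[leaf, 0]       # weighted class totals for this leaf
    frac = fraud_usd_leaf / (good_usd_leaf + fraud_usd_leaf)
    if frac >= 0.05:
        print("leaf %d: fraud share of leaf USD = %.2f" % (leaf, frac))
```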
Hi Jacob,
Let's consider one leaf node with three order transactions: one order is
good ($30), and the other two are fraudulent ($35 + $35 = $70 of fraud in
total). The two class weights are equal, {'0': 1, '1': 1}, where class '0'
labels good orders and class '1' labels fraud. The two classes
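To make that single leaf concrete, here is a minimal sketch; using the dollar
amounts as sample_weight (and the single amount feature) is an illustrative
assumption of mine, not something stated in the thread:
```
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[30.0], [35.0], [35.0]])    # one good $30 order, two fraudulent $35 orders
y = np.array([0, 1, 1])                   # 0 = good, 1 = fraud
wts = X.ravel()                           # weight each order by its dollar amount

# min_samples_split=4 prevents any split, so all three orders land in a single leaf
clf = DecisionTreeClassifier(class_weight={0: 1, 1: 1}, min_samples_split=4)
clf.fit(X, y, sample_weight=wts)
print(clf.tree_.value[0])                 # weighted class totals for the one node,
                                          # proportional to $30 good vs $70 fraud
```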
I think it would be fine to enable it now without support in all solvers.
On 8/27/2015 11:29 AM, Valentin Stolbunov wrote:
Joel, I see you've done some work in that PR. Is an additional review
all that's needed there? Looks like changes in Logistic Regression CV
broke the original contribution and it has since stalled (over 1 year ago).
Hi Scikit-Learn fellas,
Here is my request for comments and contributions: as I briefly presented
to Gael at OHBM, we (with Matteo, CCed) have started a new project,
DueCredit, to let users quickly harvest the citations they need for the
methods and software they have used in their analyses.
Joel, I see you've done some work in that PR. Is an additional review all
that's needed there? Looks like changes in Logistic Regression CV broke the
original contribution and it has since stalled (over 1 year ago).
I guess the big question is: what is the best way to get sample weights in
LR? Wou
On Tue, Aug 25, 2015 at 01:06:11PM -0400, Andreas Mueller wrote:
> For speed regression tests, it has happened that things got slower, in
> particular with innocent looking things like input validation.
> I think it would be good to have some tests that ensure that we don't
> get too much slower.
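A minimal sketch of what such a test could look like, with an arbitrary time
budget and a toy problem standing in for a real, agreed-upon benchmark:
```
import time
import numpy as np
from sklearn.linear_model import LogisticRegression

def test_logistic_regression_fit_speed(budget_seconds=2.0):
    # synthetic problem; in practice the dataset and budget would be fixed by the project
    rng = np.random.RandomState(0)
    X = rng.randn(5000, 50)
    y = (X[:, 0] + 0.1 * rng.randn(5000) > 0).astype(int)
    start = time.time()
    LogisticRegression().fit(X, y)
    elapsed = time.time() - start
    assert elapsed < budget_seconds, "fit took %.2fs, budget is %.2fs" % (elapsed, budget_seconds)
```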
Ok, thanks Joel, I understand that now. I'll just do my own bootstrapping
then.
Andrew
The randomisation only changes the order of the data, not the set of data
points.
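A quick way to check this, if it helps (a sketch; the categories list is the
one from the tutorial):
```
from sklearn.datasets import fetch_20newsgroups

categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
a = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=1)
b = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=2)
assert sorted(a.filenames) == sorted(b.filenames)   # same documents, only the order differs
```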
On 27 August 2015 at 22:44, Andrew Howe wrote:
> I'm working through the tutorial, and also experimenting kind of on my
> own. I'm on the text analysis example, and am curious about the relative
> merits of analyz
Sorry for the red herring, but I've realized it's not an issue with
Pipeline. The code below has the same behavior:
import datetime as dat
from sklearn.datasets import fetch_20newsgroups

nw = dat.datetime.now()
rndstat = nw.hour * 3600 + nw.minute * 60 + nw.second   # seed derived from the current time
twenty_train = fetch_20newsgroups(subset='train', categories=categories,   # categories as defined earlier
                                  random_state=rndstat, shuffle=True)
I'm working through the tutorial, and also experimenting kind of on my
own. I'm on the text analysis example, and am curious about the relative
merits of analyzing by word frequency, relative frequency, and adjusted
relative frequency. Using the 20 newsgroups data, I've built a set of
pipelines w
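One way to set up that comparison is sketched below; mapping "relative
frequency" onto TfidfTransformer(use_idf=False) and "adjusted relative
frequency" onto tf-idf is my assumption, not necessarily Andrew's exact setup:
```
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)

variants = {
    'raw counts': Pipeline([('vect', CountVectorizer()),
                            ('clf', MultinomialNB())]),
    'term frequency': Pipeline([('vect', CountVectorizer()),
                                ('tf', TfidfTransformer(use_idf=False)),
                                ('clf', MultinomialNB())]),
    'tf-idf': Pipeline([('vect', CountVectorizer()),
                        ('tfidf', TfidfTransformer(use_idf=True)),
                        ('clf', MultinomialNB())]),
}
for name, pipe in variants.items():
    pipe.fit(train.data, train.target)
    print(name, pipe.score(test.data, test.target))
```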
Hi,
Some time ago I made a PR on the K-SVD algorithm, but it didn't make it to a
commit. Now I have time to work on this once again, so I am asking for some
reviews of the code.
The link to the PR: https://github.com/scikit-learn/scikit-learn/pull/3739
-- Alexey Umnov