Re: [Scikit-learn-general] How to extract the decision tree rule of each leaf node into Pandas Dataframe query?

2015-08-27 Thread Rex X
Brian, That is great to have query rules from a decision tree. Back to the original question: is there any native way to make the decision tree split following the "gini" or "entropy" criterion while satisfying the two fraud detection conditions, with leaf nodes with *fraud_usd_leaf* >= 0.05, and *frau

Re: [Scikit-learn-general] Turning on sample weights for linear_model.LogisticRegression

2015-08-27 Thread Joel Nothman
+1 On 28 August 2015 at 04:23, Andreas Mueller wrote: > I think it would be fine to enable it now without support in all solvers. > > > On 8/27/2015 11:29 AM, Valentin Stolbunov wrote: > > Joel, I see you've done some work in that PR. Is an additional review all > that's needed there? Looks like

Re: [Scikit-learn-general] How to extract the decision tree rule of each leaf node into Pandas Dataframe query?

2015-08-27 Thread Brian Scannell
Rex, For extracting decision rules as a Pandas query, here is some sample code with a test case that should work. No promises though. ``` import pandas as pd from sklearn.datasets import load_iris from sklearn import tree import sklearn def get_queries(clf, feature_names): def recurse(node_i
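
The sample code in the message above is cut off in this archive view. As a rough illustration of the same idea (not Brian's code), here is a minimal sketch that walks a tree stored as parallel arrays and builds one query string per leaf. It assumes the layout scikit-learn uses in `clf.tree_` (`children_left`, `children_right`, `feature`, `threshold`, with -1 marking a leaf and samples with `feature <= threshold` going left). The function name `leaf_queries` and the hand-written example arrays are hypothetical.

```python
def leaf_queries(children_left, children_right, feature, threshold, feature_names):
    """Return {leaf_node_id: query_string} for a tree stored as parallel arrays."""
    queries = {}

    def recurse(node, conditions):
        left, right = children_left[node], children_right[node]
        if left == -1:
            # Leaf: join the conditions collected on the path from the root.
            # A tree consisting of a single node yields an empty string.
            queries[node] = " and ".join(conditions)
            return
        name = feature_names[feature[node]]
        thr = float(threshold[node])
        # Samples with feature <= threshold go left, the rest go right.
        recurse(left, conditions + [f"{name} <= {thr!r}"])
        recurse(right, conditions + [f"{name} > {thr!r}"])

    recurse(0, [])
    return queries


# Hypothetical hand-written tree: node 0 and node 2 split, nodes 1, 3, 4 are leaves.
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
feature = [0, -2, 1, -2, -2]
threshold = [2.5, -2.0, 1.75, -2.0, -2.0]
feature_names = ["petal_length", "petal_width"]

queries = leaf_queries(children_left, children_right, feature, threshold, feature_names)
for leaf_id, query in sorted(queries.items()):
    print(leaf_id, query)
```

With a fitted classifier, the arrays would presumably come from `clf.tree_`, and each string could then be passed to `DataFrame.query`, provided the column names are valid Python identifiers.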

Re: [Scikit-learn-general] How to extract the decision tree rule of each leaf node into Pandas Dataframe query?

2015-08-27 Thread Rex X
Hi Jacob, That is cool! Very helpful. Furthermore, based on your idea, I can do a loop with random splits and automatically find those leaf nodes satisfying the two fraud detection conditions. This raises one question: how can I extract the decision rules associated with one selected leaf node? Usuall

Re: [Scikit-learn-general] How to extract the decision tree rule of each leaf node into Pandas Dataframe query?

2015-08-27 Thread Jacob Schreiber
Hi Rex, I would set up the problem in the same way. Look at http://scikit-learn.org/stable/modules/tree.html. The visualization should be of use to you, where you can manually inspect good_usd_leaf and fraud_usd_leaf. If you want to do this automatically, you should look at clf.tree_.value, whi

Re: [Scikit-learn-general] How to extract the decision tree rule of each leaf node into Pandas Dataframe query?

2015-08-27 Thread Rex X
Hi Jacob, Let's consider one leaf node with three order transactions: one order is good ($30), and the other two are fraud ($35 + $35 = $70 fraud in total). The two class weights are equal, {'0': 1, '1': 1}, where class '0' labels a good order and class '1' labels fraud. The two classes
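
The arithmetic of this example leaf can be checked with a few lines. This is only a sketch: the thread's exact definition of the per-leaf fraud measure is truncated here, so the code assumes it is the fraud dollar amount divided by the total dollar amount in the leaf. The function name `leaf_shares` and its result keys are hypothetical.

```python
def leaf_shares(amounts, labels):
    """Compare the dollar-weighted and count-based fraud shares of one leaf.

    amounts: order values in USD; labels: 0 for good, 1 for fraud.
    """
    total_usd = sum(amounts)
    fraud_usd = sum(a for a, y in zip(amounts, labels) if y == 1)
    return {
        # Assumed definition: fraud dollars over all dollars in the leaf.
        "fraud_usd_share": fraud_usd / total_usd,
        # Share by number of orders, which is what equal class weights reflect.
        "fraud_count_share": sum(labels) / len(labels),
    }


# The leaf from the message: one good $30 order and two $35 fraud orders.
shares = leaf_shares([30, 35, 35], [0, 1, 1])
print(shares)
```

Under that assumption the leaf is 70% fraud by dollar amount ($70 of $100) but two thirds fraud by order count, which is the gap between the two views that the question is about.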

Re: [Scikit-learn-general] Turning on sample weights for linear_model.LogisticRegression

2015-08-27 Thread Andreas Mueller
I think it would be fine to enable it now without support in all solvers. On 8/27/2015 11:29 AM, Valentin Stolbunov wrote: Joel, I see you've done some work in that PR. Is an additional review all that's needed there? Looks like changes in Logistic Regression CV broke the original contribution

[Scikit-learn-general] RFCC: duecredit citations for sklearn (and anything else you like ; ) )

2015-08-27 Thread Yaroslav Halchenko
Hi Scikit-Learn fellas, Here is my request for comments and contributions: as I have briefly presented to Gael at OHBM, we (with Matteo, CCed) initiated a new project -- DueCredit to enable users to quickly harvest necessary citations for the methods and software they have used in their analyses.

Re: [Scikit-learn-general] Turning on sample weights for linear_model.LogisticRegression

2015-08-27 Thread Valentin Stolbunov
Joel, I see you've done some work in that PR. Is an additional review all that's needed there? Looks like changes in Logistic Regression CV broke the original contribution and it has since stalled (over 1 year ago). I guess the big question is: what is the best way to get sample weights in LR? Wou

Re: [Scikit-learn-general] Tests against reference implementations, speed regression tests

2015-08-27 Thread Gael Varoquaux
On Tue, Aug 25, 2015 at 01:06:11PM -0400, Andreas Mueller wrote: > For speed regression tests, it has happened that things got slower, in > particular with innocent looking things like input validation. > I think it would be good to have some tests that ensure that we don't > get too much slower.

[Scikit-learn-general] issue with pipeline always giving same results

2015-08-27 Thread Andrew Howe
Ok, thanks Joel, I understand that now. I'll just do my own bootstrapping then. Andrew
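
For readers wondering what "my own bootstrapping" might look like, here is a minimal sketch using only the standard library. It is not taken from the thread: the helper name `bootstrap_indices` is hypothetical, and it simply draws row indices with replacement so that each seed yields a different training sample.

```python
import random


def bootstrap_indices(n, seed):
    """Draw n row indices with replacement from range(n)."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]


n_docs = 8
sample_a = bootstrap_indices(n_docs, seed=0)
sample_b = bootstrap_indices(n_docs, seed=1)
print(sample_a)
print(sample_b)
```

Unlike shuffling, resampling with replacement changes which data points the model sees, so repeated fits can give different scores.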

Re: [Scikit-learn-general] issue with pipeline always giving same results

2015-08-27 Thread Joel Nothman
The randomisation only changes the order of the data, not the set of data points. On 27 August 2015 at 22:44, Andrew Howe wrote: > I'm working through the tutorial, and also experimenting kind of on my > own. I'm on the text analysis example, and am curious about the relative > merits of analyz
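
Joel's point can be illustrated without downloading the 20 newsgroups data. The sketch below uses a plain list as a stand-in for the documents and a hypothetical helper `shuffled`; it shows that two different seeds give two orderings of exactly the same items.

```python
import random


def shuffled(items, seed):
    """Return a shuffled copy of items using a seeded generator."""
    copy = list(items)
    random.Random(seed).shuffle(copy)
    return copy


docs = [f"doc{i}" for i in range(10)]
order_a = shuffled(docs, seed=1)
order_b = shuffled(docs, seed=2)

# The orderings may differ, but the set of data points is identical.
print(order_a)
print(order_b)
print(sorted(order_a) == sorted(order_b))
```

A model fitted on the whole training set, and insensitive to row order, sees the same data either way, which would be consistent with the identical results reported in this thread.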

Re: [Scikit-learn-general] issue with pipeline always giving same results (Andrew Howe)

2015-08-27 Thread Andrew Howe
Sorry for the red herring, but I've realized it's not an issue with Pipeline. The code below has the same behavior: nw = dat.datetime.now() rndstat = nw.hour*3600+nw.minute*60+nw.second twenty_train = fetch_20newsgroups(subset='train', categories=categories, random_state = rndstat, shuffle=True,

[Scikit-learn-general] issue with pipeline always giving same results

2015-08-27 Thread Andrew Howe
I'm working through the tutorial, and also experimenting kind of on my own. I'm on the text analysis example, and am curious about the relative merits of analyzing by word frequency, relative frequency, and adjusted relative frequency. Using the 20 newsgroups data, I've built a set of pipelines w

Re: [Scikit-learn-general] K-SVD implementation

2015-08-27 Thread Alexey Umnov
Hi, Some time ago I made a PR on the K-SVD algorithm, but it didn't make it into a commit. Now I have time to work on this once again, so I am asking for some reviews of the code. The link to the PR: https://github.com/scikit-learn/scikit-learn/pull/3739 -- Alexey Umnov