Re: [scikit-learn] What is the FeatureAgglomeration algorithm?

2018-07-25 Thread Raphael C
". These are well > documented in the literature, or on Wikipedia. > > Gaël > > On Thu, Jul 26, 2018 at 06:05:21AM +0100, Raphael C wrote: > > Hi, > > > I am trying to work out what, in precise mathematical terms, > > [FeatureAgglomeration][1] does and w

[scikit-learn] What is the FeatureAgglomeration algorithm?

2018-07-25 Thread Raphael C
Hi, I am trying to work out what, in precise mathematical terms, [FeatureAgglomeration][1] does and would love some help. Here is some example code: import numpy as np from sklearn.cluster import FeatureAgglomeration for S in ['ward', 'average', 'complete']: FA = FeatureAgglo
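
The example code above is cut off by the archive; a minimal runnable sketch of the kind of experiment it sets up (the data and n_clusters value are illustrative assumptions, not from the original mail):

import numpy as np
from sklearn.cluster import FeatureAgglomeration

X = np.random.rand(20, 6)  # 20 samples, 6 features (illustrative data)
for S in ['ward', 'average', 'complete']:
    # Merge the 6 features into 2 clusters using linkage S
    FA = FeatureAgglomeration(n_clusters=2, linkage=S)
    # transform pools each cluster of features into one column
    # (by default the mean across the clustered features)
    print(S, FA.fit_transform(X).shape)  # -> (20, 2)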

Re: [scikit-learn] Finding a single cluster in 1d data

2018-04-14 Thread Raphael C
n this approach, personally, I think the jenkspy module is more > straightforward. > > I hope it helps. > > Pedro Pazzini > > 2018-04-12 16:22 GMT-03:00 Raphael C : >> >> I have a set of points in 1d represented by a list X of floating point >> numbers. The list has
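
For reference, a minimal sketch of the jenkspy approach mentioned above (the data is hypothetical; depending on the jenkspy version the second argument is named nb_class or n_classes, so it is passed positionally here):

import jenkspy

X = [0.1, 0.15, 0.2, 5.0, 5.1, 5.2, 5.3, 9.0]  # illustrative 1d data
# Jenks natural breaks: partition the sorted values into 2 classes so
# that within-class variance is minimised
breaks = jenkspy.jenks_breaks(X, 2)
print(breaks)  # the boundary values between the classes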

[scikit-learn] Finding a single cluster in 1d data

2018-04-12 Thread Raphael C
I have a set of points in 1d represented by a list X of floating point numbers. The list has one dense section and the rest is sparse and I want to find the dense part. I can't release the actual data but here is a simulation: N = 100 start = 0 points = [] rate = 0.1 for i in range(N): point
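
The simulation code is truncated by the archive; one plausible reading of its continuation, offered purely as an assumption (exponential gaps for the sparse background, plus an assumed dense section):

import random

N = 100
start = 0
points = []
rate = 0.1
for i in range(N):
    # sparse background: points arrive with mean gap 1/rate = 10
    point = start + random.expovariate(rate)
    points.append(point)
    start = point
# a dense section of many close points (assumed intent of the simulation)
dense = [50 + random.random() for _ in range(50)]
X = sorted(points + dense)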

Re: [scikit-learn] Parallel MLP version

2017-12-20 Thread Raphael C
I believe tensorflow will do what you want. Raphael On 20 Dec 2017 16:43, "Luigi Lomasto" wrote: > Hi all, > > I have a computational problem training my neural network, so can you > tell me if there exists any parallel version of the MLP library? > > > __

Re: [scikit-learn] Unclear help file about sklearn.decomposition.pca

2017-10-17 Thread Raphael C
How about including the scaling that people might want to use in the User Guide examples? Raphael On 17 October 2017 at 16:40, Andreas Mueller wrote: > In general scikit-learn avoids automatic preprocessing. > That's a convention to give the user more control and decrease surprising > behavior (
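
For readers landing here: the scaling under discussion is done explicitly before PCA, e.g. (a minimal sketch, not taken from the thread):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.rand(100, 5) * [1, 10, 100, 1000, 10000]  # wildly different scales
# Standardise each feature first so no feature dominates the principal
# components purely because of its units
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
X2 = pipe.fit_transform(X)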

Re: [scikit-learn] Truncated svd not working for complex matrices

2017-08-11 Thread Raphael C
Although the first priority should be correctness (in implementation and documentation) and it makes sense to explicitly test for inputs for which code will give the wrong answer, it would be great if we could support complex data types, especially where it is very little extra work. Raphael On 1
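
A minimal sketch of the kind of explicit input test being suggested (the function name is illustrative, not scikit-learn API):

import numpy as np

def check_not_complex(X):
    # Reject complex input explicitly rather than silently computing
    # a wrong answer on, say, the real part only
    if np.iscomplexobj(X):
        raise ValueError("Complex data is not supported by this estimator.")
    return X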

Re: [scikit-learn] decision trees

2017-03-29 Thread Raphael C
There is https://github.com/scikit-learn/scikit-learn/pull/4899 . It looks like it is waiting for review? Raphael On 29 March 2017 at 11:50, federico vaggi wrote: > That's a really good point. Do you know of any systematic studies about the > two different encodings? > > Finally: wasn't there
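
The "two different encodings" discussed for trees are typically one-hot versus integer (ordinal) coding of categorical features; a hedged sketch of setting both up with current scikit-learn (in older versions OneHotEncoder takes sparse= rather than sparse_output=):

import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

X = np.array([['red'], ['green'], ['blue'], ['red']])  # illustrative data
y = [0, 1, 1, 0]

# Option 1: one-hot, each category becomes its own binary column
X_oh = OneHotEncoder(sparse_output=False).fit_transform(X)
# Option 2: ordinal, categories mapped to arbitrary integers; a tree can
# still split on these, but splits respect the (meaningless) ordering
X_ord = OrdinalEncoder().fit_transform(X)

for Xe in (X_oh, X_ord):
    print(DecisionTreeClassifier().fit(Xe, y).score(Xe, y))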

Re: [scikit-learn] Markov Clustering?

2016-12-05 Thread Raphael C
I just needed to check with him that > indeed it was this specific algorithm). > > G > > On Sun, Dec 04, 2016 at 08:18:54AM +, Raphael C wrote: >> I think you get a better view of the importance of Markov Clustering in >> academia from https://scholar.google.co.uk/scho

Re: [scikit-learn] Markov Clustering?

2016-12-04 Thread Raphael C
I think you get a better view of the importance of Markov Clustering in academia from https://scholar.google.co.uk/scholar?hl=en&as_sdt=0,5&q=Markov+clustering . Raphael On Sat, 3 Dec 2016 at 22:43 Allan Visochek wrote: > Thanks for pointing that out, I sort of picked it up by word of mouth so

Re: [scikit-learn] Fwd: libmf bindings

2016-11-02 Thread Raphael C
(I am not a scikit-learn dev.) This is a great idea and I for one look forward to using it. My understanding is that libmf optimises only over the observed values (that is, the explicitly given values in a sparse matrix) as is typically needed for recommender systems, whereas the scikit-learn NMF co

Re: [scikit-learn] Missing data and decision trees

2016-10-13 Thread Raphael C
You can simply make a new binary feature (per feature that might have a missing value) that is 1 if the value is missing and 0 otherwise. The RF can then work out what to do with this information. I don't know how this compares in practice to more sophisticated approaches. Raphael On Thursday,
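
A minimal sketch of the indicator-feature idea in plain numpy (illustrative data; current scikit-learn also offers this via MissingIndicator or SimpleImputer(add_indicator=True)):

import numpy as np

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 5.0]])
# One binary column per original feature: 1 where the value was missing
indicator = np.isnan(X).astype(float)
# Fill the holes with anything reasonable (0 here) and let the forest
# use the indicator columns to learn what "missing" means
X_filled = np.nan_to_num(X, nan=0.0)
X_aug = np.hstack([X_filled, indicator])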

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
s information but I am sure I must have misunderstood. At best it seems it could cover the number of positive values but this is missing half the information. Raphael > > On Mon, Oct 10, 2016 at 1:15 PM, Raphael C wrote: >> >> How do I use sample_weight for my use case?

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
be the sample weight function in fit > > http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html > > On Mon, Oct 10, 2016 at 1:03 PM, Raphael C wrote: >> >> I just noticed this about the glm package in R. >> http://stats.stackex

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
of the last two options would do for me. Does scikit-learn support either of these last two options? Raphael On 10 October 2016 at 11:55, Raphael C wrote: > I am trying to perform regression where my dependent variable is > constrained to be between 0 and 1. This constraint comes from the fa

[scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
I am trying to perform regression where my dependent variable is constrained to be between 0 and 1. This constraint comes from the fact that it represents a count proportion, that is, counts in some category divided by a total count. In the literature it seems that one common way to tackle this is
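
One standard trick for this setup (not spelled out in the thread, so treat it as an assumption): expand each proportion into a positive and a negative pseudo-observation and pass the underlying counts as sample_weight:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.3], [2.1]])   # illustrative predictors
p = np.array([0.2, 0.5, 0.9])         # observed proportions
n = np.array([10, 40, 25])            # total counts behind each proportion

# Duplicate each row: once with label 1 weighted by the positive count,
# once with label 0 weighted by the negative count
X2 = np.vstack([X, X])
y2 = np.concatenate([np.ones(len(X)), np.zeros(len(X))])
w2 = np.concatenate([n * p, n * (1 - p)])

clf = LogisticRegression().fit(X2, y2, sample_weight=w2)
print(clf.predict_proba(X)[:, 1])  # fitted proportions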

Re: [scikit-learn] Github project management tools

2016-09-29 Thread Raphael C
My apologies, I see it is in the spreadsheet. It would be great to see this work finished for 0.19 if at all possible IMHO. Raphael On 29 September 2016 at 20:12, Raphael C wrote: > I hope this isn't out of place but I notice that > https://github.com/scikit-learn/scikit-learn/pull/

Re: [scikit-learn] Github project management tools

2016-09-29 Thread Raphael C
I hope this isn't out of place but I notice that https://github.com/scikit-learn/scikit-learn/pull/4899 is not in the list. It seems like a very worthwhile addition and the PR appears stalled at present. Raphael On 29 September 2016 at 15:05, Joel Nothman wrote: > I agree that being able to iden

[scikit-learn] How to get the factorization from NMF in scikit learn

2016-09-07 Thread Raphael C
I am trying to use NMF from scikit-learn. Given a matrix A this should give me a factorization into matrices W and H so that WH is approximately equal to A. As a sanity check I tried the following: from sklearn.decomposition import NMF import numpy as np A = np.array([[0,1,0],[1,0,1],[1,1,0]]) nmf
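
The example is cut off above; a runnable sketch of the sanity check it appears to set up (n_components and random_state are assumptions):

import numpy as np
from sklearn.decomposition import NMF

A = np.array([[0, 1, 0], [1, 0, 1], [1, 1, 0]])
nmf = NMF(n_components=2, init='random', random_state=0)
W = nmf.fit_transform(A)   # the left factor
H = nmf.components_        # the right factor
print(np.round(W @ H, 2))  # should be approximately A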

Re: [scikit-learn] Gradient Boosting: Feature Importances do not sum to 1

2016-08-30 Thread Raphael C
Can you provide a reproducible example? Raphael On Wednesday, August 31, 2016, Douglas Chan wrote: > Hello everyone, > > I notice conditions when Feature Importance values do not add up to 1 in > ensemble tree methods, like Gradient Boosting Trees or AdaBoost Trees. I > wonder if there’s a bug
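
A minimal sketch of the kind of reproducible check being asked for (data and parameters are placeholders):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X, y)
# If there is a bug, this sum will be noticeably different from 1.0
print(clf.feature_importances_.sum())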

Re: [scikit-learn] Does NMF optimise over observed values

2016-08-29 Thread Raphael C
On Monday, August 29, 2016, Andreas Mueller wrote: > > > On 08/28/2016 01:16 PM, Raphael C wrote: > > > > On Sunday, August 28, 2016, Andy > wrote: > >> >> >> On 08/28/2016 12:29 PM, Raphael C wrote: >> >> To give a little context from t

Re: [scikit-learn] Does NMF optimise over observed values

2016-08-28 Thread Raphael C
On Sunday, August 28, 2016, Andy wrote: > > > On 08/28/2016 12:29 PM, Raphael C wrote: > > To give a little context from the web, see e.g. http://www.quuxlabs.com/ > blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation- > in-python/ where it explains: >

Re: [scikit-learn] Does NMF optimise over observed values

2016-08-28 Thread Raphael C
actly. Instead, we will only try to minimise the errors of the observed user-item pairs." Raphael On Sunday, August 28, 2016, Raphael C wrote: > Thank you for the quick reply. Just to make sure I understand, if X is > sparse and n by n with X[0,0] = 1, X[n-1, n-1] = 0 explicitly set (th
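
A sketch of the masked objective described in the quote, minimised by plain gradient descent over the observed entries only. This is not scikit-learn's NMF (which includes the zeros in its loss), and for brevity it drops the nonnegativity constraint:

import numpy as np

def masked_mf(X, mask, k=2, lr=0.01, steps=2000, seed=0):
    # Minimise sum over observed (i, j) of (X[i,j] - (W @ H)[i,j])^2,
    # i.e. the loss is evaluated only where mask is 1
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(steps):
        E = mask * (X - W @ H)   # residuals on observed entries only
        W += lr * E @ H.T        # gradient step on W
        H += lr * W.T @ E        # gradient step on H
    return W, H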

Re: [scikit-learn] Does NMF optimise over observed values

2016-08-28 Thread Raphael C
- i.e. no mask in the loss function. > On 28 Aug 2016 at 16:58, "Raphael C" > wrote: > > What I meant was, how is the objective function defined when X is sparse? > > Raphael > > > On Sunday, August 28, 2016, Raphael C > wrote: > >> Reading

Re: [scikit-learn] Does NMF optimise over observed values

2016-08-28 Thread Raphael C
What I meant was, how is the objective function defined when X is sparse? Raphael On Sunday, August 28, 2016, Raphael C wrote: > Reading the docs for http://scikit-learn.org/stable/modules/generated/ > sklearn.decomposition.NMF.html it says > > The objective function is: >

[scikit-learn] Does NMF optimise over observed values

2016-08-28 Thread Raphael C
Reading the docs for http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html it says The objective function is: 0.5 * ||X - WH||_Fro^2 + alpha * l1_ratio * ||vec(W)||_1 + alpha * l1_ratio * ||vec(H)||_1 + 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2 + 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2

Re: [scikit-learn] How to get the most important features from a RF efficiently

2016-07-21 Thread Raphael C
The problem was that I had a loop like for i in xrange(len(clf.feature_importances_)): print clf.feature_importances_[i] which recomputes the feature importance array in every step. Obvious in hindsight. Raphael On 21 July 2016 at 16:22, Raphael C wrote: > I have a set of feat
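
The fix, for reference: feature_importances_ is a property that aggregates over all trees on every access, so read it once into a local variable (sketched in Python 3; the original loop was Python 2):

importances = clf.feature_importances_  # computed once
for i, imp in enumerate(importances):
    print(i, imp)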

[scikit-learn] How to get the most important features from a RF efficiently

2016-07-21 Thread Raphael C
I have a set of feature vectors associated with binary class labels, each of which has about 40,000 features. I can train a random forest classifier in sklearn which works well. I would however like to see the most important features. I tried simply printing out forest.feature_importances_ but thi
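
A common way to pull out just the top features, sketched under the assumption that forest is an already fitted classifier:

import numpy as np

importances = forest.feature_importances_   # read once (see the thread above)
top = np.argsort(importances)[::-1][:20]    # indices of the 20 largest
for i in top:
    print(i, importances[i])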