[Scikit-learn-general] OMP behavior

2011-10-17 Thread Alejandro Weinstein
Hi: I am observing a behavior of the scikit.learn implementation of OMP (sklearn.linear_model.orthogonal_mp) that I don't understand. I am performing the following experiment: - Generate a dictionary D (input data) with i.i.d. Gaussian entries (with the column norms normalized to one) with dimensi
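The preview cuts off before the full setup, but the experiment as described (an i.i.d. Gaussian dictionary with unit-norm columns, a synthetic sparse code, then recovery with sklearn.linear_model.orthogonal_mp) can be sketched as follows; the dimensions and variable names are illustrative, not taken from the original message:

    import numpy as np
    from sklearn.linear_model import orthogonal_mp

    rng = np.random.RandomState(0)
    n_samples, n_atoms, n_nonzero = 64, 128, 5  # illustrative sizes

    # Dictionary with i.i.d. Gaussian entries, columns normalized to unit norm
    D = rng.randn(n_samples, n_atoms)
    D /= np.sqrt((D ** 2).sum(axis=0))

    # Synthetic sparse code and the signal it generates
    x = np.zeros(n_atoms)
    support = rng.choice(n_atoms, n_nonzero, replace=False)
    x[support] = rng.randn(n_nonzero)
    y = np.dot(D, x)

    # Recover the sparse code with OMP; ideally the supports coincide
    x_hat = orthogonal_mp(D, y, n_nonzero_coefs=n_nonzero)
    print(sorted(np.flatnonzero(x_hat)), sorted(support))

Whether the recovered support matches the true one depends on the sparsity level relative to the coherence of the dictionary.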

Re: [Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Lars Buitinck
2011/10/18 Gael Varoquaux : > Thanks for everybody's feedback. I have taken it into account and written > some clean code rather than the original hack. The pull request > https://github.com/scikit-learn/scikit-learn/pull/398 > should be self-explanatory. In particular, the hacks used should now be > comp

Re: [Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Gael Varoquaux
Thanks for everybody's feedback. I have taken it into account and written some clean code rather than the original hack. The pull request https://github.com/scikit-learn/scikit-learn/pull/398 should be self-explanatory. In particular, the hacks used should now be comprehensible. Could people review? I'd

Re: [Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Lars Buitinck
2011/10/17 Gael Varoquaux : > On Mon, Oct 17, 2011 at 12:15:48PM +0200, Lars Buitinck wrote: >> If you really want to play this kind of trick, then please use >> standard C functionality such as frexp() from <math.h> and appropriate >> symbolic constants from <float.h>. > > OK, I am not too good at this. Do you th
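Python's math.frexp is the same primitive as C's frexp(), so the portable shape of the suggestion can be sketched without any bit casting: split off the exponent exactly, then approximate log2 of the mantissa with a cheap polynomial. The quadratic coefficients below are a rough interpolation for illustration, not taken from the scikit-learn code:

    import math

    def fast_log2(x):
        # x = m * 2**e with 0.5 <= m < 1, exactly and portably
        m, e = math.frexp(x)
        # Rough quadratic fit of log2(m) through m = 0.5, 0.75 and 1.0
        return e + (4.0391 - 1.3594 * m) * m - 2.6797

Multiplying the result by math.log(2) gives the natural log. Because frexp is exact, the only error is in the polynomial (here a few times 1e-2 at worst), and nothing depends on the width or bit layout of the float type.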

Re: [Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Gael Varoquaux
On Mon, Oct 17, 2011 at 12:30:55PM +0200, Lars Buitinck wrote: > SSE optimizations don't work on processors without SSE. Again, the > Cell processor, UltraSPARC, ARM and what have you. This is impossible > to test thoroughly unless we get multiple buildbots running different > types of processors, a

Re: [Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Gael Varoquaux
On Mon, Oct 17, 2011 at 12:15:48PM +0200, Lars Buitinck wrote: > -1 for the code in its current state; this is a potential maintenance > nightmare. The cast to int* violates C aliasing rules, so this might > break on aggressively optimizing compilers. float and int are not > guaranteed to both be 3

Re: [Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Lars Buitinck
2011/10/17 Olivier Grisel : > However the compilation flags might get tricky to get right in a cross- > platform manner (also we would need to deal with memory alignment > stuff, which is quite easy to get working on POSIX but I don't know > about under Windows). SSE alignment requires #ifdef magic to get

Re: [Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Olivier Grisel
2011/10/17 Brian Holt : > +1 even though it's not as accurate. If the tests pass, then it's accurate > enough IMHO. I am +1-ish too. Maybe we need additional tests to make sure it does not break in weird cases. Also, if transcendental evaluations are expensive in other algorithms of the scikit, it
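Such a test could compare the approximation against numpy's log over a dense sweep plus the awkward inputs: values near 1.0 where the log crosses zero, tiny positive values, and exact powers of two where the mantissa path resets. A sketch, with a hypothetical fast_log2 under test and placeholder tolerances to be tightened against the real code:

    import numpy as np

    def check_fast_log2(fast_log2, rtol=0.05, atol=0.05):
        xs = np.concatenate([
            np.linspace(1e-6, 10.0, 10001),          # dense sweep
            1.0 + np.linspace(-1e-3, 1e-3, 101),     # log crosses zero here
            2.0 ** np.arange(-20, 21, dtype=float),  # exact powers of two
        ])
        approx = np.array([fast_log2(x) for x in xs])
        np.testing.assert_allclose(approx, np.log2(xs), rtol=rtol, atol=atol)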

Re: [Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Lars Buitinck
2011/10/17 Gael Varoquaux : > The question is: is it acceptable to have such an approximation? I think > so, I just wanted confirmation. If people agree with this, I'll document > it better (and maybe test it) and push to master. -1 for the code in its current state; this is a potential maintenanc

Re: [Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Peter Prettenhofer
+1 - great speedup - thanks Gaël! 2011/10/17 Brian Holt : > +1 even though it's not as accurate. If the tests pass, then it's accurate > enough IMHO.

Re: [Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Brian Holt
+1 even though it's not as accurate. If the tests pass, then it's accurate enough IMHO.

[Scikit-learn-general] Log approximation in tree entropy criterion

2011-10-17 Thread Gael Varoquaux
I timed the Entropy criterion of classification tree construction a bit. It appeared that the log (a transcendental function) was taking up a good fraction of the time. I coded a fast log approximation that is a bit brutal: https://github.com/GaelVaroquaux/scikit-learn/commit/05e707f8dd67eb65948da87
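The linked commit is C reached through Cython, but the family of tricks it draws on is easy to demonstrate: for a positive IEEE-754 float32, the bit pattern read as an integer is already an approximate, scaled log2. A Python illustration of the idea, not the commit itself (struct performs the reinterpretation via a byte copy, which is also how a C version sidesteps the aliasing objection raised elsewhere in the thread):

    import struct

    def bits_log2(x):
        # For positive float32, bits = (e + 127) * 2**23 + mantissa_bits,
        # so dividing by 2**23 and subtracting the exponent bias gives
        # log2(x) up to roughly 0.09 absolute error.
        (bits,) = struct.unpack("<I", struct.pack("<f", x))
        return bits / float(1 << 23) - 127.0

This crudest form is exact at powers of two and off by at most about 0.09 in between; polishing the mantissa term with a small polynomial correction is what buys back the accuracy.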

Re: [Scikit-learn-general] n_{i,j} in expected mutual information

2011-10-17 Thread Lars Buitinck
2011/10/17 Robert Layton : > In the formula for the expected value for mutual information [1], the third > summation uses n_{i,j}. > Is this a new value, or do I use the value from the contingency matrix? In Vinh, Epps and Bailey (2010 [1]), n_{i,j} is the contingency table. The Wikipedia page see
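Spelling out the formula makes the answer concrete. Writing a_i = \sum_j n_{ij} and b_j = \sum_i n_{ij} for the marginals of an N-sample contingency table, the expected mutual information of Vinh, Epps and Bailey (2010) is usually stated as (reproduced from memory, so the summation limits should be checked against the paper):

    E[\mathrm{MI}] = \sum_{i}\sum_{j} \sum_{n_{ij}=\max(1,\,a_i+b_j-N)}^{\min(a_i,\,b_j)}
        \frac{n_{ij}}{N}\,\log\!\frac{N\,n_{ij}}{a_i b_j}\;
        \frac{a_i!\,b_j!\,(N-a_i)!\,(N-b_j)!}
             {N!\,n_{ij}!\,(a_i-n_{ij})!\,(b_j-n_{ij})!\,(N-a_i-b_j+n_{ij})!}

So the n_{ij} in the third summation is a bound variable ranging over every count that cell could take under the hypergeometric null model; only the marginals a_i and b_j of the contingency table enter the expectation, not the observed cell values.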

[Scikit-learn-general] n_{i,j} in expected mutual information

2011-10-17 Thread Robert Layton
In the formula for the expected value for mutual information [1], the third summation uses n_{i,j}. Is this a new value, or do I use the value from the contingency matrix? My thinking is that it is a new value, as the expected information shouldn't have anything to do with the contingency matrix, but