Re: [Scikit-learn-general] [Matplotlib-users] Scipy2016: call for proposals

2016-03-08 Thread Kyle Kastner
I am still on the fence - I have an internship this summer, so I need to check on timing/vacation expectations. On Mon, Mar 7, 2016 at 3:09 PM, Jacob Vanderplas wrote: > I'm not going to be able to make it this year, unfortunately. > Jake > > Jake VanderPlas > Senior Data Science Fellow > Director of Re

Re: [Scikit-learn-general] scikit-learn in Julia

2016-03-07 Thread Kyle Kastner
Is julia-learn a thing already? Juliasklearn seems a bit overloaded to me, but naming things is hard. On Mon, Mar 7, 2016 at 11:02 AM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > On Mon, Mar 07, 2016 at 10:54:53AM -0500, Andreas Mueller wrote: > > I'm not sure about the naming of the

Re: [Scikit-learn-general] GSoC Project Proposal: Reinforcement Learning Module

2016-03-02 Thread Kyle Kastner
Any RL package will have to be heavily focused on non-iid data (timeseries, basically), with the additional difficulty of the agent affecting/interacting with the environment it is operating in. I agree with you Gael - many packages for "deep learning" also don't handle this type of data/these models (

Re: [Scikit-learn-general] circle ci access and setup

2016-03-02 Thread Kyle Kastner
When I was (and still am, sometimes) hacking Circle CI support for sklearn-theano (https://github.com/sklearn-theano/sklearn-theano/pull/93) it had an option to have access to only 1 project. I have been debugging via logs, but there must be a better way, because it is really a pain in the neck to d

Re: [Scikit-learn-general] Efficiency of Incremental PCA when n_components>>0

2015-10-14 Thread Kyle Kastner
IncrementalPCA should get closer to "true" PCA as the number of components increases - so if anything the solution should be more stable rather than less. The difference mostly lies in the incremental processing - regular PCA with reduced components performs the full PCA, then only keeps a subset o
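
A minimal sketch of the comparison described above (my example, not from the original thread): full PCA truncated to k components against IncrementalPCA with the same k, checking that the recovered subspaces agree.

    import numpy as np
    from sklearn.decomposition import PCA, IncrementalPCA

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 50)

    # Regular PCA computes the full decomposition, then keeps the top components.
    pca = PCA(n_components=10).fit(X)

    # IncrementalPCA only ever sees one minibatch at a time.
    ipca = IncrementalPCA(n_components=10, batch_size=100).fit(X)

    # The subspaces should agree closely (up to sign) as n_components grows.
    print(np.abs(np.dot(pca.components_, ipca.components_.T)).round(2))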

Re: [Scikit-learn-general] PyCon 2016 scikit-learn tutorial

2015-10-05 Thread Kyle Kastner
I did a piece of that in the Titanic examples from the SciPy tutorial, but it could definitely use a more thorough and clear example. This version could probably be simplified/streamlined - much of my preprocessing was done with straight numpy, and I am 90% sure there is a more "sklearn approved" w

Re: [Scikit-learn-general] PyCon 2016 scikit-learn tutorial

2015-09-30 Thread Kyle Kastner
If people are planning to work on this, it would be good to check what Andy and I presented at SciPy, which is based on what Jake and Olivier did at PyCon (and what Andy, Jake and Gael did at SciPy 2013, etc. etc.). To Sebastian's points - we covered all of these nearly verbatim except perhaps cla

Re: [Scikit-learn-general] new commiters

2015-09-23 Thread Kyle Kastner
Congratulations - well deserved, and thanks for all your hard work! On Wed, Sep 23, 2015 at 6:47 AM, Arnaud Joly wrote: > Congratulation and welcome !!! > > Arnaud > > >> On 23 Sep 2015, at 08:59, Gael Varoquaux >> wrote: >> >> Welcome to the team. You've been doing awesome work. We are very lo

Re: [Scikit-learn-general] does sklearn rbm scale well with sparse high dimensional features

2015-07-27 Thread Kyle Kastner
ice for a larger number of > parameters like RBM but it would also involve MCMC iterations. Any > thoughts? > > On Mon, Jul 27, 2015 at 6:18 AM, Kyle Kastner > wrote: > >> RBMs are a factorization of a generally intractable problem - as you >> mention it is still

Re: [Scikit-learn-general] does sklearn rbm scale well with sparse high dimensional features

2015-07-27 Thread Kyle Kastner
RBMs are a factorization of a generally intractable problem - as you mention it is still O(n**2) but much better than the combinatorial brute force thing that the RBM factorization replaces. There might be faster RBM algorithms around but I don't know of any faster implementations that don't use GP
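
For what it's worth, BernoulliRBM does accept scipy sparse input directly; a rough sketch (my illustration, not from the thread) of feeding it a sparse, high-dimensional matrix:

    import scipy.sparse as sp
    from sklearn.neural_network import BernoulliRBM

    # Sparse, high-dimensional data with values in [0, 1], as the model assumes.
    X = sp.random(1000, 10000, density=0.001, format='csr', random_state=0)
    rbm = BernoulliRBM(n_components=64, batch_size=100, n_iter=5, random_state=0)
    rbm.fit(X)  # cost per epoch still scales with n_features * n_components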

Re: [Scikit-learn-general] clustering large data sets

2015-06-18 Thread Kyle Kastner
Another citation for the Hebbian approach - it is related to this http://onlinelibrary.wiley.com/doi/10.1207/s15516709cog0901_5/pdf On Thu, Jun 18, 2015 at 10:25 AM, Kyle Kastner wrote: > Yes agreed - though I would also guess the intermediate memory blowup > could help speed, though I h

Re: [Scikit-learn-general] clustering large data sets

2015-06-18 Thread Kyle Kastner
ger is that > it doesn't do fancy indexing and avoids large intermediate arrays. > > > > On 06/18/2015 10:09 AM, Kyle Kastner wrote: > > I don't know if it is faster or better - but the learning rule is > insanely simple and it is hard to believe there could be

Re: [Scikit-learn-general] clustering large data sets

2015-06-18 Thread Kyle Kastner
You can also see the kmeans version here: https://github.com/kastnerkyle/ift6268h15/blob/master/hw3/color_kmeans_theano.py#L23 Though I guarantee nothing about my homework code! On Thu, Jun 18, 2015 at 10:09 AM, Kyle Kastner wrote: > I don't know if it is faster or better - but the

Re: [Scikit-learn-general] clustering large data sets

2015-06-18 Thread Kyle Kastner
t might be "too easy" to have a real paper. On Thu, Jun 18, 2015 at 9:58 AM, Andreas Mueller wrote: > > > On 06/18/2015 09:48 AM, Kyle Kastner wrote: > > This link should work http://www.cs.toronto.edu/~rfm/code.html > > <http://www.cs.toronto.edu/%7Erfm/code.ht

Re: [Scikit-learn-general] clustering large data sets

2015-06-18 Thread Kyle Kastner
This link should work http://www.cs.toronto.edu/~rfm/code.html On Thu, Jun 18, 2015 at 9:38 AM, Kyle Kastner wrote: > Minibatch K-means should work just fine. Alternatively there are hebbian > K-means approaches which are quite easy to implement and should be fast > (though I

Re: [Scikit-learn-general] clustering large data sets

2015-06-18 Thread Kyle Kastner
Minibatch K-means should work just fine. Alternatively, there are Hebbian K-means approaches which are quite easy to implement and should be fast (though I think it basically boils down to minibatch K-means - I haven't looked at the details of minibatch K-means). There is an approach here http://www.iro.
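
A minimal sketch of the MiniBatchKMeans route (illustrative shapes and names), feeding data chunk by chunk so the full set never has to be in memory at once:

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    mbk = MiniBatchKMeans(n_clusters=20, batch_size=1000, random_state=0)
    rng = np.random.RandomState(0)
    for _ in range(100):                  # stand-in for chunks streamed from disk
        chunk = rng.randn(1000, 64)
        mbk.partial_fit(chunk)
    labels = mbk.predict(rng.randn(10, 64))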

Re: [Scikit-learn-general] Dramatic improvement by standardizing data?

2015-04-29 Thread Kyle Kastner
Data preprocessing is important. One thing you might want to do is compute your preprocessing scaling values on the training data only - technically, computing them over the whole dataset is not valid, as that includes the test data. It is hard to say whether 100% is believable or not, but you should pr
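
A minimal sketch of that point: fit the scaler on the training split only, then apply the same transform to both splits.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split  # sklearn.cross_validation in 0.16-era releases
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=200, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    scaler = StandardScaler().fit(X_train)  # statistics come from training data only
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)     # test data never influences the scaling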

Re: [Scikit-learn-general] Robust PCA

2015-04-16 Thread Kyle Kastner
line so it is more specific >> than "Re: Contents of Scikit-learn-general digest..." >> >> >> Today's Topics: >> >>1. Re: Scikit-learn-general Digest, Vol 63, Issue 34 >> (Al

Re: [Scikit-learn-general] Robust PCA

2015-04-15 Thread Kyle Kastner
If it were in scipy, would it be backported to the older versions? How would we handle that? On Wed, Apr 15, 2015 at 3:40 PM, Olivier Grisel wrote: > We could use PyPROPACK if it was contributed upstream in scipy ;) > > I know that some scipy maintainers don't appreciate arpack much and > would lik

Re: [Scikit-learn-general] Robust PCA

2015-04-15 Thread Kyle Kastner
dit your Subject line so it is more specific > than "Re: Contents of Scikit-learn-general digest..." > > > Today's Topics: > >1. Re: pydata (Andreas Mueller) >2. Robust PCA (Andreas Mueller) >3. Re: Robust PCA

Re: [Scikit-learn-general] Robust PCA

2015-04-15 Thread Kyle Kastner
Robust PCA is awesome - I would definitely like to see a good and fast version. I had a version once upon a time, but it was neither good *nor* fast :) On Wed, Apr 15, 2015 at 10:33 AM, Andreas Mueller wrote: > Hey all. > Was there some plan to add Robust PCA at some point? I vaguely remember > a
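
For reference, a compact numpy sketch of one common formulation, principal component pursuit via ADMM (X decomposed into low rank plus sparse); this is an illustration of the idea with the usual textbook parameter choices, not the version mentioned in the thread:

    import numpy as np

    def robust_pca(X, n_iter=100):
        n1, n2 = X.shape
        lam = 1.0 / np.sqrt(max(n1, n2))
        mu = n1 * n2 / (4.0 * np.abs(X).sum())
        shrink = lambda M, t: np.sign(M) * np.maximum(np.abs(M) - t, 0.0)
        L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
        for _ in range(n_iter):
            U, s, Vt = np.linalg.svd(X - S + Y / mu, full_matrices=False)
            L = np.dot(U * shrink(s, 1.0 / mu), Vt)  # singular value thresholding
            S = shrink(X - L + Y / mu, lam / mu)     # elementwise soft threshold
            Y = Y + mu * (X - L - S)                 # dual variable update
        return L, S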

Re: [Scikit-learn-general] Artificial Neural Networks

2015-04-07 Thread Kyle Kastner
I have a simple nesterov momentum in Theano modified from some code Yann Dauphin had, here: https://github.com/kastnerkyle/ift6266h15/blob/master/normalized_convnet.py#L164 On Tue, Apr 7, 2015 at 10:44 AM, Andreas Mueller wrote: > Actually Olivier and me added some things to the MLP since then, a
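
The update rule itself is tiny; a framework-free numpy-style paraphrase (not the linked Theano code): evaluate the gradient at the momentum lookahead point, then step.

    def nesterov_step(theta, velocity, grad_fn, lr=0.01, momentum=0.9):
        # grad_fn is assumed to return the gradient at a given parameter value
        velocity = momentum * velocity - lr * grad_fn(theta + momentum * velocity)
        return theta + velocity, velocity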

Re: [Scikit-learn-general] scikit-learn.org down

2015-03-30 Thread Kyle Kastner
Just FYI, someday when the GitHub move happens there are a few tweaks to the build that will have to happen - GitHub has some special rules on folder names. I had to make some mods to the build for sklearn-theano; it wasn't awful, but it took a while to figure out. On Mon, Mar 30, 2015 at 2:24 PM, Andr

Re: [Scikit-learn-general] [ANN] scikit-learn 0.16.0 is out!

2015-03-27 Thread Kyle Kastner
Awesome! Congratulations all who contributed to this - lots of great stuff. On Fri, Mar 27, 2015 at 12:26 PM, Olivier Grisel wrote: > Release highlights and full changelog available at: > > http://scikit-learn.org/0.16/whats_new.html > > You can grab it from the source here: > > https://pypi.pyth

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-25 Thread Kyle Kastner
e good results. > > Christof > > On 20150324 21:01, Kyle Kastner wrote: >> It might be nice to talk about optimizing runtime and/or training time >> like SMAC did in their paper. I don't see any reason we couldn't do >> this in sklearn, and it might be of v

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Kyle Kastner
makes sense). > > Btw, this paper has a couple of references for more detailed equations: > http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-449.pdf > > > On 03/25/2015 03:20 PM, Kyle Kastner wrote: >> There was mention of TDP (blocked Gibbs higher up in the paper) vs >> colla

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Kyle Kastner
eas Mueller wrote: >> >> >> >> On 03/24/2015 09:44 PM, Kyle Kastner wrote: >> > >> > Will users be allowed to set/tweak the burn-in and lag for the sampler >> > in the DPGMM? >> > >> This is variational! >> >> >> ---

Re: [Scikit-learn-general] Fwd: Trouble when compiling with MKL

2015-03-24 Thread Kyle Kastner
to see what happens. > > -- > João Felipe Santos > > On 24 March 2015 at 20:25, Kyle Kastner wrote: >> >> How did you install it? >> python setup.py develop or install? Did you have to use --user? >> >> On Tue, Mar 24, 2015 at 7:41 PM, João Felipe Santos &g

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-24 Thread Kyle Kastner
I like the fact that this can be broken into nice parts. I also think documentation should be farther up the list, with the math part lumped in. GMM cleanup should probably start out of the gate, as fixing that will define what API/init changes have to stay consistent in the other two models. Is there any

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Kyle Kastner
I would focus on the API of this functionality and how/what users will be allowed to specify. To me, this is a particularly tricky bit of the PR. As Vlad said, take a close look at GridSearchCV and RandomizedSearchCV and see how they interact with the codebase. Do you plan to find good defaults for
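
For comparison with any proposed API, this is roughly how users specify search spaces today (a minimal runnable sketch):

    from scipy.stats import expon
    from sklearn.datasets import load_iris
    from sklearn.model_selection import RandomizedSearchCV  # sklearn.grid_search in 2015-era releases
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    param_dist = {"C": expon(scale=10), "gamma": expon(scale=0.1)}
    search = RandomizedSearchCV(SVC(), param_dist, n_iter=20, random_state=0)
    search.fit(X, y)
    print(search.best_params_)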

Re: [Scikit-learn-general] Fwd: Trouble when compiling with MKL

2015-03-24 Thread Kyle Kastner
How did you install it? python setup.py develop or install? Did you have to use --user? On Tue, Mar 24, 2015 at 7:41 PM, João Felipe Santos wrote: > Hi, > > I am using MKL with Numpy and Scipy on a cluster and just installed > scikit-learn. The setup process goes without any issue, but if I try t

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Kyle Kastner
ue, Mar 24, 2015 at 5:08 PM, Kyle Kastner wrote: > That said, I would think random forests would get a lot of the > benefits that deep learning tasks might get, since they also have a > lot of hyperparameters. Boosting tasks would be interesting as well, > since swapping the estimator

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Kyle Kastner
implement. On Tue, Mar 24, 2015 at 5:01 PM, Kyle Kastner wrote: > It might be nice to talk about optimizing runtime and/or training time > like SMAC did in their paper. I don't see any reason we couldn't do > this in sklearn, and it might be of value to users since we don't &

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Kyle Kastner
It might be nice to talk about optimizing runtime and/or training time like SMAC did in their paper. I don't see any reason we couldn't do this in sklearn, and it might be of value to users since we don't really do deep learning as Andy said. On Tue, Mar 24, 2015 at 4:52 PM, Andy wrote: > On 03/2

Re: [Scikit-learn-general] Decision Jungles: A possible GSOC 15 topic

2015-03-12 Thread Kyle Kastner
I am also interested in Mondrian Forests (and partial_fit methods for things in general), though I thought one of the issues with implementing either of these methods was that the way our trees are currently built would make it hard to extend to these two algorithms. It is definitely important not to reg

Re: [Scikit-learn-general] SciPy 2015 Austin

2015-03-11 Thread Kyle Kastner
We can probably also email one of the organizers (I think they are listed on the site?) and find out if we can edit or add an addendum. It is strange - I am almost 100% positive we could edit the proposals in past years. Kyle On Wed, Mar 11, 2015 at 10:22 AM, Andreas Mueller wrote: > Unfortunate

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-07 Thread Kyle Kastner
I think finding one method is indeed the goal. Even if it is not the best every time, a 90% solution for 10% of the complexity would be awesome. I think GPs with parameter space warping are *probably* the best solution but only a good implementation will show for sure. Spearmint and hyperopt exist

Re: [Scikit-learn-general] SciPy 2015 Austin

2015-03-07 Thread Kyle Kastner
s are one hour for the first part and two hours for > > the second part. > > Was the rest exercises or just not recorded? > > > Cheers, > > Andreas > > > > On 02/25/2015 09:21 PM, Kyle Kastner wrote: > > > That is a great idea. We should definite

Re: [Scikit-learn-general] Releasing 0.16

2015-03-03 Thread Kyle Kastner
I added some +1s to #4234 and #4325. Surprised the RBM one exists! That is something that seems to happen a lot with those types of models and can be tricky to find. On Tue, Mar 3, 2015 at 2:16 PM, Olivier Grisel wrote: > Hi all, > > We are a bit late on the initial 0.16 beta release schedule bec

Re: [Scikit-learn-general] SciPy 2015 Austin

2015-03-01 Thread Kyle Kastner
first part and two hours for > the second part. > Was the rest exercises or just not recorded? > > Cheers, > Andreas > > > On 02/25/2015 09:21 PM, Kyle Kastner wrote: > > That is a great idea. We should definitely get a list of people who > > are attending and t

Re: [Scikit-learn-general] SciPy 2015 Austin

2015-02-25 Thread Kyle Kastner
ial days (but I hopefully will make it for the main > conference), >Jake > > Jake VanderPlas > Director of Research – Physical Sciences > eScience Institute, University of Washington > http://www.vanderplas.com > > On Wed, Feb 25, 2015 at 2:38 PM, Kyle Kastner wr

Re: [Scikit-learn-general] SciPy 2015 Austin

2015-02-25 Thread Kyle Kastner
I am working on one now. Hoping to go even if rejected, for the sprint and meeting up. On Wed, Feb 25, 2015 at 9:51 AM, Andy wrote: > Hey everybody. > Is anyone going to / submitting talks to scipy? > My institute (or rather Moore-Sloan) is a sponsor so they'll send me :) > > Cheers, > Andy > > -

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-12 Thread Kyle Kastner
There are a lot of ways to speed them up as potential work, but the interface (and backend code) should be very stable first. Gradient based, latent variable approximation, low-rank updating, and distributed GP (new paper from a few weeks ago) are all possible, but would need to be compared to a ve

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-11 Thread Kyle Kastner
GSoC-wise it might also be good to look at CCA, PLS, etc. for cleanup. On Feb 12, 2015 2:02 AM, "Kyle Kastner" wrote: > Plugin vs separate package: > libsvm/liblinear are plugins whereas "friend" libraries like lightning are > packages right? > > By that defini

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-11 Thread Kyle Kastner
Plugin vs separate package: libsvm/liblinear are plugins whereas "friend" libraries like lightning are packages right? By that definition I agree with Gael - standalone packages are best for that stuff. I don't really know what a "plugin" for sklearn would be exactly. On Feb 12, 2015 1:58 AM, "Gae

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-11 Thread Kyle Kastner
wrote: > no i mean external plugin that they have to support - we're hands off. we > can link to it but that's it - no other guarantees like we've done in the > past iirc > > On Thu, Feb 12, 2015 at 1:48 AM, Kyle Kastner > wrote: > >> Even having a s

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-11 Thread Kyle Kastner
let other packages focus on that. On Feb 12, 2015 1:48 AM, "Kyle Kastner" wrote: > Even having a separate plugin will require a lot of maintenance. I am -1 > on any gpu stuff being included directly in sklearn. Maintenance for > sklearn is already tough, and trying to su

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-11 Thread Kyle Kastner
Even having a separate plugin will require a lot of maintenance. I am -1 on any GPU stuff being included directly in sklearn. Maintenance for sklearn is already tough, and trying to support a huge amount of custom compute hardware is really, really hard. Ensuring numerical stability between OS/BLAS

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-11 Thread Kyle Kastner
pylearn2 is not even close to sklearn compatible. Small scale recurrent nets are in PyBrain, but I really think that any seriously usable neural net type learners are sort of outside the scope of sklearn. Others might have different opinions, but this is one of the reasons Michael and I started skl

Re: [Scikit-learn-general] Calculating standard deviation for k-fold cross validation estimate

2015-02-05 Thread Kyle Kastner
Could it also be accounting for +/-? Standard deviation is one-sided, right? On Thu, Feb 5, 2015 at 4:54 PM, Joel Nothman wrote: > With cv=5, only the training sets should overlap. Is this adjustment still > appropriate? > > On 6 February 2015 at 06:44, Michael Eickenberg < > michael.eickenb...@g

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-05 Thread Kyle Kastner
IncrementalPCA is done (I have to add a randomized SVD solver, but that should be simple), but I am sure there are other low rank methods which need a partial_fit. I think adding partial_fit functions in general to as many algorithms as possible would be nice. Kyle On Thu, Feb 5, 2015 at 2:12 PM, Aksh
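
The partial_fit pattern in question, sketched with IncrementalPCA on chunks that never have to coexist in memory (illustrative shapes):

    import numpy as np
    from sklearn.decomposition import IncrementalPCA

    ipca = IncrementalPCA(n_components=16)
    rng = np.random.RandomState(0)
    for _ in range(50):                      # stand-in for chunks streamed from disk
        ipca.partial_fit(rng.randn(200, 100))
    X_small = ipca.transform(rng.randn(5, 100))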

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-05 Thread Kyle Kastner
I think most of the GP related work is deciding what the sklearn compatible interface should be :) specifically how to handle kernels and try to share with the core codebase. The HODLR solver of George could be very nice for scalability, but the algorithm is not easy. There are a few other options on that

Re: [Scikit-learn-general] Adding Barnes-Hut t-SNE

2014-12-24 Thread Kyle Kastner
Sounds like an excellent improvement for usability! If you could benchmark time spent and show that it is a noticeable improvement, that will be crucial. Also showing how bad the approximation is compared to base t-SNE will be important - though there comes a point where you can't really compare,
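
The kind of benchmark being asked for, sketched against the current scikit-learn API (the barnes_hut option landed in a release later than this email):

    import time
    import numpy as np
    from sklearn.manifold import TSNE

    X = np.random.RandomState(0).randn(500, 20)
    for method in ("exact", "barnes_hut"):
        t0 = time.time()
        TSNE(method=method, random_state=0).fit_transform(X)
        print(method, "%.1fs" % (time.time() - t0))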

Re: [Scikit-learn-general] Question about the FactorAnalysis implementation

2014-12-05 Thread Kyle Kastner
I haven't looked closely, but is Barber's data format considered to be examples as columns, or examples as rows? That difference is usually what I see in a bunch of different SVD-based algorithms. It is very annoying when reading the literature. a.k.a. what Michael said. On Fri, Dec 5, 2014 at 10:2

Re: [Scikit-learn-general] pairwise.cosine_similarity(...) takes sparse inputs but forces a dense output?

2014-11-27 Thread Kyle Kastner
1.3M x1.3M which should blow up most current memory > sizes for all data types. Sparsity in the output can redeem this. > > On Thursday, November 27, 2014, Kyle Kastner > wrote: > >> On a side note, I am semi-surprised that allowing the output of the dot >> to be sparse &

Re: [Scikit-learn-general] pairwise.cosine_similarity(...) takes sparse inputs but forces a dense output?

2014-11-27 Thread Kyle Kastner
On a side note, I am semi-surprised that allowing the output of the dot to be sparse "just worked" without crashing the rest of it... On Thu, Nov 27, 2014 at 12:19 PM, Kyle Kastner wrote: > If your data is really, really sparse in the original space, you might > also look at

Re: [Scikit-learn-general] pairwise.cosine_similarity(...) takes sparse inputs but forces a dense output?

2014-11-27 Thread Kyle Kastner
If your data is really, really sparse in the original space, you might also look at taking a random projection (I think projecting on sparse SVD basis would work too?) as preprocessing to "densify" the data before calling the cosine similarity. You might get a win on feature size with this, dependi
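
A hedged sketch of the densify-first idea: random-project the sparse matrix down to a modest dense dimension, then take cosine similarities there (angles are approximately preserved):

    import scipy.sparse as sp
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.random_projection import GaussianRandomProjection

    X = sp.random(2000, 20000, density=1e-3, format='csr', random_state=0)
    proj = GaussianRandomProjection(n_components=256, random_state=0)
    X_dense = proj.fit_transform(X)   # (2000, 256), dense
    S = cosine_similarity(X_dense)    # (2000, 2000) in the projected space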

Re: [Scikit-learn-general] GPs in sklearn

2014-11-25 Thread Kyle Kastner
Gradient based optimization (I think this might be related to the recent variational methods for GPs) would be awesome. On Tue, Nov 25, 2014 at 12:54 PM, Mathieu Blondel wrote: > > > On Wed, Nov 26, 2014 at 2:37 AM, Andy wrote: > >> >> What I think would be great to have is gradient based optim

Re: [Scikit-learn-general] GPs in sklearn

2014-11-25 Thread Kyle Kastner
For the API, the naming is somewhat non-standard, it is not super clear > what the parameters mean, and it is also not super clear > whether the kernel-parameters will be optimized for a given parameters > setting. > > > > > On 11/25/2014 12:28 PM, Gael Varoquaux wrote: &g

Re: [Scikit-learn-general] GPs in sklearn

2014-11-25 Thread Kyle Kastner
I have some familiarity with the GP stuff in sklearn, but one of the big things I really *want* is something much more like George - specifically a HODLR solver. Maybe it is outside the scope of the project, but I think GPs in sklearn could be very useful and computationally tractable for "big-ish"

Re: [Scikit-learn-general] NIPS

2014-11-18 Thread Kyle Kastner
I will be there for everything - glad to meet up before, during, and after! Be warned it already started snowing here and is pretty cold... feels like -10 C today according to weather.com. On Tue, Nov 18, 2014 at 11:40 AM, Andy wrote: > Hey. > > I'll be there and talking at the machine learning

Re: [Scikit-learn-general] Question about KernelDensity implementation

2014-11-05 Thread Kyle Kastner
In addition to the y=None thing, KDE doesn't have a transform or predict method - and I don't think Pipeline supports score or score_samples. Maybe someone can comment on this, but I don't think KDE is typically used in a pipeline. In this particular case the code *seems* reasonable (and I am surp
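
For reference, the typical standalone (non-Pipeline) KDE usage, which only ever goes through fit and score_samples:

    import numpy as np
    from sklearn.neighbors import KernelDensity

    X = np.random.RandomState(0).randn(200, 2)
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X)  # y=None implicitly
    log_density = kde.score_samples(X[:5])  # no transform/predict, as noted above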

Re: [Scikit-learn-general] Welcome new core contributors

2014-10-12 Thread Kyle Kastner
contributing more. >>> >>> On Sun, Oct 12, 2014 at 5:24 PM, Gael Varoquaux >>> wrote: >>>> >>>> I am happy to welcome new core contributors to scikit-learn: >>>> - Alexander Fabisch (@AlexanderFabisch) >>>> - Kyle Kas

Re: [Scikit-learn-general] sklearn on CentOS

2014-09-25 Thread Kyle Kastner
To be honest - updating python packages on CentOS is a nightmare. The whole OS is pretty strongly dependent on python version, which I believe is up to 2.6 now (2.4 in 5.x!). In my experience CentOS is the worst Linux OS for development (heavily locked down, hard to add packages, yum is annoying, e

Re: [Scikit-learn-general] K-SVD implementation

2014-09-23 Thread Kyle Kastner
I started some code here long ago (https://gist.github.com/kastnerkyle/8143030) that isn't really finished or cleaned up - maybe it can give you some ideas/advice for implementing? I never got a chance to clean this up for PR, and it doesn't look like I will have time in the near future so your PR

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-08 Thread Kyle Kastner
I agree as well. Maybe default to making everything other than validation private? Then see what people want to become public? I don't know what nilearn is using, but that should obviously be public too... On Mon, Sep 8, 2014 at 5:17 PM, Olivier Grisel wrote: > +1 as well for the combined proposal of Gael

Re: [Scikit-learn-general] Starting to contribute to Scikit-learn.

2014-09-06 Thread Kyle Kastner
Shubham, There are many open improvements on the GitHub issues list (https://github.com/scikit-learn/scikit-learn/issues?q=is%3Aopen+is%3Aissue+label%3AEasy). I recommend starting with a few of the Easy or Documentation tasks - it helps get the workflow down and is also very valuable to the projec

[Scikit-learn-general] Changing background colors for website

2014-09-05 Thread Kyle Kastner
I have copied and heavily modified the website for use on another project ( https://github.com/sklearn-theano/sklearn-theano), and I would like to change the color scheme. Does anyone with experience modifying the website know how to do this? Is there an easier way besides hand-modifying the css? M

Re: [Scikit-learn-general] Multivariate Bernoulli EM

2014-08-28 Thread Kyle Kastner
This sounds interesting - what do you normally use it for? Do you have any references for papers to look at? On Wed, Aug 27, 2014 at 12:44 PM, Mark Stoehr wrote: > Hi scikit-learn, > > Myself and a colleague put together an implementation of the EM algorithm > for mixtures of multivariate Bernoul

Re: [Scikit-learn-general] GSoC - Review LSHForest approximate nearest neighbor search implementation

2014-08-06 Thread Kyle Kastner
As far as I know, the typical idea is to keep things as readable as possible, and only optimize the "severe/obvious" type bottlenecks (things like memory explosions, really bad algorithmic complexity, unnecessary data copy, etc). I can't really comment on your "where do the bottlenecks go" questio

Re: [Scikit-learn-general] Equivalence between PCA and SVD

2014-07-31 Thread Kyle Kastner
I did not see your earlier script... now I am interested. I have been hacking on it but don't know what is going on yet. On Thu, Jul 31, 2014 at 4:25 PM, Deepak Pandian wrote: > On Thu, Jul 31, 2014 at 7:49 PM, Kyle Kastner > wrote: > > It looks like the transpose may make

Re: [Scikit-learn-general] Equivalence between PCA and SVD

2014-07-31 Thread Kyle Kastner
It looks like the transpose may make the system under-determined. If you try with X = np.random.randn(*X.shape), what happens? On Thu, Jul 31, 2014 at 4:17 PM, Kyle Kastner wrote: > What is the shape of X > > > On Thu, Jul 31, 2014 at 4:14 PM, Deepak Pandian > wrote: >
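
The equivalence under discussion, as a quick self-contained check: PCA components are the right singular vectors of the centered data, up to sign.

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.randn(100, 5)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pca = PCA().fit(X)
    print(np.allclose(np.abs(pca.components_), np.abs(Vt)))  # True, sign aside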

Re: [Scikit-learn-general] Equivalence between PCA and SVD

2014-07-31 Thread Kyle Kastner
What is the shape of X? On Thu, Jul 31, 2014 at 4:14 PM, Deepak Pandian wrote: > On Thu, Jul 31, 2014 at 7:31 PM, Olivier Grisel > wrote: > > The sign of the components is not deterministic. The absolute values > > should be the same. > > But the last component differs even in absolute values,

Re: [Scikit-learn-general] metric in neighbors classifier

2014-07-10 Thread Kyle Kastner
OK - what is the result of X.shape and X.dtype? What is X? On Thu, Jul 10, 2014 at 1:55 PM, Sheila the angel wrote: > Yes, the error is in fit(X,y) > > clf.fit(X,y) > > --- > Traceback (most recent call last): > > File ""

Re: [Scikit-learn-general] metric in neighbors classifier

2014-07-10 Thread Kyle Kastner
What was the error? Posting a traceback would help us help you. On Thu, Jul 10, 2014 at 11:45 AM, Sheila the angel wrote: > What is the correct way to use different metric in KNeighborsClassifier ? > > I tried this > > clf = KNeighborsClassifier(metric="mahalanobis").fit(X, y) > > which give
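
For what it's worth, a common cause of errors with this particular metric is that mahalanobis needs its covariance supplied; a minimal working sketch (my guess at the fix, pending the traceback):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    clf = KNeighborsClassifier(metric="mahalanobis",
                               metric_params={"V": np.cov(X.T)},  # feature covariance
                               algorithm="brute")
    clf.fit(X, y)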

Re: [Scikit-learn-general] Sample weighting in RandomizedSearchCV

2014-07-08 Thread Kyle Kastner
It looks like fit_params are passed wholesale to the classifier being fit - this means the sample weights will be a different size than the fold of (X, y) fed to the classifier (since the weights aren't getting KFolded...). Unfortunately I do not see a way to accommodate this currently - sample_

Re: [Scikit-learn-general] Problem with the online documentation/website!!!

2014-07-07 Thread Kyle Kastner
s kind of stuff... hopefully someone who knows the web server can help. On Mon, Jul 7, 2014 at 12:05 PM, Nelle Varoquaux wrote: > > > > On 7 July 2014 10:01, Kyle Kastner wrote: > >> If by symlink you are talking about Linux symlinks (dunno if there are >> others)

Re: [Scikit-learn-general] An advise about suitable datasets

2014-07-07 Thread Kyle Kastner
I have been doing a lot of work on tensors, and there are many different datasets which have this property. Places to look are image tracking, tensor decomposition, multi-way analysis, chemistry(!), physics, etc. A quick list of some sites I have put together: http://three-mode.leidenuniv.nl/ http:

Re: [Scikit-learn-general] Problem with the online documentation/website!!!

2014-07-07 Thread Kyle Kastner
If by symlink you are talking about Linux symlinks (dunno if there are others), you can do 'ln -sf source dest', i.e. 'ln -sf /path/to/0.14/docs current_place', to force-update it, but buyer beware. On Mon, Jul 7, 2014 at 11:11 AM, Nelle Varoquaux wrote: > Hello everyone, > > A couple of months

Re: [Scikit-learn-general] Regarding partial_fit in naive_bayes

2014-07-04 Thread Kyle Kastner
You should probably read the paper: Training Highly Multiclass Classifiers http://jmlr.org/papers/v15/gupta14a.html That said, I think you could gain a lot of value by looking into hierarchical approaches - training a bunch of small classifiers on subsets of the overall data to subselect the "righ

Re: [Scikit-learn-general] encoding label using custom target

2014-07-03 Thread Kyle Kastner
The easiest way is to just map them yourself with some Python code after LabelEncoder - this type of mapping is generally application specific. Something like 'a[a == 0] = 100; a[a == 1] = 150; a[a == 2] = 155' will do the trick (see the runnable sketch below). For many labels, you could loop through a dictionary you make and set
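
The same mapping as a runnable sketch (the labels are hypothetical; the target values are the ones from the example, and a dict scales better once there are many labels):

    import numpy as np
    from sklearn.preprocessing import LabelEncoder

    a = LabelEncoder().fit_transform(["b", "c", "a", "b"])  # -> [1, 2, 0, 1]
    mapping = {0: 100, 1: 150, 2: 155}
    a = np.array([mapping[v] for v in a])                   # -> [150, 155, 100, 150]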

Re: [Scikit-learn-general] PCA inverse transform

2014-06-30 Thread Kyle Kastner
Is this necessary for new PCA methods as well? In other words, should I add an already deprecated constructor arg to IncrementalPCA as well, or just do the whitening inverse_transform the way it will be done in 0.16 and on? On Mon, Jun 30, 2014 at 3:20 PM, Gael Varoquaux < gael.varoqu...@normales

Re: [Scikit-learn-general] The right way to reduce dimension in Pipeline tied with estimator.

2014-06-21 Thread Kyle Kastner
You need to set n_components to something smaller than n_features (where n_features is X.shape[1] if X is your data matrix) for PCA - by default it does not drop any components, and simply projects to another space. A lot of examples use n_components=2, then do a scatter plot to see the separation
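
The pattern described above, as a short sketch: project to two components and eyeball the class separation.

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, y = load_iris(return_X_y=True)
    X2 = PCA(n_components=2).fit_transform(X)  # n_components < n_features drops dimensions
    plt.scatter(X2[:, 0], X2[:, 1], c=y)
    plt.show()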

Re: [Scikit-learn-general] Scikit-Learn sprint 2014 - July in Paris

2014-06-20 Thread Kyle Kastner
Sent you an email - I know of at least one possibility. On Fri, Jun 20, 2014 at 2:46 PM, Andy wrote: > Hey Everyone. > > Does anyone by any chance have a spare bed / couch? > > Cheers, > Andy > > On 06/08/2014 01:47 PM, Alexandre Gramfort wrote: > > hi everyone, > > > > time to reactivate this

Re: [Scikit-learn-general] Instance Reduction on scikit-learn

2014-06-18 Thread Kyle Kastner
Do you have any references for this technique? What is it typically used for? On Wed, Jun 18, 2014 at 12:26 PM, Dayvid Victor wrote: > Hi there, > > Is anybody working on an Instance Reduction module for sklearn? > > I started working on those and I already have more than 10 IR (PS and PG) > al

Re: [Scikit-learn-general] PCA nipals and Sparse PCA

2014-06-05 Thread Kyle Kastner
ange/18760-partial-least-squares-and-discriminant-analysis/content/html/learningpcapls.html#3 for more details On Thu, Jun 5, 2014 at 12:04 PM, Kyle Kastner wrote: > I am planning to work on NIPALS after the 0.15 sklearn release - there > are several good papers I will try to work wi

Re: [Scikit-learn-general] PCA nipals and Sparse PCA

2014-06-05 Thread Kyle Kastner
I am planning to work on NIPALS after the 0.15 sklearn release - there are several good papers I will try to work with and implement. Simple, high level description: http://www.vias.org/tmdatanaleng/dd_nipals_algo.html Simple MATLAB (will start with this first likely): http://www.cdpcenter.org/fi
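
A rough numpy paraphrase of the NIPALS loop from those descriptions (an illustration, not the planned implementation): each component is found by alternating score/loading updates, then deflating the data matrix.

    import numpy as np

    def nipals_pca(X, n_components, max_iter=500, tol=1e-8):
        X = X - X.mean(axis=0)
        scores, loadings = [], []
        for _ in range(n_components):
            t = X[:, 0].copy()                  # initial score vector
            for _ in range(max_iter):
                p = np.dot(X.T, t)
                p /= np.linalg.norm(p)          # loading for this component
                t_new = np.dot(X, p)
                if np.linalg.norm(t_new - t) < tol:
                    t = t_new
                    break
                t = t_new
            X = X - np.outer(t, p)              # deflate before the next component
            scores.append(t)
            loadings.append(p)
        return np.array(scores).T, np.array(loadings)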

Re: [Scikit-learn-general] Anyone experience hanging when parallelizing fits?

2014-05-27 Thread Kyle Kastner
've definitely run regressions > with larger matrices in the past, and haven't had issues before. This is on > a cluster with ~94 gigs of ram, and in the past I've exceeded this limit > and it has usually thrown an error (one of our sysadmin's scripts), not > silently hung. >

Re: [Scikit-learn-general] Anyone experience hanging when parallelizing fits?

2014-05-27 Thread Kyle Kastner
ution :) On Tue, May 27, 2014 at 3:48 PM, Kyle Kastner wrote: > What is your overall memory usage like when this happens? Sounds like > classic memory swapping/thrashing to me - what are your system specs? > > One quick thing to try might be to change the dtype of the matrices to > save

Re: [Scikit-learn-general] Anyone experience hanging when parallelizing fits?

2014-05-27 Thread Kyle Kastner
What is your overall memory usage like when this happens? Sounds like classic memory swapping/thrashing to me - what are your system specs? One quick thing to try might be to change the dtype of the matrices to save some space. float32 vs float64 can make a large memory difference if you don't nee
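
The dtype point, concretely: float32 halves the footprint relative to numpy's default float64.

    import numpy as np

    X64 = np.random.randn(10000, 500)   # float64 by default
    X32 = X64.astype(np.float32)
    print(X64.nbytes / 1e6, "MB vs", X32.nbytes / 1e6, "MB")  # 40.0 MB vs 20.0 MB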

Re: [Scikit-learn-general] sparse_encode omp error -- "the number of atoms..." (line 456 of dict_learning.py), sklearn 0.14.1

2014-05-22 Thread Kyle Kastner
This looks like it would fix the issue with autochosen n_nonzero_coefs - which is great! After reading the paper mentioned in the docstring, I can see where the Gram matrix calculation is coming from now, but I think the check if tol is None and n_nonzero_coefs > len(Gram): raise ValueErr

Re: [Scikit-learn-general] sparse_encode omp error -- "the number of atoms..." (line 456 of dict_learning.py), sklearn 0.14.1

2014-05-15 Thread Kyle Kastner
additionally, why is 'gram = np.dot(dictionary, dictionary.T)' used for the OMP? According to the docstring on orthogonal_mp_gram it should be X.T * X. This would also make gram (n_features, n_features) and make the size checks work... On Thu, May 15, 2014 at 10:01 AM, Kyle Kas

Re: [Scikit-learn-general] sparse_encode omp error -- "the number of atoms..." (line 456 of dict_learning.py), sklearn 0.14.1

2014-05-15 Thread Kyle Kastner
this particular case. I hit this problem when I started testing on a different dataset - got the initial implementation on images, and am now trying to use it for signals. Thanks for looking into this - hopefully it is just confusion on my part On Thu, May 15, 2014 at 1:55 AM, Gael Varoquaux < gael.v

[Scikit-learn-general] sparse_encode omp error -- "the number of atoms..." (line 456 of dict_learning.py), sklearn 0.14.1

2014-05-14 Thread Kyle Kastner
I am having some issues with sparse_encode, and am not sure if it is a bug or my error. In implementing a KSVDCoder, I have gotten something which appears to work on one dataset. However, when I swap to a different dataset, I begin to get errors, specifically: ValueError: The number of atoms cann

Re: [Scikit-learn-general] How does scikit handle HMM emission probabilities that do not add up to 1?

2014-05-08 Thread Kyle Kastner
0, 2, 0, 0, 1, 1, 1, 1], >> dtype=int64), >> array([0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0])) >> >> In this example, signal 0 should've never shown up (since its >> probability is 0 in both states). However, it is the only signal emitted by >>

Re: [Scikit-learn-general] How does scikit handle HMM emission probabilities that do not add up to 1?

2014-05-07 Thread Kyle Kastner
If you are manually specifying the emission probabilities, I don't think there are any hooks/asserts to guarantee that variable is normalized. I.e., if you assign to the emissionprob_ instead of using the fit() function, I think it is on you to make sure the emission probabilities you are assigning *a
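
A minimal sketch of that normalization (illustrative shapes): make each state's emission row sum to 1 before assigning it.

    import numpy as np

    emission = np.array([[0.2, 0.5, 0.1],    # state 0, un-normalized
                         [0.3, 0.3, 0.2]])   # state 1
    emission = emission / emission.sum(axis=1, keepdims=True)
    # model.emissionprob_ = emission         # e.g. on a MultinomialHMM instance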

Re: [Scikit-learn-general] Neural networks for regression?

2014-04-07 Thread Kyle Kastner
work was ever completed. Maybe someone else has better knowledge of this. On Mon, Apr 7, 2014 at 2:39 PM, Kyle Kastner wrote: > You can also use the python interface to pylearn2, rather than the yaml. > If you are interested in examples of the python interface for pylearn2, I > have

Re: [Scikit-learn-general] Neural networks for regression?

2014-04-07 Thread Kyle Kastner
You can also use the python interface to pylearn2, rather than the yaml. If you are interested in examples of the python interface for pylearn2, I have some examples (I greatly prefer the python interface, but to each their own): https://github.com/kastnerkyle/pylearn2-practice/blob/master/cifar10

Re: [Scikit-learn-general] SVM Regression Fitting

2014-03-27 Thread Kyle Kastner
something like RMSE? On Thu, Mar 27, 2014 at 9:52 AM, Kyle Kastner wrote: > This may be an obvious question - but did you try applying a simple > Hamming, Blackman-Harris, etc. window to the data? Before trying EMD? > > Pretty much every transform (FFT included) has edge effect pro

Re: [Scikit-learn-general] SVM Regression Fitting

2014-03-27 Thread Kyle Kastner
This may be an obvious question - but did you try applying a simple Hamming, Blackman-Harris, etc. window to the data? Before trying EMD? Pretty much every transform (FFT included) has edge effect problems if the signal is not exactly at a periodic boundary, and it sounds like the SVR prediction w
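
The windowing idea, concretely (a sketch, not the poster's pipeline): taper each segment so its ends go smoothly to zero before transforming.

    import numpy as np

    x = np.random.randn(1024)          # stand-in for one signal segment
    windowed = x * np.hamming(len(x))  # np.blackman, np.hanning, etc. also work
    spectrum = np.fft.rfft(windowed)   # far less edge/leakage artifact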
