Re: [Scikit-learn-general] sklearn Hackathon during ICML ?

2016-04-12 Thread Vlad Niculae
I would definitely join the sprint, anything after June 17 works for me. I was thinking to come hang around during ICML, even if I might not be able to afford the conference. Cheers, Vlad On Tue, Apr 12, 2016 at 11:39 AM, Andreas Mueller wrote: > So should we pick another or possibly an addition

Re: [Scikit-learn-general] Latent Dirichlet Allocation

2016-02-09 Thread Vlad Niculae
I usually use an absolute threshold for min_df and a relative one for max_df. I find it very useful to look at the histogram of word dfs for choosing the latter, it varies a lot from dataset to dataset. For short texts, like tweets, words such as "the" can have a df of 0.1. It's very easy to look
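
A minimal sketch of the inspection described above, using only the standard library. The toy corpus, the whitespace tokenization and both cutoff values are illustrative assumptions, not taken from the thread.

```python
from collections import Counter

# Toy corpus (hypothetical); each string is one short document.
docs = [
    "the cat sat",
    "the dog barked",
    "a cat and a dog",
    "the end",
]

# Document frequency: number of documents containing each word at least once.
df = Counter(word for doc in docs for word in set(doc.split()))

n_docs = len(docs)
relative_df = {word: count / n_docs for word, count in df.items()}

min_df = 2      # absolute threshold: keep words seen in at least 2 documents
max_df = 0.7    # relative threshold: drop words seen in more than 70% of documents

vocabulary = sorted(
    word for word, count in df.items()
    if count >= min_df and relative_df[word] <= max_df
)
```

Plotting a histogram of `relative_df.values()` is one way to pick `max_df` for a given dataset.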

Re: [Scikit-learn-general] Parameter estimation by Customised Cross Validation

2016-02-05 Thread Vlad Niculae
Hi Mamun, If your cluster labels are known, you can use the LabelShuffleSplit or LeavePLabelOut cross-validation generators. HTH, Vlad On Fri, Feb 5, 2016 at 10:05 AM, Mamun Rashid wrote: > Hi Folks, > I have a two class classification problem where the positive labels reside in > clusters. >
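
To make the idea concrete, here is a hand-rolled sketch of what a leave-P-labels-out split does: whole clusters are held out together, so no cluster appears in both train and test. The helper `leave_p_labels_out` and the toy labels are hypothetical; this is not the library's implementation.

```python
from itertools import combinations

import numpy as np


def leave_p_labels_out(labels, p):
    """Yield (train_idx, test_idx) pairs, holding out p whole labels at a time."""
    labels = np.asarray(labels)
    for held_out in combinations(np.unique(labels), p):
        test_mask = np.isin(labels, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical cluster labels for six samples in three clusters.
cluster_labels = [0, 0, 1, 1, 2, 2]
splits = list(leave_p_labels_out(cluster_labels, p=1))
```

Each element of `splits` can be used like any other pair of train/test index arrays.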

Re: [Scikit-learn-general] maximum and minimum regularization for NMF

2016-02-02 Thread Vlad Niculae
Hi James, I'm not sure how useful a minimum alpha would be. Even if no weights are shrunk quite to zero, the regularization can still impact performance metrics. I would be curious what application you have in mind for this. The max alpha question is interesting, I am curious as well. (Sorry my

Re: [Scikit-learn-general] Analyzer and tokenizer in (Count/TfIdf)Vectorizer

2015-12-07 Thread Vlad Niculae
In the case of "char_wb" it sounds indeed like a custom tokenizer should be called if given. That would require a different implementation than the current one, however. You might want to file an issue. Sebastian's suggestion works, but note that scikit-learn's default tokenization is not the same

Re: [Scikit-learn-general] passing parameters to a transformer

2015-04-29 Thread Vlad Niculae
Is there a reason why you are (still) not respecting the API constraints for custom estimators given in the documentation? __init__ should only set parameters on self that have (exactly) the same name as the arguments passed to it. Your __init__ should be: self.k = k self.nestim
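
A small standard-library sketch of why the naming rule matters. The class name, its parameters and the simplified `get_params` are all hypothetical stand-ins; the real base class does more than this.

```python
import inspect


class KBestEnsemble:
    """Toy estimator-like class; the name and parameters are hypothetical."""

    def __init__(self, k=10, n_estimators=5):
        # Store each argument unchanged under exactly the same name.
        self.k = k
        self.n_estimators = n_estimators

    def get_params(self):
        # Simplified stand-in for parameter introspection: read the names
        # from the __init__ signature and look them up on self.
        names = list(inspect.signature(self.__init__).parameters)
        return {name: getattr(self, name) for name in names}


params = KBestEnsemble(k=3).get_params()
```

If `__init__` stored `k` under a different attribute name, the `getattr` lookup would fail, and anything that relies on this kind of introspection (cloning, grid search) would break with it.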

Re: [Scikit-learn-general] Topic extraction

2015-04-29 Thread Vlad Niculae
Another thing I've seen people do is to threshold based on the difference between the scores of the best and second best topics. (Only take documents with a clear winning topic.) For estimating the number of topics, you can use cross-validation. Vlad On Wed, Apr 29, 2015 at 12:42 AM, Joel Nothman
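
The margin-based filter can be sketched in a few lines of NumPy. The score matrix and the cutoff `min_margin` are made-up values for illustration only.

```python
import numpy as np

# Hypothetical document-topic scores: one row per document, one column per topic.
doc_topic = np.array([
    [0.80, 0.15, 0.05],   # clear winner
    [0.40, 0.38, 0.22],   # ambiguous
    [0.10, 0.25, 0.65],   # clear winner
])

# Sort each row ascending; the last two columns hold the runner-up and the best score.
sorted_scores = np.sort(doc_topic, axis=1)
margin = sorted_scores[:, -1] - sorted_scores[:, -2]

min_margin = 0.2                      # assumed cutoff, to be tuned per dataset
confident = margin >= min_margin      # documents with a clear winning topic
best_topic = doc_topic.argmax(axis=1)
```

Only the rows where `confident` is true would then be assigned to `best_topic`.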

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
-optimization Vlad > On 20 Apr 2015, at 15:34, Vlad Niculae wrote: > > The example you cite contains these lines: > > "max_features": sp_randint(1, 11), > "min_samples_split": sp_randint(1, 11), > "min_samples_leaf

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
lly really really use > continuous distributions! > > On 04/20/2015 12:58 PM, Pagliari, Roberto wrote: >> Hi Vlad, >> when using randomized grid search, does sklearn look into intermediate >> values, or does it samples from the values provided in the parameter grid? >>

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
Hi Roberto > what does None do for max_depth? Copy-pasted from http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html "If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples." > In particula

Re: [Scikit-learn-general] GSoC 2015: Global optimization based Hyper parameter optimization (SMAC)

2015-03-31 Thread Vlad Niculae
> > In order to support discrete parameters, our tree implementation would need > to support categorical variables though. > Ah, good point, I didn’t think about that. But we could use the usual hacks (integer or one-hot encoding). I wonder how that compares to using GPs and rounding when it c

Re: [Scikit-learn-general] GSoC 2015: Global optimization based Hyper parameter optimization (SMAC)

2015-03-31 Thread Vlad Niculae
Hi Gael, > On 31 Mar 2015, at 14:01, Gael Varoquaux > wrote: > >> Why do you think the GP route is easier? > > Because we already have GPs. Well, we already have random forests too. Both cases would need quite a bit of machinery on top, and I don’t know the extent of it, but I thought it wo

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Vlad Niculae
Hi Artem, hi everybody, There were two API issues and I think both need thought. The first is the matrix-like Y which at the moment overlaps semantically with multilabel and multioutput-multiclass (though I think it could be seen as a form of multi-target regression…) The second is the `estima

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Vlad Niculae
Hi Cristoph, Gael, hi everyone, > On 24 Mar 2015, at 18:09, Gael Varoquaux > wrote: > >> Don't you think that I could also benchmark models that are not >> implemented in sklearn? […] > > I am personally less interested in that. We have already a lot in > scikit-learn and more than enough to

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-24 Thread Vlad Niculae
Hi Wei Xue, hi everyone, I think Andy’s comments about testing and documentation are very important. I have just a few things to add: 1. As confused as I am about the world around me, I still knew that the current year is 2015 :P I think that the form is asking “which year of your program you

Re: [Scikit-learn-general] [GSoC 2015] Cross-validation and Meta-Estimators for semi-supervised learning

2015-03-24 Thread Vlad Niculae
Hi Boyuan, hi everyone, On top of what Andy said, I would like to add that you don’t have to commit to certain algorithms in the proposal, as long as you make the plan very clear, and you leave time for discussing alternatives, pros and cons with the community. Since you say there is some ove

Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

2015-03-24 Thread Vlad Niculae
Hi Raghav, hi everyone, If I may, I have a very high-level comment on your proposal. It clearly shows that you are very involved in the project and understand the internals well. However, I feel like it’s written from a way too technical perspective. Your proposal contains implementation detai

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-23 Thread Vlad Niculae
de some work upon that, but I didn't > get any feedback. > > On Tue, Mar 24, 2015 at 3:23 AM, Vlad Niculae wrote: > Hi Vinayak, > > The wiki page just lists a subset of possible topics for which candidates > already showed concrete interest. I think an application

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-23 Thread Vlad Niculae
Hi Vinayak, The wiki page just lists a subset of possible topics for which candidates already showed concrete interest. I think an application for low-rank matrix completion would be more than welcome. It’s very important to work on a topic that you are interested in directly, versus just pick

Re: [Scikit-learn-general] Regarding viewing the decision boundaries of classifiers

2015-02-21 Thread Vlad Niculae
Apologies in advance, but this fits so well, I couldn’t help myself. A Mathematician and an Engineer attend a lecture by a Physicist. The topic concerns Kaluza-Klein theories involving physical processes that occur in spaces with dimensions of 9, 12 and even higher. The Mathematician is sitting,

Re: [Scikit-learn-general] same cross validation score with different parameter configurations

2015-02-18 Thread Vlad Niculae
Hi Roberto, This is explained in the Python standard library documentation: https://docs.python.org/3/library/functions.html#sorted Cheers, Vlad > On 18 Feb 2015, at 21:33, Pagliari, Roberto wrote: > > what does sorted do if the best average cv score is the same? > > how are they sorted? >
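
For the tie case asked about, the relevant property is that `sorted` is stable. A short sketch with made-up scores, assuming the results are (score, parameters) pairs sorted on the score:

```python
# Hypothetical (mean CV score, parameters) pairs; two configurations tie at 0.90.
results = [
    (0.90, {"C": 1}),
    (0.85, {"C": 10}),
    (0.90, {"C": 100}),
]

# Sorting on the score alone: sorted() is stable, so tied entries keep
# their original relative order, including when reverse=True.
by_score = sorted(results, key=lambda pair: pair[0], reverse=True)
best_score, best_params = by_score[0]
```

Without the `key`, tied tuples would fall through to comparing the dictionaries, which raises a `TypeError` on Python 3.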

Re: [Scikit-learn-general] custom regressor keeps failing

2015-02-16 Thread Vlad Niculae
Hi Roberto, Everything I say below is also explained in the developers documentation that I linked to in the other e-mail. [1] You are breaking some conventions that make the default `get_params` and `set_params` not work well. As I said in the other thread, fitted attributes are suffixed with

Re: [Scikit-learn-general] which methods do I need to implement for a regressor?

2015-02-16 Thread Vlad Niculae
Hi Roberto, This is all documented in more detail here: [1] The transform looks good (just that you might want to add a flag to avoid memory copies when you can afford to destroy the original data). It’s not clear what the intention of `my_param` is here. It’s not user specified, right? Conven

Re: [Scikit-learn-general] Feature selection and cross validation; and identifying chosen features

2015-02-11 Thread Vlad Niculae
> On 11 Feb 2015, at 16:31, Andy wrote: > > > On 02/11/2015 04:22 PM, Timothy Vivian-Griffiths wrote: >> Hi Gilles, >> >> Thank you so much for clearing this up for me. So, am I right in thinking >> that the feature selection is carried for every CV-fold, and then once the >> best parameters

Re: [Scikit-learn-general] Data reconstruction after SparsePCA

2014-10-17 Thread Vlad Niculae
To clarify, it is *not* the case that `x.dot(spca.components_.T) ` is equivalent to `spca.transform(x)`. The latter performs a solve. Best, Vlad On Fri, Oct 17, 2014 at 12:03 PM, Vlad Niculae wrote: > Hi Luca > >> x_3_dimensional = x.dot(spca.components_.T) # this is e

Re: [Scikit-learn-general] Data reconstruction after SparsePCA

2014-10-17 Thread Vlad Niculae
Hi Luca > x_3_dimensional = x.dot(spca.components_.T) # this is equivalent to > spca.transform(x) This part is specific to PCA. In general, the transform part of such a decomposition is `X * components ^ -1`. In PCA, because `components` is orthogonal, `components ^ -1` is `components.T`. The r
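
The distinction can be checked numerically. In this sketch an unregularized least-squares solve stands in for the "solve" step; it is not the library's implementation, and the matrices are random or made up.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(6, 4)

# Orthonormal rows (as in PCA): the transpose acts as the inverse.
Q, _ = np.linalg.qr(rng.randn(4, 2))
orthonormal = Q.T                                  # shape (n_components, n_features)
codes_transpose = X.dot(orthonormal.T)
codes_solve = np.linalg.lstsq(orthonormal.T, X.T, rcond=None)[0].T

# Non-orthogonal rows (as sparse components generally are): the two differ.
components = np.array([[1.0, 1.0, 0.0, 0.0],
                       [0.0, 1.0, 1.0, 0.0]])
naive = X.dot(components.T)
solved = np.linalg.lstsq(components.T, X.T, rcond=None)[0].T
```

With orthonormal components both routes give the same codes; with the non-orthogonal ones, `naive` and `solved` disagree, which is the point made above.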

Re: [Scikit-learn-general] Data reconstruction after SparsePCA

2014-10-16 Thread Vlad Niculae
Hi Luca, The other part of the decomposition that you're missing is available in `spca.components_` and has shape `(n_components, n_features)`. The approximation of X is therefore `np.dot(x_3_dimensional, spca.components_)`. Best, Vlad On Thu, Oct 16, 2014 at 6:07 PM, Luca Puggini wrote: > Hi,

Re: [Scikit-learn-general] Inputer, python list and strings

2014-09-25 Thread Vlad Niculae
Hi Zoraida, The Imputer assumes that your data is a numeric numpy array, or convertible to one. You should replace your string "NA" values with np.nan objects, then use the Imputer with the default, `missing_values='NaN'`. It's easier to debug if you explicitly convert your data to a float numpy
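
A sketch of the conversion step in plain NumPy. The toy data is hypothetical, and the hand-written column-mean fill at the end is only a stand-in to show what mean imputation produces; it is not the Imputer itself.

```python
import numpy as np

# Hypothetical raw data: a list of rows mixing numbers and "NA" strings.
raw = [[1.0, "NA", 3.0],
       [4.0, 5.0, "NA"],
       [7.0, 8.0, 9.0]]

# Replace the string markers with np.nan, then convert explicitly to float.
X = np.array(
    [[np.nan if value == "NA" else value for value in row] for row in raw],
    dtype=float,
)

# Column-wise mean imputation written out by hand, for illustration.
column_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), column_means, X)
```

Doing the float conversion explicitly means a stray non-numeric value fails loudly at this step rather than inside the estimator.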

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Vlad Niculae
Hi Pavel, First of all, this is an interesting subject, thanks for bringing it up! I fear that it's too domain-specific to go very deep in this direction. That being said, and trying to interpret your benchmarks, it seems that Delta-idf might actually be interesting. Or, more generally, the idea o

Re: [Scikit-learn-general] Custom Scoring Functions for Grid Search

2014-08-20 Thread Vlad Niculae
It has confused me as well, +1. It's counterintuitive and broken, in my opinion. Vlad On Wed, Aug 20, 2014 at 2:31 PM, Gael Varoquaux wrote: >> It's been around for so long, but it's also hard to believe that anyone >> exploited this behaviour intentionally. Shall we be bold and just fix >> it

Re: [Scikit-learn-general] VarianceThreshold

2014-08-17 Thread Vlad Niculae
Also, the class is well documented, but because of an omission, it wasn't linked from the API page at the time of the last stable release. This has been fixed in the development version, so you can read the docs in a friendlier way here [1]. Best, Vlad [1] http://scikit-learn.org/dev/modules/ge

Re: [Scikit-learn-general] How to implement cross_val_score scoring function with a weights array?

2014-08-03 Thread Vlad Niculae
Hi, If you want to get `sample_weights` working with the current master, the easiest is to take PR 3524 and either pass it through `fit_params` or just undo the last commit in the branch. I needed to change a couple of things to get 1574 up to date with the current master, but nothing else is cha

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-24 Thread Vlad Niculae
But SGDClassifier optimizes classification-specific loss functions, unlike ElasticNet which is a regressor. Correct me if I'm wrong, but wrapping ElasticNet in an OvR fashion doesn't lead to the same thing, and SGDClassifier would generally be more appropriate for classification in my opinion. My 2

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-22 Thread Vlad Niculae
Hi, The SGDClassifier supports elastic net regularization. You can make it solve the SVM loss function or the logistic loss function by changing the `loss=` parameter. Hope this helps, Vlad On Tue, Jul 22, 2014 at 4:17 PM, Sheila the angel wrote: > Hello All, > > Is it possible to perform class
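
A short sketch of that setup on made-up data. The values of `alpha` and `l1_ratio` are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Tiny, linearly separable toy problem (hypothetical data).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + 3, rng.randn(20, 2) - 3])
y = np.array([1] * 20 + [0] * 20)

# loss="hinge" gives a linear SVM objective; penalty="elasticnet" mixes
# L1 and L2 regularization, weighted by l1_ratio.
clf = SGDClassifier(loss="hinge", penalty="elasticnet",
                    alpha=1e-3, l1_ratio=0.15, random_state=0)
clf.fit(X, y)
predictions = clf.predict(X)
```

For the logistic loss, change `loss=`; note that the name of that option differs between releases ("log" in older versions, "log_loss" in newer ones), so check the documentation for the installed version.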

Re: [Scikit-learn-general] Sparse NMF

2014-06-25 Thread Vlad Niculae
will never post a docstring again :) > sorry for the noise > > michael > > > On Wednesday, June 25, 2014, Vlad Niculae wrote: >> >> Hi, >> >> Allow me to clarify. We don't implement Hoyer's sparse update rule >> indeed (it shouldn't

Re: [Scikit-learn-general] Sparse NMF

2014-06-25 Thread Vlad Niculae
Hi, Allow me to clarify. We don't implement Hoyer's sparse update rule indeed (it shouldn't say "this implements", I initially cited Hoyer for motivating sparseness constraints in NMF). Instead, we implement a version of sparse NMF with a clear (but not particularly elegant) objective function, fo

Re: [Scikit-learn-general] About weekly posts for GSoc 2014

2014-06-01 Thread Vlad Niculae
IIRC, weekly posts are not a GSoC requirement but they are a _PSF_ requirement, and since scikit-learn is participating in GSoC under the PSF umbrella, the requirement applies to us. I think it's a great incentive to think of your work in terms of what you could show to others. No matter how lit

Re: [Scikit-learn-general] My talk was approved for EuroScipy'14

2014-05-21 Thread Vlad Niculae
This is great news, congratulations Gilles! Cheers, Vlad On May 22, 2014 8:15 AM, "Gilles Louppe" wrote: > Hi folks, > > Just for letting you know, my talk "Accelerating Random Forests in > Scikit-Learn" was approved for EuroScipy'14. Details can be found at > https://www.euroscipy.org/2014/sche

Re: [Scikit-learn-general] GSoC acceptance - Sparse Support

2014-04-23 Thread Vlad Niculae
Congrats, I am also looking forward to a productive summer :) Vlad On Wed Apr 23 11:30:45 2014, Arnaud Joly wrote: > Congratulation Hamzeh !!! > > I am looking forward working with you ! > > Arnaud > > On 23 Apr 2014, at 03:57, Hamzeh Alsalhi > wrote: > >> Thank you to

Re: [Scikit-learn-general] Belief propagation and message-passing methods

2014-03-19 Thread Vlad Niculae
Hi John, I believe general inference methods are out of scope for scikit-learn. Even general structured learning algorithms are not in scope at the moment, as it's hard to fit problems in numpy arrays. For learning, you might want to check out pystruct [1]. If you just want inference, opengm

Re: [Scikit-learn-general] Query in Sparse matrices: scipy.linalg.get_blas_funcs()

2014-03-18 Thread Vlad Niculae
Hi Manoj, For efficiency, the BLAS api defines different functions for different underlying datatypes (float32, float64, complex64, complex128). The scipy "get_blas_funcs" utility has the role of getting the Python wrapper for the given BLAS functions (in this case 'swap' and 'nrm2', that's ap
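
A minimal sketch of the dtype dispatch, using `nrm2` only; the input vectors are made up.

```python
import numpy as np
from scipy.linalg import get_blas_funcs

x32 = np.array([3.0, 4.0], dtype=np.float32)
x64 = np.array([3.0, 4.0], dtype=np.float64)

# The arrays passed in determine which typed BLAS routine is returned.
nrm2_single, = get_blas_funcs(("nrm2",), (x32,))
nrm2_double, = get_blas_funcs(("nrm2",), (x64,))

norm32 = nrm2_single(x32)   # Euclidean norm via the float32 routine
norm64 = nrm2_double(x64)   # Euclidean norm via the float64 routine
```

The returned wrappers carry a `typecode` attribute ('s' for single precision, 'd' for double) that shows which variant was selected.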

Re: [Scikit-learn-general] GSoC

2014-03-17 Thread Vlad Niculae
> This program is granted free of charge for research and education > purposes. However you must obtain a license from the author to use it > for commercial purposes. Definitely FEST is not BSD compatible :( Vlad On 17/3/2014 14:19 , Arnaud Joly wrote: > Hi, > > > The support for sparse matri

Re: [Scikit-learn-general] GSoC 2014 Proposal - Improving Linear Models (First draft)

2014-03-07 Thread Vlad Niculae
In some cases it might be preferable to fit an OvA model. In those cases, I think the user code would look nicer and more explicit if it'd use the sklearn.multiclass.OneVsRest encoder. The downside is that we'll need to go through an ugly deprecation cycle for a major class in the library. With

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread Vlad Niculae
On Wed Feb 26 13:32:08 2014, Gael Varoquaux wrote: > documentation and example This was exactly my thought. Many such (near-)equivalences are not obvious, especially for beginners. If Lars's hinge ELM and RBF network would work well (or provide interesting feature visualisations) on some sklea

Re: [Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-24 Thread Vlad Niculae
If you're affiliated with a university, Anaconda has free academic licenses that include MKL and their optimized builds. Vlad On Mon Feb 24 09:22:07 2014, Javier Martínez-López wrote: > That is great, thanks! I do not have the mkl module (it isn't free, > right?) but with your script the calcula

Re: [Scikit-learn-general] Query with fit_intercept param

2014-02-15 Thread Vlad Niculae
On 15/2/2014 10:17 , Manoj Kumar wrote: > Thanks Vlad, > > Can you tell me how to take care of that? What exactly do you mean by "take care of that"? Like Mathieu said, it probably doesn't matter too much in terms of the final score. Vlad > > On Sat, Feb 15,

Re: [Scikit-learn-general] Query with fit_intercept param

2014-02-15 Thread Vlad Niculae
Hi Manoj, In the first example, the intercept is not regularized, hence the difference. Vlad On Feb 15, 2014 8:54 AM, "Manoj Kumar" wrote: > Hello > > I have a query with fit_intercept parameter in most of the estimators. > > When we have a linear model like w0 + w1*x1 + w2*x2 + .. I'm assuming
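
A sketch of the effect with closed-form ridge regression in NumPy, assuming the comparison is between a separately fitted (unpenalized) intercept and an intercept modelled as a hand-added constant column. Data and `alpha` are made up.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 2)
y = X.dot([1.0, -2.0]) + 10.0 + 0.1 * rng.randn(50)   # true intercept is 10
alpha = 5.0

# (a) Separate, unpenalized intercept: center the data, penalize only w.
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean
w_a = np.linalg.solve(Xc.T.dot(Xc) + alpha * np.eye(2), Xc.T.dot(yc))
b_a = y_mean - X_mean.dot(w_a)

# (b) Intercept as a constant column: it is shrunk like any other weight.
X1 = np.hstack([X, np.ones((50, 1))])
w_b = np.linalg.solve(X1.T.dot(X1) + alpha * np.eye(3), X1.T.dot(y))
b_b = w_b[-1]
```

In (b) the penalty pulls the intercept toward zero, so `b_b` lands further from the true value than `b_a`, and the two fits differ.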

Re: [Scikit-learn-general] Scikit-Learn sprint 2014 - July in Paris

2014-02-13 Thread Vlad Niculae
Awesome, count me in! Looking forward to it. Thanks a lot for the Telecom ParisTech hosting. Vlad On Thu Feb 13 20:04:24 2014, Alexandre Gramfort wrote: > hi sklearners, > > we're planing to reproduce the success of last year's scikit-learn > sprint in Paris. > > We'll have a new sprint in Tele

Re: [Scikit-learn-general] Contributing to Scikit

2014-02-02 Thread Vlad Niculae
I've heard stchee-kit once, along with stchee-pee and num-pee. Vlad On Sun Feb 2 18:39:58 2014, Hadayat Seddiqi wrote: > i always said "skikit" > > > On Sun, Feb 2, 2014 at 12:20 PM, Andy > wrote: > > On 02/02/2014 12:06 PM, Olivier Grisel wrote: > > Note: the n

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-28 Thread Vlad Niculae
I like the locality-sensitive hashing idea! Vlad On Tue Jan 28 10:04:36 2014, Nick Pentreath wrote: > This would be a great addition. > > Some ideas /code perhaps: http://nearpy.io/ > > > On Tue, Jan 28, 2014 at 10:59 AM, Mathieu Blondel wrote: > > If we have a

Re: [Scikit-learn-general] Scikit-Learn for android

2014-01-19 Thread Vlad Niculae
I don't think Weka (at least the interesting parts of it) could run on Android either. I don't really foresee the whole Scipy stack running on Android; maybe one day when all dependencies are rewritten in PyPy and are faster and still 100% compatible... One thing that would be possible (but I d

Re: [Scikit-learn-general] A poster about scikit-learn at Giga-day

2014-01-17 Thread Vlad Niculae
Hi Arnaud, awesome poster! Here are a few things that popped out: Firstly, I doubt it matters, but some of the links are mangled. Then, I think it should say "students' master's theses" or something like this (plural). Also "the chromosome 15" sounds strange to me compared to "chromosome 15".

Re: [Scikit-learn-general] Suggestion to add author names/emails at the bottom of module documentations

2014-01-16 Thread Vlad Niculae
I would rather have this sorted out through the github issue tracker. I don't think it's a good idea to encourage users to e-mail individual developers. Someone else could have the expertise and do the change confidently. My 2c, Vlad On Thu Jan 16 18:12:05 2014, Issam wrote: > Hi scikit-learn

Re: [Scikit-learn-general] Estimator Freeze for 1.0?

2013-12-27 Thread Vlad Niculae
Specifically I have the biclustering in mind but I think there are more in this situation. Thanks to the effective [MRG] tag it should be just a matter of choosing a subset of this. Vlad On Fri, Dec 27, 2013 at 1:30 PM, Andy wrote: > On 12/27/2013 12:27 PM, Vlad Niculae wrote: >> I th

Re: [Scikit-learn-general] Estimator Freeze for 1.0?

2013-12-27 Thread Vlad Niculae
I think there might be quite a few PRs that are "almost there" but stalled right before the finish line. I think (some of) these should be chosen for inclusion in the estimator freeze, right? Vlad On Fri, Dec 27, 2013 at 1:22 PM, Andy wrote: > Hey everybody. > On NIPS I talked to Gael and Gil

Re: [Scikit-learn-general] Releasing joblib 0.8a

2013-12-20 Thread Vlad Niculae
turn off multiprocessing at prediction time - this > might backfire quite easily. > > > 2013/12/20 Olivier Grisel >> >> 2013/12/20 Vlad Niculae : >> > Works exactly as you described on my machine (which doesn't mean much >> > because it's relatively close to

Re: [Scikit-learn-general] Releasing joblib 0.8a

2013-12-20 Thread Vlad Niculae
Works exactly as you described on my machine (which doesn't mean much because it's relatively close to yours, but I am just too enthusiastic about this not to reply! \o/) Memory usage is as expected. I see a speedup in train time but a slight slowdown in test time (1.7 vs 1.0), is it expected or p

Re: [Scikit-learn-general] Updated KMeansCoder now available as gist

2013-12-13 Thread Vlad Niculae
anyone know if that is > currently implemented/in development for sklearn? I haven't looked for it in > sklearn yet, but it seems like a cool approach > > > On Fri, Dec 13, 2013 at 12:20 PM, Vlad Niculae wrote: >> >> Great, thanks a lot! >> >> I'm al

Re: [Scikit-learn-general] Updated KMeansCoder now available as gist

2013-12-13 Thread Vlad Niculae
Great, thanks a lot! I'm also curious about what you're running it on and about how the performance is. Vlad On Fri, Dec 13, 2013 at 7:11 PM, Olivier Grisel wrote: > Nice. > > Have you used it with success for real image classification tasks? > > I see you have been involved in the cats vs dogs

Re: [Scikit-learn-general] from sklearn.all import *

2013-12-02 Thread Vlad Niculae
Personally I'd rather be a bit frustrated but have tab completion and pyflakes warnings. I avoid using star imports even in hackish scripts. I assume the warning will create unnecessary confusion when people learn to use the star import first. These users will probably feel that the warning is a s

Re: [Scikit-learn-general] release time

2013-11-30 Thread Vlad Niculae
seqlearn uses a different API on purpose though (one big ndarray), whereas pystruct uses lists of arrays but is only focused on max-margin learning :) On Sat, Nov 30, 2013 at 12:38 PM, Gael Varoquaux wrote: > +1 on the whole thread. > > I was hoping that Lars's seqlearn could be a home for poor H

Re: [Scikit-learn-general] release time

2013-11-30 Thread Vlad Niculae
> I guess remove means deprecate, right? > I am +1 but we should definitely find a place for the code. Worse case > it will be a repo with containing just the HMM. My thoughts exactly; my impression is that people do find the code useful and it's reasonably readable. It should definitely go into

[Scikit-learn-general] Fwd: Problem with scikit learn kernel PCA

2013-11-25 Thread Vlad Niculae
ll be the same one. But I'm not the best person to ask, I've never even used the Kernel PCA. Cheers, Vlad -- Forwarded message ------ From: Vlad Niculae Date: Mon, Nov 25, 2013 at 10:41 PM Subject: Fwd: Problem with scikit learn kernel PCA To: Vlad Niculae On Mon, Nov 25,

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-19 Thread Vlad Niculae
I finally found a desk and some focus. I addressed Mathieu's suggestions and added some timings on real data (with a lot of concessions so that it would run reasonably quick on my machine). Here are the results: http://nbviewer.ipython.org/7224672 It becomes clear that `tol` still means different t

Re: [Scikit-learn-general] Automated benchmarking

2013-11-08 Thread Vlad Niculae
Vlad, that's exactly what I've been looking for! > > Thanks, > Karol > > > 2013/11/8 Vlad Niculae >> >> We have an instance of vbench continuously running [1] that I did as a >> GSoC project last year. >> >> For some reason it seems that the li

Re: [Scikit-learn-general] Automated benchmarking

2013-11-08 Thread Vlad Niculae
We have an instance of vbench continuously running [1] that I did as a GSoC project last year. For some reason it seems that the links don't generate properly now, but it still works (though all data got lost in a jenkins setup incident this summer). Here are some linear model benchmarks for exam

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Vlad Niculae
Re: the discussion we had at PyCon.fr, I noticed that the internal elastic net coordinate descent functions are parametrized with `l1_reg` and `l2_reg`, but the exposed classes and functions have `alpha` and `l1_ratio`. Only yesterday there was somebody on IRC who couldn't match Ridge with Elastic
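
For the exposed parametrization, the conversion to and from two separate penalties can be sketched as below. The helper names are hypothetical, and `a`, `b` follow the separate-penalty form `a * ||w||_1 + 0.5 * b * ||w||_2^2` described in the ElasticNet documentation; any extra scaling applied inside the internal `l1_reg` / `l2_reg` arguments is not shown here.

```python
def to_alpha_l1_ratio(a, b):
    """Convert separate penalties a*||w||_1 + 0.5*b*||w||_2^2 to (alpha, l1_ratio)."""
    alpha = a + b               # assumes a + b > 0
    return alpha, a / alpha


def to_separate_penalties(alpha, l1_ratio):
    """Inverse conversion: return (a, b) for the separate-penalty form."""
    return alpha * l1_ratio, alpha * (1.0 - l1_ratio)


alpha, l1_ratio = to_alpha_l1_ratio(a=0.3, b=0.1)
a, b = to_separate_penalties(alpha, l1_ratio)
```

A pure L2 penalty corresponds to `l1_ratio = 0`, which is the case to look at when trying to line ElasticNet up with Ridge.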

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
I feel like this would go against "explicit is better than implicit", but without it grid search would indeed be awkward. Maybe: if self.alpha_coef == 'same': alpha_coef = self.alpha_comp ? On Thu, Nov 7, 2013 at 4:19 PM, Mathieu Blondel wrote: > > On Thu, Nov 7, 2013 at 11:57 PM, Lars Bui

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
> This is a known problem with np.linalg.norm, and so is the memory > consumption. You should use sklearn.utils.extmath.norm for the > Frobenius norm. Hmm. Indeed I missed that, but still, this is a bit odd. sklearn.utils.extmath.norm is slower than raveling on my anaconda with MKL accelerate setu
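
For reference, the "raveling" alternative mentioned here can be written as follows; the matrix is random and the sketch makes no timing claims.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 30)

# Reference: NumPy's built-in Frobenius norm.
fro_linalg = np.linalg.norm(X, "fro")

# Raveled alternative: view X as one long vector and take sqrt(x . x).
# ravel() returns a view for contiguous arrays, so no copy of X is made here.
x = X.ravel()
fro_ravel = np.sqrt(np.dot(x, x))
```

Both give the same value up to floating-point error; which is faster depends on the NumPy build and BLAS in use.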

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
55 0.936736 262 0.940665 269 0.9343853 276 0.9337552 283 0.9300757 290 0.9297058 297 0.9262745 304 0.9274619 311 0.9275654 Name: residual, dtype: object It looks spot on. Note that tolerance is 1e-3. Any idea how to make it visible in the plot when two lines are so close? On

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
eIncrement Line Contents 4 35.7 MiB 0.0 MiB def linalg(X): 5 42.7 MiB 7.0 MiB return np.linalg.norm(X, 'fro') On Thu, Nov 7, 2013 at 11:46 AM, Vlad Niculae wrote: > Come to think of it, Olivier, what do you mean when you say L-B

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
ecific? On Thu, Nov 7, 2013 at 11:12 AM, Vlad Niculae wrote: > The regularization is the same, I think the higher residuals come from > the fact that the gradient is raveled, so compared to `n_targets` > independent problems, it will take different steps. > > I don't think

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
at 9:51 AM, Olivier Grisel wrote: > 2013/11/7 Vlad Niculae : >> Hi everybody, >> >> I just updated the gist quite a lot, please take a look: >> http://nbviewer.ipython.org/7224672 >> >> I'll go to sleep and interpret it with a fresh eye tomorrow, but >>

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-06 Thread Vlad Niculae
option in multitask lasso (as well as the sparse variant). Is there any other reason for this or just that nobody needed it? Cheers, Vlad On Wed, Oct 30, 2013 at 10:40 AM, Vlad Niculae wrote: > Thanks Mathieu, well part of it comes from your gist (I added an > attribution now) ;) >

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Vlad Niculae
Thanks Mathieu, well part of it comes from your gist (I added an attribution now) ;) Non-negative lasso is really interesting, I forgot about it but I think it would be very interesting to compare qualitatively. Vlad On Wed, Oct 30, 2013 at 10:15 AM, Olivier Grisel wrote: > 2013/10/30 Mathieu

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Vlad Niculae
I guess it's just a bug in how the solvers return residuals, I'll add some unit tests with manually-computed residuals to check. On Wed, Oct 30, 2013 at 9:48 AM, Olivier Grisel wrote: > Does anyone have a explanation for the discrepancy in the residuals > for the lbfgs-b and nnls_kkt? If nnls_kkt

[Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-29 Thread Vlad Niculae
Hi all, During the PyCon sprint I kept digging into the NMF and specifically ways to solve each sub-iteration. It became clear that the alternating NLS approach finds good reconstructions and converges well, but the NLS solving step is critical and must be optimized. I have started looking into d

Re: [Scikit-learn-general] Multi Class Classification

2013-10-20 Thread Vlad Niculae
Hi, We refer to such a setting as *multi-label*. Please take a look at http://scikit-learn.org/stable/modules/multiclass.html Yours, Vlad On Sun, Oct 20, 2013 at 1:19 PM, Mahendra Kariya wrote: > Hi, > > I am trying to do multi class classification using NB or linear SVM. In the > training dat

Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-09-25 Thread Vlad Niculae
>> There are still a few things that are not clear to me from the >> documentation. Can you customize the classifier to perform a different >> decision function? > > You can subclass it and override the decision_function method. While true, this can be misleading. You're just changing the final st

Re: [Scikit-learn-general] Error when using an array for one feature linear regression

2013-09-24 Thread Vlad Niculae
Just to add, I don't think you need to reshape y. And reshaping x can be more briefly stated as x[:, np.newaxis]. In my opinion supporting such cases, while convenient for users, would lead to annoying branches and code that is harder to maintain and test. The important thing is being consistent.
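
A quick sketch of the reshape on made-up data. The least-squares call at the end is only a NumPy stand-in for fitting a linear regression, to show that a one-dimensional `y` is fine.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])     # one feature, shape (4,)
y = 2.0 * x + 1.0                      # targets can stay one-dimensional

X = x[:, np.newaxis]                   # shape (4, 1): n_samples x n_features

# Ordinary least squares with an explicit constant column for the intercept.
A = np.hstack([X, np.ones_like(X)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
```

`x[:, np.newaxis]` and `x.reshape(-1, 1)` produce the same two-dimensional array.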

Re: [Scikit-learn-general] Does scikit RBM support continuous values?

2013-09-17 Thread Vlad Niculae
And under the current implementation, implementing them involves changing only the sampling and energy computation, I think. I discussed this with Gabriel Synnaeve during the sprint and I think he was working on the gaussian version, it might be on his repo. Lars, do you have any practical experi

Re: [Scikit-learn-general] Shining Panda emails

2013-09-10 Thread Vlad Niculae
Also, the builds fail quite rarely (with the exception of the last few weeks). And when they do, I think these e-mails make sure that it gets fixed faster than without them. It's better not to unsubscribe. Even if it's annoying if it's *definitely* not your fault (documentation PRs) sometimes you m

Re: [Scikit-learn-general] Testing small code peices

2013-08-29 Thread Vlad Niculae
If you're writing an external "script" that just interfaces with scikit-learn and you intend to keep it separately distributable (3rd party), you can replace them with absolute imports: ``` from sklearn.base import ClassifierMixin, RegressorMixin from sklearn.externals.joblib import Parallel, dela

Re: [Scikit-learn-general] Files at sourceforge

2013-08-29 Thread Vlad Niculae
It's about redirecting /dev and /stable to the appropriate fixed paths. Actually I remember that this has been looked into, I vaguely remember a thread a while back. I think the problem is that we couldn't move to github while keeping all the old links and looking the same in the eyes of the goog

Re: [Scikit-learn-general] Overflow when vectorizing large corpus

2013-08-29 Thread Vlad Niculae
3 at 12:11 PM, Olivier Grisel wrote: > 2013/8/28 Lars Buitinck : > > 2013/8/28 Vlad Niculae : > >> Do the indices/indptr arrays need to be int32 or is this a limitation > of the > >> implementation? > > > > This is a limit in scipy.sparse, which uses signed

Re: [Scikit-learn-general] Overflow when vectorizing large corpus

2013-08-28 Thread Vlad Niculae
After doing it again with pdb I figured out that it has nothing to do with vocabulary size, which is decent; the list of indices simply grows too big. Vlad On Wed, Aug 28, 2013 at 11:01 PM, Vlad Niculae wrote: > Hi all, > > I got an unexpected error with current master, when tryi
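A back-of-the-envelope sketch of why a long index list can overflow even when the vocabulary is small. It assumes the positions are stored as signed 32-bit integers, as the int32 question elsewhere in this thread suggests; `fits_in_int32` and the corpus sizes are hypothetical and only illustrate the arithmetic.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

def fits_in_int32(n_entries):
    """Return True if a running count of n_entries fits in a signed 32-bit int."""
    return n_entries <= INT32_MAX

# Hypothetical numbers of collected token occurrences (one index entry each).
for n_tokens in (2 * 10**8, 2 * 10**9, 25 * 10**8):
    print(n_tokens, fits_in_int32(n_tokens))
```

The limit sits at roughly 2.1 billion entries, so a corpus of about two billion tokens is right at the edge: the number of distinct words is irrelevant, only the total count of collected entries matters.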

[Scikit-learn-general] Overflow when vectorizing large corpus

2013-08-28 Thread Vlad Niculae
Hi all, I got an unexpected error with current master, when trying to run TfidfVectorizer on a 2 billion token corpus. /home/vniculae/envs/sklearn/local/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in _count_vocab(self, raw_documents, fixed_vocab) 728 #

Re: [Scikit-learn-general] Nonn-ASCII in source files

2013-08-28 Thread Vlad Niculae
I'll have to side slightly against Lars on this one. I agree with Lars that any software that doesn't support these is broken, and that Unicode looks better than other ad-hoc formatting. If the software works, often the fonts won't. Personally, if I needed to see the source and found characters missing

Re: [Scikit-learn-general] Segfault with large dataset

2013-08-24 Thread Vlad Niculae
Is it maybe related to the OS, as it seems that the problem is with opening the memmapped file? Vlad On Sat, Aug 24, 2013 at 1:52 PM, Olivier Grisel wrote: > Sounds like a serious bug, could you please open an issue on github? > > -- > Olivier > > > -

Re: [Scikit-learn-general] starting sklearn

2013-08-24 Thread Vlad Niculae
hing? > > And thanks for the tips on nosetests. I have made some progress using > scikits and learning python but I never got that to work. > > Thanks again, > Don > > On Aug 24, 2013, at 6:16 PM, Vlad Niculae wrote: > > The `python` and `nosetests` executables that

Re: [Scikit-learn-general] starting sklearn

2013-08-24 Thread Vlad Niculae
The `python` and `nosetests` executables that you are running are probably not the macports ones. Type `which python` and `which nosetests`; usually the macports one should be in /opt/local/bin. Try running them manually. Also, any reason not to use py27? Yours, Vlad On Mon, Aug 12, 2013 at 7:
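The same check can be done from inside Python, which also shows which interpreter is actually running. This is a small sketch using only the standard library; the `locate` helper is a made-up name, and `shutil.which` needs Python 3.3 or later.

```python
import shutil
import sys

def locate(names):
    """Map each command name to the first match on PATH (None if not found)."""
    return {name: shutil.which(name) for name in names}

# The interpreter executing this script.
print("running interpreter:", sys.executable)

# Where the shell would find these commands.
found = locate(["python", "nosetests"])
for name, path in found.items():
    print(name, "->", path)
```

If the printed paths are not under /opt/local/bin, the shell is picking up a different installation than the MacPorts one.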

Re: [Scikit-learn-general] Fedora and RHEL packaging

2013-08-22 Thread Vlad Niculae
Like Olivier said, libsvm and liblinear are heavily patched and scikit-learn wouldn't work with the upstream versions. If bundling them is unacceptable, I guess maybe packaging our forks individually as libsvm-sklearn or something similar would be a solution, but I think it would be confusing. Al

Re: [Scikit-learn-general] PyStruct 0.1 released

2013-08-11 Thread Vlad Niculae
Congratulations Andy! Thanks for all your hard work on this. This is a good moment for pystruct to gain some momentum! Cheers, Vlad On Sun, Aug 11, 2013 at 8:55 PM, Andreas Mueller wrote: > Hey everybody. > > I just wanted to spam the ML again and say I just "released" PyStruct 0.1. > It contai

Re: [Scikit-learn-general] Feature freeze

2013-07-29 Thread Vlad Niculae
> On Mon, Jul 29, 2013 at 1:58 PM, Gael Varoquaux > wrote: >> >> On Mon, Jul 29, 2013 at 01:54:21PM +0200, Vlad Niculae wrote: >> > I uploaded the windows binaries manually through the web interface >> > with no issue. >> >> I might give up and upload

Re: [Scikit-learn-general] Feature freeze

2013-07-29 Thread Vlad Niculae
I uploaded the windows binaries manually through the web interface with no issue. Unrelated question: We could go for a python3.3 binary too, but I would need to build it using the (free) scipy installed with Anaconda, because official scipy doesn't provide binaries for python 3.3. From what I un

Re: [Scikit-learn-general] Feature freeze

2013-07-29 Thread Vlad Niculae
Or simply hide the 0.14a1 release? It should still stay pip installable if you use the right magic words, right? On Mon, Jul 29, 2013 at 1:35 PM, Andreas Mueller wrote: > On 07/29/2013 01:20 PM, Andreas Mueller wrote: > > On 07/29/2013 01:13 PM, Olivier Grisel wrote: > > Maybe try: > > python set

Re: [Scikit-learn-general] Feature freeze

2013-07-29 Thread Vlad Niculae
I can do it; the question is whether to build against anaconda or against binary numpy/scipy; and whether it matters. I'll see if I can check. On Mon, Jul 29, 2013 at 12:09 PM, Olivier Grisel wrote: > 2013/7/29 Olivier Grisel : >> I found problems when running the tests on an installed version o

Re: [Scikit-learn-general] Feature freeze

2013-07-29 Thread Vlad Niculae
Sorry, but I can't find the issue, you posted the same link twice. Those errors are very similar to what I was getting before figuring out that I need to use nosetests3 instead of nosetests. Vlad On Mon, Jul 29, 2013 at 10:35 AM, Olivier Grisel wrote: > I found problems when running the tests o

Re: [Scikit-learn-general] 20 newsgroups classification

2013-07-26 Thread Vlad Niculae
Hi Harold, Only the current development version (and therefore the upcoming release) supports Python 3, and only as of recently. Even so, it won't be easy to support 3.2; we just aim for 3.3 at the moment. That said, I have no idea what causes this specific error. That line seems unchanged in the curr

Re: [Scikit-learn-general] Name of a hierarchical agglomerative clustering object

2013-07-23 Thread Vlad Niculae
Easy to mistype but as appropriate as it gets. +1 On Tue, Jul 23, 2013 at 10:49 AM, Olivier Grisel wrote: > 2013/7/23 Lars Buitinck : >> 2013/7/23 Olivier Grisel : >>> 2013/7/23 Gael Varoquaux : Hi people How would you like an object that implements different hierarchical aggl
