[Scikit-learn-general] The right way to reduce dimension in Pipeline tied with estimator.

2014-06-20 Thread Gundala Viswanath
Dear expert, I'm trying to do dimensionality reduction using 'pipeline' based on the following:
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.svm import SVC
>>> from sklearn.decomposition import PCA
>>> estimators = [('reduce_dim', PCA()), ('svm', SVC())]
>>> clf = Pipeline(estimator
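One common way to tie the reduced dimension to the downstream estimator is to tune the number of PCA components together with the SVM inside a single grid search. The following is only a minimal sketch of that idea, not the poster's full code (the preview above is cut off). It assumes a recent scikit-learn in which GridSearchCV lives in sklearn.model_selection (in 2014 it was in sklearn.grid_search); the iris dataset and the parameter grid are illustrative choices that do not come from the original message.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Illustrative data only; the original message does not name a dataset.
X, y = load_iris(return_X_y=True)

# Same step names as in the question: 'reduce_dim' and 'svm'.
estimators = [("reduce_dim", PCA()), ("svm", SVC())]
clf = Pipeline(estimators)

# Step parameters are addressed as <step name>__<parameter name>.
# The candidate values below are assumptions for the sake of the example.
param_grid = {
    "reduce_dim__n_components": [1, 2, 3],
    "svm__C": [0.1, 1.0, 10.0],
}

# PCA is refit on each training fold, so the reduction never sees test data.
search = GridSearchCV(clf, param_grid, cv=5)
search.fit(X, y)

best_k = search.best_params_["reduce_dim__n_components"]
# Reuse the fitted PCA step to project the data to the selected dimension.
reduced = search.best_estimator_.named_steps["reduce_dim"].transform(X)
```

Keeping the reduction inside the pipeline means the number of components is chosen by the classifier's cross-validated score rather than fixed in advance.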

Re: [Scikit-learn-general] Instance Reduction on scikit-learn

2014-06-20 Thread Dayvid Victor
Thanks Joel and Mathieu. I'll start a new project and ask to add a reference in the Wiki (hopefully this week). I don't think all the techniques should be included, but at least the most popular (ENN, RENN, OSS) and the most efficient ones (SSMA, PSO). mblondel, you think 'fit_transform' would be bett

Re: [Scikit-learn-general] Instance Reduction on scikit-learn

2014-06-20 Thread Mathieu Blondel
+1 to starting a separate project in order to receive early feedback. Besides popularity and number of citations, an issue is that our API doesn't currently support instance reduction. We need to decide whether to introduce a new method (e.g., "reduce" as you did) or use fit_transform (so far fit_
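To make the API question concrete: a transformer's transform returns only X, whereas instance reduction has to drop rows from X and y together. Below is a rough sketch of what a separate "reduce" method could look like, using Edited Nearest Neighbours (ENN, mentioned elsewhere in this thread) as the technique. The class name, constructor argument, and return convention are hypothetical and are not taken from scikit-learn or from Dayvid's code; it is written in plain Python so that it stays self-contained.

```python
import math
from collections import Counter


class EditedNearestNeighbours:
    """Hypothetical instance reducer; not part of scikit-learn."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors

    def reduce(self, X, y):
        """Return (X_reduced, y_reduced), dropping samples whose label
        disagrees with the majority label of their nearest neighbours."""
        keep = []
        for i, xi in enumerate(X):
            # Euclidean distance from sample i to every other sample.
            others = sorted(
                (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
            )
            votes = Counter(y[j] for _, j in others[: self.n_neighbors])
            majority_label = votes.most_common(1)[0][0]
            if majority_label == y[i]:
                keep.append(i)
        # X and y shrink together, which a plain transform(X) cannot express.
        return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: the sample at 0.15 carries label 1 but sits among label-0 points.
X = [[0.0], [0.1], [0.2], [0.15], [5.0], [5.1], [5.2]]
y = [0, 0, 0, 1, 1, 1, 1]

X_r, y_r = EditedNearestNeighbours(n_neighbors=3).reduce(X, y)
```

In this toy run the mislabelled-looking sample at 0.15 is the only one removed, and its label is removed with it.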

Re: [Scikit-learn-general] Scikit-Learn sprint 2014 - July in Paris

2014-06-20 Thread Andreas Mueller
As if I'd miss a sprint ;) On Jun 21, 2014 12:09 AM, "Gael Varoquaux" wrote: > Yes I believe that our spare bedroom is available at this time. > > Worst case, we should also have funding to house you. > > Gaël > > PS: awesome that you are coming! > > > Original message > Fro

Re: [Scikit-learn-general] Scikit-Learn sprint 2014 - July in Paris

2014-06-20 Thread Gael Varoquaux
Yes I believe that our spare bedroom is available at this time. Worst case, we should also have funding to house you. Gaël PS: awesome that you are coming! Original message From: Andy Date: 20/06/2014 21:46 (GMT+01:00) To: scikit-learn-general@lists.sourceforge.net Su

Re: [Scikit-learn-general] Scikit-Learn sprint 2014 - July in Paris

2014-06-20 Thread Andreas Mueller
OK thanks everyone, I got something :) On Jun 20, 2014 9:46 PM, "Andy" wrote: > Hey Everyone. > > Does anyone by any chance have a spare bed / couch? > > Cheers, > Andy > > On 06/08/2014 01:47 PM, Alexandre Gramfort wrote: > >> hi everyone, >> >> time to reactivate this thread... >> >> time is ru

Re: [Scikit-learn-general] Scikit-Learn sprint 2014 - July in Paris

2014-06-20 Thread Kyle Kastner
Sent you an email - I know of at least one possibility. On Fri, Jun 20, 2014 at 2:46 PM, Andy wrote: > Hey Everyone. > > Does anyone by any chance have a spare bed / couch? > > Cheers, > Andy > > On 06/08/2014 01:47 PM, Alexandre Gramfort wrote: > > hi everyone, > > > > time to reactivate this

Re: [Scikit-learn-general] Scikit-Learn sprint 2014 - July in Paris

2014-06-20 Thread Andy
Hey Everyone. Does anyone by any chance have a spare bed / couch? Cheers, Andy On 06/08/2014 01:47 PM, Alexandre Gramfort wrote: > hi everyone, > > time to reactivate this thread... > > time is running fast and we should start planning the details for the sprint. > > If you can/want to come plea

Re: [Scikit-learn-general] Instance Reduction on scikit-learn

2014-06-20 Thread Joel Nothman
Hi Dayvid, For now, a number of projects that follow the scikit-learn interface but, for one reason or another (often just being out of scope), are not part of scikit-learn are listed at https://github.com/scikit-learn/scikit-learn/wiki/Third-party-projects-and-code-snippets . I would recommend against keeping everything in a scikit

Re: [Scikit-learn-general] Strings as features

2014-06-20 Thread Brian Wingenroth
Hi Abijith, This should get you started: http://scikit-learn.org/dev/tutorial/text_analytics/working_with_text_data.html Brian On 6/20/14, 12:05 PM, Abijith Kp wrote: > Can anyone help me with the problem of dealing with feature which are > both strings of varying length(say from 0 to 100-150 c

Re: [Scikit-learn-general] Instance Reduction on scikit-learn

2014-06-20 Thread Dayvid Victor
Hi Joel, Thanks for your feedback. Let me see if I got this straight, you think I should open a new repository and then add an entry in the Wiki? Do you have an example of some other project that did the same? How do I organize it, do I start a new project or I build a new project inside my sklea

[Scikit-learn-general] Strings as features

2014-06-20 Thread Abijith Kp
Can anyone help me with the problem of dealing with features that are a mix of strings of varying length (say from 0 to 100-150 characters) and numbers? What are the most widely used techniques in this kind of situation? And can it be solved using only scikit-learn? PS: Initially I have to conver
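One possible approach for short strings mixed with numbers is to turn each string into character n-gram counts and stack the numeric columns next to them. This is a sketch under stated assumptions, not an answer given in the thread: the sample strings, the numeric column, and the n-gram range are made up for illustration. CountVectorizer and scipy.sparse.hstack are existing APIs.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

# Made-up samples: one string feature (possibly empty) and one numeric feature.
strings = ["abc", "", "hello world", "abcabc"]
numbers = np.array([[1.0], [2.5], [0.0], [4.0]])

# Character 1- and 2-grams cope with strings of any length, including empty ones.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 2))
X_text = vectorizer.fit_transform(strings)

# Append the numeric column to the sparse n-gram counts, one row per sample.
X = hstack([X_text, csr_matrix(numbers)]).tocsr()
```

The resulting sparse matrix can be passed to most scikit-learn estimators; scaling the numeric column first may matter for some of them.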

Re: [Scikit-learn-general] Implementation of new Anomaly/Outlier Detection Algorithms

2014-06-20 Thread Alexandre Gramfort
Hi Nicolas, could you give some numbers on the impact of these different works, to get an idea of which might be of most interest to the sklearn community? Do they all scale to medium or large datasets? Is there anybody on the list with experience with these tools? Best, Alex On Fr