Re: [scikit-learn] Help With Text Classification

2017-08-02 Thread Joel Nothman
One of the key advantages of Pipeline is that it makes sure that equivalent processing happens at training and prediction time (assuming you do not write your own transformers that break their contract). This is what appears to have broken in your current attempts. On 3 August 2017 at 13:12, pybok
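A minimal sketch of what Joel describes, assuming the TfidfVectorizer + MultinomialNB setup from the text-analytics tutorial. The documents and labels are made up for illustration. Because the vectorizer lives inside the Pipeline, fit() learns the vocabulary from the training documents only, and predict() reuses that fitted vocabulary through transform() instead of re-fitting it.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data, for illustration only.
train_docs = ["engine noise at idle", "brake squeal when stopping",
              "engine stalls on startup", "brakes grind at low speed"]
train_labels = ["engine", "brakes", "engine", "brakes"]
test_docs = ["squeal from the brakes", "engine noise"]

text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),  # fitted on training docs inside text_clf.fit()
    ("clf", MultinomialNB()),
])
text_clf.fit(train_docs, train_labels)

# predict() only calls transform() on the already-fitted vectorizer,
# so test documents are mapped onto the training vocabulary.
predicted = text_clf.predict(test_docs)
print(list(predicted))
```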

Re: [scikit-learn] Nearest neighbor search with 2 distance measures

2017-08-02 Thread Rohin Kumar
Dear Jake, Thank you for your inputs. I had a look at cykdtree. The core implementation of the algorithm is in C/C++, and modifying it is currently beyond my skill. I will try to contact their team to see if they entertain special requests. I should be able to fork and modify the sklearn algorithm in cython once my c
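One way to search with two distance measures without touching any tree code is sketched below. It assumes, hypothetically, that a non-negative weighted sum of the two distances is acceptable: Euclidean on the first two columns plus a weighted Manhattan distance on the rest. combined_distance and the weight w are illustrative names, not from the thread. Such a sum is still a metric, so BallTree's pruning stays valid. KDTree only supports a fixed set of built-in metrics, so the callable goes to algorithm="ball_tree". Expect a Python callable to be much slower than a built-in metric.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def combined_distance(a, b, w=0.5):
    """Hypothetical combined metric: Euclidean on columns 0-1 plus
    w * Manhattan on the remaining columns."""
    d_euclid = np.sqrt(np.sum((a[:2] - b[:2]) ** 2))
    d_manhattan = np.sum(np.abs(a[2:] - b[2:]))
    return d_euclid + w * d_manhattan


rng = np.random.RandomState(0)
X = rng.rand(200, 4)

# kd_tree rejects Python callables; ball_tree (or brute) accepts them.
nn = NearestNeighbors(n_neighbors=3, algorithm="ball_tree",
                      metric=combined_distance)
nn.fit(X)
dist, ind = nn.kneighbors(X[:5])  # distances sorted ascending per query
```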

Re: [scikit-learn] Help With Text Classification

2017-08-02 Thread pybokeh
Thanks Joel for recommending FeatureUnion. I did run across that, but for just 2 features I thought it might be overkill. I am aware of Pipeline, which the scikit-learn example explains very well, and I was going to use it once I finalized my script. I did not want to abstract away too much
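Without FeatureUnion, the two-feature case can still be handled by hand, as long as each encoder is fitted on the training data only and merely applied to new data. A sketch, assuming scikit-learn 0.20 or later (where OneHotEncoder accepts string categories) and made-up descriptions and part numbers:

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: free-text description plus a categorical part number.
train_text = ["engine noise at idle", "brake squeal when stopping",
              "engine stalls on startup", "brakes grind at low speed"]
train_part = [["P-100"], ["P-200"], ["P-100"], ["P-200"]]
train_y = ["engine", "brakes", "engine", "brakes"]
test_text = ["squeal from the brakes"]
test_part = [["P-200"]]

tfidf = TfidfVectorizer()
onehot = OneHotEncoder(handle_unknown="ignore")  # unseen part numbers -> all zeros

# fit_transform() on the training data only...
X_train = sp.hstack([tfidf.fit_transform(train_text),
                     onehot.fit_transform(train_part)]).tocsr()
# ...and transform() (never re-fit) at prediction time, so the columns line up.
X_test = sp.hstack([tfidf.transform(test_text),
                    onehot.transform(test_part)]).tocsr()

clf = MultinomialNB().fit(X_train, train_y)
pred = clf.predict(X_test)
```

Re-fitting either encoder on the test data would produce a different column layout, which is the kind of train/predict mismatch a Pipeline rules out.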

Re: [scikit-learn] Help With Text Classification

2017-08-02 Thread Joel Nothman
Use a Pipeline to help avoid this kind of issue (and others). You might also want to do something like http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html On 3 August 2017 at 12:01, pybokeh wrote: > Hello, > I am studying this example from scikit-learn's site: > http://scikit-
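In scikit-learn 0.20 and later, ColumnTransformer routes each column to its own transformer, which covers the text-plus-part-# case inside a single Pipeline. A sketch with hypothetical column names "description" and "part_no" and toy data; it is not the code from the linked example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data.
train = pd.DataFrame({
    "description": ["engine noise at idle", "brake squeal when stopping",
                    "engine stalls on startup", "brakes grind at low speed"],
    "part_no": ["P-100", "P-200", "P-100", "P-200"],
})
y = ["engine", "brakes", "engine", "brakes"]

features = ColumnTransformer([
    # A single column name passes a 1-D Series of documents to the vectorizer.
    ("text", TfidfVectorizer(), "description"),
    # A list of names passes a 2-D frame, which OneHotEncoder expects.
    ("part", OneHotEncoder(handle_unknown="ignore"), ["part_no"]),
])
model = Pipeline([("features", features), ("clf", MultinomialNB())])
model.fit(train, y)

new = pd.DataFrame({"description": ["squeal from the brakes"],
                    "part_no": ["P-200"]})
pred = model.predict(new)
```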

[scikit-learn] Help With Text Classification

2017-08-02 Thread pybokeh
Hello, I am studying this example from scikit-learn's site: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html The problem that I need to solve is very similar to this example, except I have one additional feature column (part #) that is categorical of type string.

Re: [scikit-learn] Problems with running GridSearchCV on a pipeline with a custom transformer

2017-08-02 Thread Sam Barnett
Hi Andy, The purpose of the transformer is to take an ordinary kernel (in this case I have taken 'rbf' as a default) and return a 'sequentialised' kernel using a few extra parameters. Hence, the transformer takes an ordinary data-target pair X, y as its input, and the fit_transform(X, y) method wi

Re: [scikit-learn] minibatchkmeans deprecation warning?

2017-08-02 Thread Andreas Mueller
The docs say 3 months, I think, though it's been more like 8. 0.19 will come out in August. On 08/02/2017 12:48 PM, Chris Carrion via scikit-learn wrote: Before I forget, is there an ETA for .19, or an average time between upgrades? From: Andreas Mueller Sent: W

Re: [scikit-learn] minibatchkmeans deprecation warning?

2017-08-02 Thread Chris Carrion via scikit-learn
Before I forget, is there an ETA for .19, or an average time between upgrades? From: Andreas Mueller Sent: Wednesday, August 2, 2017 12:34 PM To: Chris Carrion via scikit-learn Subject: Re: [scikit-learn] minibatchkmeans deprecation warning? Ah. That's actually a deprecation warning coming from

Re: [scikit-learn] minibatchkmeans deprecation warning?

2017-08-02 Thread Chris Carrion via scikit-learn
That’s great to hear, thanks! Chris From: Andreas Mueller Sent: Wednesday, August 2, 2017 12:34 PM To: Chris Carrion via scikit-learn Subject: Re: [scikit-learn] minibatchkmeans deprecation warning? Ah. That's actually a deprecation warning coming from numpy, and I think it'll be removed in 0.

Re: [scikit-learn] minibatchkmeans deprecation warning?

2017-08-02 Thread Andreas Mueller
Ah. That's actually a deprecation warning coming from numpy, and I think it'll be removed in 0.19 (if not already in 0.18.1). It's really nothing to worry about, though. Andy On 08/02/2017 12:10 PM, Chris Carrion via scikit-learn wrote: Hi Andy, WARN sklearn/cluster/k_means_.py:1301: Depre

Re: [scikit-learn] minibatchkmeans deprecation warning?

2017-08-02 Thread Chris Carrion via scikit-learn
Hi Andy, WARN sklearn/cluster/k_means_.py:1301: DeprecationWarning: This function is deprecated. Please call randint(0, 179 + 1) instead That’s all I’m given From: Andreas Mueller Sent: Wednesday, August 2, 2017 12:09 PM To: Chris Carrion via scikit-learn Subject: Re: [scikit-learn] minibatchkmea
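The message appears to come from NumPy's deprecated random_integers(low, high), whose upper bound is inclusive. randint(low, high) excludes its upper bound, hence the "+ 1" in the suggested replacement. A quick check of that range, using a seeded RandomState for reproducibility:

```python
import numpy as np

rng = np.random.RandomState(42)

# randint's upper bound is exclusive, so randint(0, 179 + 1) draws from
# 0..179 inclusive, the range random_integers(0, 179) used to cover.
samples = rng.randint(0, 179 + 1, size=10000)
print(samples.min(), samples.max())
```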

Re: [scikit-learn] minibatchkmeans deprecation warning?

2017-08-02 Thread Andreas Mueller
Hi Chris. What is the warning? Andy On 08/02/2017 11:36 AM, Chris Carrion via scikit-learn wrote: Hi, I’m working in an environment provided by Quantopian, an algorithmic-traders hub for research. I imported the minibatch kmeans from sklearn.cluster in the environment they provided, but I

Re: [scikit-learn] Problems with running GridSearchCV on a pipeline with a custom transformer

2017-08-02 Thread Andreas Mueller
Hi Sam. GridSearchCV will do cross-validation, which requires "transforming" the test data. The shape of the test data will be different from the shape of the training data. You need to have the ability to compute the kernel between the training data and new test data. A more hacky solution w
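A minimal sketch of Andy's point, assuming a plain RBF kernel stands in for Sam's 'sequentialised' kernel. The transformer stores the training samples in fit() and, in transform(), computes the kernel between whatever it receives and those stored samples. That yields an (n_train, n_train) matrix while fitting and an (n_test, n_train) matrix while scoring, which is what SVC(kernel="precomputed") expects. RBFKernelTransformer and the parameter grid are illustrative, not from the thread.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


class RBFKernelTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: maps X to its RBF kernel against the fitted data."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma

    def fit(self, X, y=None):
        # Keep the training samples; transform() needs them for unseen data.
        self.X_fit_ = np.asarray(X)
        return self

    def transform(self, X):
        # Rows: samples being transformed; columns: training samples.
        return rbf_kernel(np.asarray(X), self.X_fit_, gamma=self.gamma)


rng = np.random.RandomState(0)
X = rng.randn(60, 3)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

pipe = Pipeline([("kernel", RBFKernelTransformer()),
                 ("svc", SVC(kernel="precomputed"))])
grid = GridSearchCV(pipe, {"kernel__gamma": [0.1, 1.0],
                           "svc__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)  # each fold: kernel fitted on train split, applied to test split
```

Because the kernel parameters are constructor arguments, GridSearchCV can clone the transformer and tune them through the usual step__param names.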

[scikit-learn] minibatchkmeans deprecation warning?

2017-08-02 Thread Chris Carrion via scikit-learn
Hi, I’m working in an environment provided by Quantopian, an algorithmic-traders hub for research. I imported the minibatch kmeans from sklearn.cluster in the environment they provided, but I’m getting a deprecation warning. After reaching out to Quantopian support, they claim it’s something w
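For reference, MiniBatchKMeans lives in sklearn.cluster. If a harmless DeprecationWarning like the one in this thread still shows up in a given environment, it can be silenced locally rather than globally. A sketch with synthetic two-blob data; n_init is set explicitly only to avoid a default-change warning in some newer versions.

```python
import warnings

import numpy as np
from sklearn.cluster import MiniBatchKMeans  # module is sklearn.cluster (singular)

# Synthetic data: two well-separated blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 5])

with warnings.catch_warnings():
    # Ignore DeprecationWarnings only inside this block; filters are restored on exit.
    warnings.simplefilter("ignore", DeprecationWarning)
    km = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3).fit(X)

labels = km.labels_
```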