Re: [scikit-learn] Suggestion to update the code for Segmenting the picture of Lena in regions

2018-07-28 Thread Jacob Vanderplas
Hi Rajkiran, It sounds like you found an example from an old version of the scikit-learn documentation. After scipy removed that image, the example you're referring to was updated to this one: http://scikit-learn.org/stable/auto_examples/cluster/plot_face_segmentation.html Best, Jake Jake Va

Re: [scikit-learn] NearestNeighbors without replacement

2018-04-02 Thread Jacob Vanderplas
gh controls are matched to many different cases so that each > case ends up being matched to 20 unique controls. Does this method make > sense?? > > Best, > > Randy > > On Sun, Apr 1, 2018 at 10:13 PM, Jacob Vanderplas < > jake...@cs.washington.edu> wrote: > >

Re: [scikit-learn] NearestNeighbors without replacement

2018-04-01 Thread Jacob Vanderplas
On Sun, Apr 1, 2018 at 6:36 PM, Randy Ellis wrote: > Hello to the Scikit-learn community! > > I am doing case-control matching for an electronic health records study. > My question is, is it possible to run Sklearn's NearestNeighbors function > without replacement? As in, match the treated group
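One reading of "without replacement" in the question above is that each control may be assigned to at most one case. The following is a minimal greedy sketch of that idea in plain Python. It is not a scikit-learn API, and the function name, the single numeric covariate, and the sample values are all assumptions made for illustration.

```python
def greedy_match_without_replacement(cases, controls, k):
    """Assign each case its k nearest controls, using every control at most once.

    Hypothetical helper: `cases` and `controls` are lists of one numeric
    covariate each; a real study would use a multi-dimensional distance.
    """
    available = set(range(len(controls)))
    matches = {}
    for i, case in enumerate(cases):
        # Rank the still-unused controls by distance to this case
        # (ties broken by index so the result is deterministic).
        ranked = sorted(available, key=lambda j: (abs(controls[j] - case), j))
        chosen = ranked[:k]
        if len(chosen) < k:
            raise ValueError("not enough unused controls left")
        matches[i] = chosen
        # Remove the chosen controls so later cases cannot reuse them.
        available.difference_update(chosen)
    return matches


cases = [1.0, 2.0]
controls = [0.9, 1.1, 1.9, 2.2, 5.0]
matches = greedy_match_without_replacement(cases, controls, k=2)
print(matches)
```

A greedy pass like this depends on the order in which cases are processed, so it does not guarantee a globally optimal matching.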

Re: [scikit-learn] CountVectorizer: Additional Feature Suggestion

2018-01-27 Thread Jacob Vanderplas
Hi Yacine, If I'm understanding you correctly, I think what you have in mind is already implemented in scikit-learn in the TF-IDF vectorizer. Best, Jake Jake VanderPlas Senior Data Scienc

Re: [scikit-learn] Support Vector Machines: Sensitive to Single Datapoints?

2017-12-19 Thread Jacob Vanderplas
Hi JohnMark, SVMs, by design, are quite sensitive to the addition of single data points – but only if those data points happen to lie near the margin. I wrote about some of those types of details here: https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html Hope tha

Re: [scikit-learn] Nearest neighbor search with 2 distance measures

2017-08-01 Thread Jacob Vanderplas
RAN. The closest one was halotools which again works with > euclidean metric. For now, I will try to get my work done with 2 different > BallTrees iteratively in bins. If I find a better option will try to post > an update. > > Regards, > Rohin. > > > On Tue, Aug 1, 2017 at

Re: [scikit-learn] Nearest neighbor search with 2 distance measures

2017-08-01 Thread Jacob Vanderplas
in boxes (using >> two arrays simultaneously-hence needing 2 metrics) instead of one distance >> array as the binning parameter. I don't know if the algorithm supports such >> a thing. For now, I am proceeding with your suggestion of two ball trees at >> huge computat

Re: [scikit-learn] Nearest neighbor search with 2 distance measures

2017-07-31 Thread Jacob Vanderplas
On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar wrote: > *update* > > May be it doesn't have to be done at the tree creation level. It could be > using loops and creating two different balltrees. Something like > > tree1=BallTree(X,metric='metric1') #for x-z plane > tree2=BallTree(X,metric='metric2
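The two-tree idea quoted above could be sketched roughly as follows: build one BallTree per distance measure, run a radius query on each, and keep only the neighbors found by both. The column split (x and z in one tree, y in the other), the built-in metrics standing in for 'metric1' and 'metric2', the radii, and the sample points are all assumptions for illustration, not details from the thread.

```python
import numpy as np
from sklearn.neighbors import BallTree

# Hypothetical points with columns (x, y, z).
X = np.array([[0.0, 0.0, 0.0],
              [0.1, 0.0, 0.1],
              [0.1, 5.0, 0.1],
              [3.0, 0.0, 3.0]])

# One tree per distance measure; the built-in metrics are stand-ins.
tree_xz = BallTree(X[:, [0, 2]], metric="euclidean")  # x-z plane
tree_y = BallTree(X[:, [1]], metric="manhattan")      # y axis only

# Radius queries around the first point, one per tree.
near_xz = tree_xz.query_radius(X[:1, [0, 2]], r=0.5)[0]
near_y = tree_y.query_radius(X[:1, [1]], r=1.0)[0]

# Neighbors that satisfy both distance cuts.
both = np.intersect1d(near_xz, near_y)
print(both)
```

Querying two trees and intersecting the results costs more than a single query, which matches the concern about computational cost raised elsewhere in this thread.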

Re: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn

2017-02-03 Thread Jacob Vanderplas
Hi Afarin, The short answer is no, you can't really compute p-values and related statistics in Scikit-Learn. This stems from a fundamental divide in statistics/AI between machine learning on one hand, and statistical modeling on the other. A classic treatment of this divide is "Statistical Modelin

Re: [scikit-learn] Problems with plotting decision regions

2016-09-13 Thread Jacob Vanderplas
It seems to work correctly if you replace the colormap with a continuous one like 'viridis'. I suspect this is a bug in matplotlib's ListedColormap. Jake Jake VanderPlas Senior Data Science Fellow Director of Research in Physical Sciences University of Washington eScience Institute On Tue,