Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread amueller
Did you see my earlier reply? Roman Sinayev wrote: >min_df=2 in the second and min_df=1 in the first. > >On Thu, Mar 14, 2013 at 7:19 PM, Ark <4rk@gmail.com> wrote: >> >>> >>> This is unexpected. Can you inspect the vocabulary_ on both >>> vectorizers? Try computing their set.intersectio

Re: [Scikit-learn-general] sdss_photoz NaN problem in Exercise 7.2

2013-03-14 Thread Brian Holt
Up until very recently I was working on windows 7 64bit without any trouble. Are you using the Enthought Python Distribution or pythonxy or are you building scikit learn for yourself? On Mar 14, 2013 9:46 PM, "george manus" wrote: > > > Leon Palafox writes: > > > > > > > What is the issue you'v

Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread Roman Sinayev
min_df=2 in the second and min_df=1 in the first. On Thu, Mar 14, 2013 at 7:19 PM, Ark <4rk@gmail.com> wrote: > >> >> This is unexpected. Can you inspect the vocabulary_ on both >> vectorizers? Try computing their set.intersection, set.difference, >> set.symmetric_difference (all Python builti

[Scikit-learn-general] Tutorial Exercise 7.3 file loading problem

2013-03-14 Thread george manus
While I wait for advice on resolving the clf.fit issue with data containing 'nan' in Exercise 7.2, I am moving on to Exercise 7.3. Now I have a problem with numpy.load: data = np.load(os.path.join(DATA_HOME, 'spec4000_corrected.npz')) fails with: Bad magic number for central directory. I verified
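A note on the error above: an .npz file is a zip archive, and "Bad magic number for central directory" is the message Python's zipfile module raises when the end of the archive is unreadable, which often points to an incomplete or corrupted download. The sketch below is one way to check this with the standard library only. The function name check_npz is hypothetical and not part of the tutorial or of numpy; whether this is the actual cause in this thread is an assumption.

```python
import zipfile


def check_npz(path):
    """Report whether path looks like a readable zip archive (.npz is a zip)."""
    if not zipfile.is_zipfile(path):
        return "not a valid zip archive (possibly a truncated or corrupted download)"
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first member that fails its
        # CRC check, or None if every member reads back cleanly.
        bad_member = archive.testzip()
    if bad_member is not None:
        return "corrupt member: " + bad_member
    return "ok"
```

If the check reports an invalid archive, deleting the file and downloading it again is usually the first thing to try.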

Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread Ark
> > This is unexpected. Can you inspect the vocabulary_ on both > vectorizers? Try computing their set.intersection, set.difference, > set.symmetric_difference (all Python builtins). > In [17]: len(set.symmetric_difference(set(vect13.vocabulary_.keys()), set(vect14.vocabulary_.keys( Out[17
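The comparison suggested above can be sketched with plain Python sets. The two dictionaries here are hypothetical stand-ins for vect13.vocabulary_ and vect14.vocabulary_ (which map each term to a column index); with real fitted vectorizers you would pass those attributes instead.

```python
# Hypothetical stand-ins for vect13.vocabulary_ and vect14.vocabulary_.
vocab_a = {"machine": 0, "learning": 1, "machine learning": 2, "rare": 3}
vocab_b = {"machine": 0, "learning": 1, "machine learning": 2}


def compare_vocabularies(a, b):
    """Return the terms shared by both vocabularies and those unique to each."""
    keys_a, keys_b = set(a), set(b)
    return {
        "common": keys_a & keys_b,
        "only_in_a": keys_a - keys_b,
        "only_in_b": keys_b - keys_a,
    }


report = compare_vocabularies(vocab_a, vocab_b)
for name, terms in report.items():
    # Print the size of each group plus a few example terms.
    print(name, len(terms), sorted(terms)[:10])
```

Looking at a few terms from each "only in" group usually says more than the size of the symmetric difference alone, for example whether the missing terms are all rare ones.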

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Andreas Mueller
On 03/14/2013 05:15 PM, Bertrand Thirion wrote: >> It would be nice to clarify if INRIA is still paying an engineer for >> the project and if so whether he's full-time or has other duties. > > Yes, INRIA has been paying a full time engineer (Fabian, then Jaques) on the > project since January 2010

Re: [Scikit-learn-general] Relational learning with scikit-learn

2013-03-14 Thread Andreas Mueller
On 03/14/2013 10:43 PM, Tom Fawcett wrote: > Just curious – is anyone working on relational learning (also called > multi-relational learning) with scikit-learn? For those who are unfamiliar, > this is basically using sets of related tables with table entries allowed to > point to other table r

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Andreas Mueller
On 03/14/2013 10:38 PM, Robert Layton wrote: > > > I had a feeling that I would get this part wrong, as it was the part I > know the least about. > > I didn't want to distract from the community, but wanted to point out > that real institutions are putting real money into the project, in > much

Re: [Scikit-learn-general] sdss_photoz NaN problem in Exercise 7.2

2013-03-14 Thread george manus
Leon Palafox writes: > > > What is the issue you've been having? > > On Thu, Mar 14, 2013 at 12:44 PM, george manus wrote:Hi There, > I am going through your great tutorial and got stuck with this data problem when > trying to do clf.fit in DecisionTreeRegression exercise.  After googling,

[Scikit-learn-general] Relational learning with scikit-learn

2013-03-14 Thread Tom Fawcett
Just curious – is anyone working on relational learning (also called multi-relational learning) with scikit-learn? For those who are unfamiliar, this is basically using sets of related tables with table entries allowed to point to other table rows, etc. Basic relational database stuff. I’d

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Robert Layton
On 15 March 2013 03:15, Bertrand Thirion wrote: > > It would be nice to clarify if INRIA is still paying an engineer for > > the project and if so whether he's full-time or has other duties. > > > Yes, INRIA has been paying a full time engineer (Fabian, then Jaques) on > the project since January

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-14 Thread Ronnie Ghose
I think anything is ok ? On Thu, Mar 14, 2013 at 5:33 PM, Albert Kottke wrote: > Sure. It might take me a little time to put together. I could do a > collection of CSV files, or just do a gzipped json file of a Python > dict. > > Albert > > On Thu, Mar 14, 2013 at 2:29 PM, Ronnie Ghose > w

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-14 Thread Albert Kottke
Sure. It might take me a little time to put together. I could do a collection of CSV files, or just do a gzipped json file of a Python dict. Albert On Thu, Mar 14, 2013 at 2:29 PM, Ronnie Ghose wrote: > Could you release a part of your data? / Similar data? > > > On Thu, Mar 14, 2013 at 5:26 PM

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-14 Thread Ronnie Ghose
Could you release a part of your data? / Similar data? On Thu, Mar 14, 2013 at 5:26 PM, Albert Kottke wrote: > I sent this email earlier, but that attachment exceeded the attachment > limit so I am linking the attachment. > > http://i.imgur.com/dHida3t.png > > Attached is a figure showing a coll

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-14 Thread Albert Kottke
I sent this email earlier, but that attachment exceeded the attachment limit so I am linking the attachment. http://i.imgur.com/dHida3t.png Attached is a figure showing a collection of velocity profiles across a region. The goal would be to group each of these curves into groups with similar ch

Re: [Scikit-learn-general] sdss_photoz NaN problem in Exercise 7.2

2013-03-14 Thread Leon Palafox
What is the issue you've been having? On Thu, Mar 14, 2013 at 12:44 PM, george manus wrote: > Hi There, > I am going through your great tutorial and got stuck with this data > problem when > trying to do clf.fit in DecisionTreeRegression exercise. After googling, > it > appears this problem wa

[Scikit-learn-general] sdss_photoz NaN problem in Exercise 7.2

2013-03-14 Thread george manus
Hi There, I am going through your great tutorial and got stuck with this data problem when trying to do clf.fit in DecisionTreeRegression exercise. After googling, it appears this problem was reported and confirmed for windows usage (I am using python 2.7.3, numpy 1.6.1 on win7 64 bit). I am
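A quick way to see whether the training data really contains NaN values before calling clf.fit is sketched below, using only the standard library. The function names and the sample rows are hypothetical; dropping the affected rows is just one possible workaround and not necessarily what the tutorial intends.

```python
import math


def rows_with_nan(rows):
    """Return the indices of rows that contain at least one NaN value."""
    return [i for i, row in enumerate(rows) if any(math.isnan(v) for v in row)]


def drop_nan_rows(X, y):
    """Drop samples where either the features or the target are NaN."""
    bad = set(rows_with_nan(X))
    keep = [i for i in range(len(X)) if i not in bad and not math.isnan(y[i])]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical sample data: row 1 has a NaN feature, row 2 a NaN target.
X = [[1.0, 2.0], [float("nan"), 3.0], [4.0, 5.0]]
y = [0.1, 0.2, float("nan")]
print(rows_with_nan(X))
X_clean, y_clean = drop_nan_rows(X, y)
print(len(X_clean), len(y_clean))
```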

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-14 Thread Ronnie Ghose
What would be wrong with clustering via the thickness and velocity of each layer as features? On Mar 14, 2013 3:31 PM, "Albert Kottke" wrote: > I am a novice at machine learning, so pardon my ignorance. > > I have ~3000 velocity profiles, which consist of multiple layers defined > by a thickness

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-14 Thread Leon Palafox
Hi there, what does your data set look like? Each time you take a measurement, what kind of information do you have available? Leon On Thu, Mar 14, 2013 at 12:30 PM, Albert Kottke wrote: > I am a novice at machine learning, so pardon my ignorance. > > I have ~3000 velocity profiles, which

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-14 Thread Ronnie Ghose
Elaborate - sounds like two features as of now On Mar 14, 2013 3:31 PM, "Albert Kottke" wrote: > I am a novice at machine learning, so pardon my ignorance. > > I have ~3000 velocity profiles, which consist of multiple layers defined > by a thickness and velocity. My goal is cluster these profile

[Scikit-learn-general] Machine learning on 2D problems.

2013-03-14 Thread Albert Kottke
I am a novice at machine learning, so pardon my ignorance. I have ~3000 velocity profiles, which consist of multiple layers defined by a thickness and velocity. My goal is to cluster these profiles into ~10 groups with similar characteristics. All of the examples that I have seen on the scikits-lea
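Clustering estimators generally expect every sample to have the same number of features, while these profiles have a varying number of layers. One possible approach, sketched here as an assumption and not as something proposed in the thread, is to read off each profile's velocity at a fixed set of depths so that every profile becomes an equal-length vector. The function name, the depth grid, and the example values (with no particular units) are all hypothetical.

```python
def profile_to_features(layers, depths):
    """Sample a layered profile at fixed depths.

    layers: list of (thickness, velocity) pairs, ordered top to bottom.
    depths: depths at which to read off the velocity.
    The last layer is treated as extending to unlimited depth.
    """
    features = []
    for depth in depths:
        bottom = 0.0
        velocity = layers[-1][1]  # default: below all layer boundaries
        for thickness, layer_velocity in layers:
            bottom += thickness
            if depth < bottom:
                velocity = layer_velocity
                break
        features.append(velocity)
    return features


# Hypothetical profile with three layers, sampled at four depths.
depth_grid = [2.5, 7.5, 12.5, 17.5]
profile = [(5.0, 200.0), (10.0, 350.0), (20.0, 600.0)]
features = profile_to_features(profile, depth_grid)
print(features)
```

Applying this to every profile gives a list of equal-length rows that could then be passed to a clustering estimator. A finer depth grid keeps more detail at the cost of more features.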

Re: [Scikit-learn-general] macports installation instructions

2013-03-14 Thread John Gleeson
On 2013-03-13, at 9:12 AM, Lars Buitinck wrote: > 2013/3/13 John Gleeson : >>> sudo port install py27-scikits-learn >> >> This is the correct install command. > > I just updated the docs. Apparently the name got mistyped py27-sklearn > once, then the error was copied. > > (The MacPorts package is

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Bertrand Thirion
> It would be nice to clarify if INRIA is still paying an engineer for > the project and if so whether he's full-time or has other duties. Yes, INRIA has been paying a full time engineer (Fabian, then Jaques) on the project since January 2010. Bertrand

Re: [Scikit-learn-general] Gaussian process regression examples are strangely statefull

2013-03-14 Thread Jaques Grobler
Weird, I can actually reproduce this.. Running example as is, it's fine.. but if I separately run the two parts, I get: http://oi45.tinypic.com/1gnc5h.jpg I'll have a look

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Mathieu Blondel
On Thu, Mar 14, 2013 at 9:55 PM, Bertrand Thirion wrote: >It is true that singling out INRIA gives a distorted view on the set of > scikit contributors. > But it is also true that -as explained by Nelle- INRIA has been paying people > to work on the project only for more than three years n

Re: [Scikit-learn-general] Macports installation issue

2013-03-14 Thread Thomas Fawcett
> > Message: 5 > Date: Wed, 13 Mar 2013 12:53:31 -0600 > From: John Gleeson > Subject: Re: [Scikit-learn-general] macports installation instructions > To: scikit-learn-general@lists.sourceforge.net > Message-ID: > Content-Type: text/plain; charset=US-ASCII; format=flowed > > > On 2013-03-13,

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Mathieu Blondel
Hi Robert, When you write that according to the survey, users want "sparse matrix support", I would rather write "Sparse matrix support in more estimators / algorithms". My 2c, Mathieu On Thu, Mar 14, 2013 at 1:52 PM, Robert Layton wrote: > On 1 March 2013 00:57, Olivier Grisel wrote: >> >> 20

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Andreas Mueller
On 03/14/2013 01:38 PM, Nelle Varoquaux wrote: On 14 March 2013 13:30, Andreas Mueller > wrote: Btw, don't get me wrong, I think it is great what INRIA and Gael's team do. But singling them out seems weird to me. I think INRIA has played an impo

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Bertrand Thirion
- Original message - > From: "Andreas Mueller" > To: scikit-learn-general@lists.sourceforge.net > Sent: Thursday, 14 March 2013 13:30:34 > Subject: Re: [Scikit-learn-general] PyCON Australia 2013 > > Btw, don't get me wrong, I think it is great what INRIA and Gael's > team do. > But singling them

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Nelle Varoquaux
On 14 March 2013 13:30, Andreas Mueller wrote: > Btw, don't get me wrong, I think it is great what INRIA and Gael's team do. > But singling them out seems weird to me. > I think INRIA has played an important role in the "rebirth" of the scikit (without INRIA's interest and financial support, the

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Andreas Mueller
Btw, don't get me wrong, I think it is great what INRIA and Gael's team do. But singling them out seems weird to me.

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Andreas Mueller
On 03/14/2013 01:22 PM, Nelle Varoquaux wrote: On 14 March 2013 13:19, Andreas Mueller > wrote: On 03/14/2013 05:52 AM, Robert Layton wrote: > > Attached is a draft of my presentation. I think I'll add some more > detail, but I'm not sure yet

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Nelle Varoquaux
On 14 March 2013 13:22, Nelle Varoquaux wrote: > > > > On 14 March 2013 13:19, Andreas Mueller wrote: > >> >> On 03/14/2013 05:52 AM, Robert Layton wrote: >> > >> > Attached is a draft of my presentation. I think I'll add some more >> > detail, but I'm not sure yet where to focus -- I'll be aimi

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Nelle Varoquaux
On 14 March 2013 13:19, Andreas Mueller wrote: > > On 03/14/2013 05:52 AM, Robert Layton wrote: > > > > Attached is a draft of my presentation. I think I'll add some more > > detail, but I'm not sure yet where to focus -- I'll be aiming for a 30 > > minute presentation, so I don't want to add too

Re: [Scikit-learn-general] PyCON Australia 2013

2013-03-14 Thread Andreas Mueller
On 03/14/2013 05:52 AM, Robert Layton wrote: > > Attached is a draft of my presentation. I think I'll add some more > detail, but I'm not sure yet where to focus -- I'll be aiming for a 30 > minute presentation, so I don't want to add too much detail. > > I'm not sure if the mailing list will al

Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread Lars Buitinck
2013/3/14 Ark <4rk@gmail.com>: > For: > vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1,2), > smooth_idf=True, sublinear_tf=True, max_df=0.5, > token_pattern=ur'\b(?!\d)\w\w+\b') > > On fit_transform the shape of the input data > - with version 0.13.

[Scikit-learn-general] Gaussian process regression examples are strangely statefull

2013-03-14 Thread William Furnass
If one runs the noisy example in the basic Gaussian Process regression example code [1] _before_ the non-noisy example then the noisy example prediction is just a straight line. There seems to be something oddly stateful about the GaussianProcess class. Any thoughts on why? Cheers, Will [1]

Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread Andreas Mueller
This is weird. Are you sure it is not the other way around? The min_df parameter was reset from 2 to 1 afaik, which should give you a larger vocabulary in the git version, not a smaller one. On 03/14/2013 04:11 AM, Ark wrote: > The vectorized input with the same training data set differs with versio
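To illustrate why lowering min_df from 2 to 1 should enlarge the vocabulary: with an integer min_df, a term is kept only if it appears in at least that many documents. The sketch below mimics that rule in plain Python. It is a simplified stand-in that uses whitespace tokenization, not scikit-learn's implementation, and the function name and sample documents are hypothetical.

```python
from collections import Counter


def build_vocabulary(docs, min_df=1):
    """Keep terms that occur in at least min_df documents (simplified min_df)."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(doc.lower().split()))
    return sorted(term for term, df in doc_freq.items() if df >= min_df)


docs = ["the cat sat", "the dog sat", "a bird flew"]
vocab_min1 = build_vocabulary(docs, min_df=1)
vocab_min2 = build_vocabulary(docs, min_df=2)
print(len(vocab_min1), len(vocab_min2))
```

Every term that survives min_df=2 also survives min_df=1, so the second vocabulary is always a subset of the first when the documents are the same.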