Re: [scikit-learn] How to deal with hierarchical and real-time analysis in machine learning?

2019-02-13 Thread Max Halford
Hey lampahome, I'm currently working on an online learning library called creme: https://creme-ml.github.io/. Each estimator and transformer has a fit_one(x, y) method so that you can learn from a stream of data. I've only been working on it for a bit less than a month now, but it might be of interest…
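To make the `fit_one(x, y)` idea concrete, here is a minimal sketch of a learner that updates itself one example at a time. The class name, the `predict_one` method, the dict-of-features input and the learning rate are assumptions for illustration; this is not creme's code, just plain stochastic gradient descent on squared error.

```python
class OnlineLinearRegression:
    """Hypothetical fit_one-style learner (illustration only, not creme's implementation)."""

    def __init__(self, lr=0.01):
        self.lr = lr          # step size for each single-example update
        self.weights = {}     # feature name -> weight, grown lazily as features appear
        self.bias = 0.0

    def predict_one(self, x):
        # x is a dict mapping feature names to numeric values (an assumption here)
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def fit_one(self, x, y):
        # One SGD step on 0.5 * (prediction - y)^2 for this single example
        error = self.predict_one(x) - y
        self.bias -= self.lr * error
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        return self
```

Because each call touches only one example, memory use stays constant however long the stream runs, which is what makes this style suit real-time data.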

[scikit-learn] cross_validate() with HMM

2019-02-13 Thread Anni Bauer
Hi! I want to be able to run the folds of a k-fold cross-validation in parallel, using all 6 of my CPUs at once. My model is a hidden Markov model, and I want to train it on the training portion of the data and then extract the anomaly score (negative log-likelihood) of each test sequence…
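A minimal sketch of the pattern described: one fold per worker process, train on the training sequences, then score each held-out sequence by its negative log-likelihood. To stay dependency-free it swaps in a hypothetical `GaussianStandIn` model (equivalent to a one-state HMM with a Gaussian emission) where a real HMM library would go; the function names and the contiguous fold layout are likewise assumptions. scikit-learn's own `cross_validate` also accepts an `n_jobs` argument for running folds in parallel.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def kfold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over n sequences."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


class GaussianStandIn:
    """Hypothetical stand-in for an HMM: a single state with a Gaussian emission."""

    def fit(self, sequences):
        values = [v for seq in sequences for v in seq]
        self.mean = sum(values) / len(values)
        var = sum((v - self.mean) ** 2 for v in values) / len(values)
        self.var = max(var, 1e-6)  # floor the variance to avoid division by zero
        return self

    def score(self, seq):
        """Log-likelihood of one sequence under the fitted model."""
        return sum(-0.5 * math.log(2 * math.pi * self.var)
                   - (v - self.mean) ** 2 / (2 * self.var) for v in seq)


def run_fold(task):
    """Train on one fold's training part; return (index, NLL) for its test part."""
    sequences, train, test = task
    model = GaussianStandIn().fit([sequences[i] for i in train])
    return [(i, -model.score(sequences[i])) for i in test]


def cross_val_anomaly_scores(sequences, k=6, n_jobs=6):
    """Negative log-likelihood of every sequence, each scored by the fold that held it out."""
    tasks = [(sequences, tr, te) for tr, te in kfold_indices(len(sequences), k)]
    if n_jobs == 1:
        results = map(run_fold, tasks)  # serial path, handy for debugging
    else:
        with ProcessPoolExecutor(max_workers=n_jobs) as pool:
            results = list(pool.map(run_fold, tasks))
    scores = [None] * len(sequences)
    for fold in results:
        for i, nll in fold:
            scores[i] = nll
    return scores
```

Calling `cross_val_anomaly_scores(sequences, k=6, n_jobs=6)` runs the six folds on six processes; on platforms that start workers by spawning, that call should sit under an `if __name__ == "__main__":` guard. Swapping in a real HMM means replacing `GaussianStandIn` with a model whose `fit` takes training sequences and whose `score` returns a log-likelihood.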

[scikit-learn] Sprint discussion points?

2019-02-13 Thread Andreas Mueller
Hey all. Should we collect some discussion points for the sprint? There's an unusual number of core devs present, and I think we should seize the opportunity. Maybe we should create a page in the wiki or add them to the sprint page? Things that are high on my list of priorities are: * slicing…

Re: [scikit-learn] Sprint discussion points?

2019-02-13 Thread Joel Nothman
Yes, I was thinking the same. I think there are some other core issues to solve, such as: * euclidean_distances numerical issues * commitment to ARM testing and debugging * logistic regression stability. We should also nut out the OPTICS issues or remove it from 0.21. I'm still keen on trying to work…
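One well-known source of numerical trouble, offered as a hedged illustration rather than a summary of the specific issues meant here: scikit-learn documents that `euclidean_distances` uses the expansion dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y)), which is fast but can lose precision through cancellation when points are large in magnitude and close together. The pure-Python comparison below (helper names are hypothetical) shows the effect even in float64.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def expanded_distance(x, y):
    """Distance via the dot-product expansion; negative rounding error is clipped to 0."""
    squared = dot(x, x) - 2 * dot(x, y) + dot(y, y)
    return math.sqrt(max(squared, 0.0))


def direct_distance(x, y):
    """Distance computed from the coordinate differences themselves."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


x = [1e8, 0.0]
y = [1e8 + 1, 0.0]
print(direct_distance(x, y))    # 1.0
print(expanded_distance(x, y))  # 0.0: the true distance of 1 cancels away
```

The expansion is attractive because dot(x, x) and dot(y, y) can be precomputed and reused across many pairs, which is exactly why its precision loss matters in practice.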

Re: [scikit-learn] Sprint discussion points?

2019-02-13 Thread Andreas Mueller
Do you have a reference for the logistic regression stability issue? Is it the convergence warnings? Happy to discuss the other two issues, though they seem easier to me than most of what's on my list. I have no idea what's going on with OPTICS, tbh, and I'll leave it up to you and the others to decide…

Re: [scikit-learn] Sprint discussion points?

2019-02-13 Thread Joel Nothman
Convergence in logistic regression (https://github.com/scikit-learn/scikit-learn/issues/11536) is indeed one problem (and it raises a general question of what max_iter means when you have several solvers, and of how good defaults are selected). But I was sure we had problems with non-determinism on some…