Thanks for your email -- these are all great questions.

1. Efficient sparse representations are indeed important. We will soon be adding such functionality to the MLI<https://spark-project.atlassian.net/browse/MLI-8>, and these (or similar) updates will probably be made in MLlib at some point too.
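To make point 1 concrete, here is a rough sketch (just an illustration with hypothetical names, not the planned MLI-8 implementation) of the index/value representation and O(nnz) dot product that a sparse logistic regression needs:

    // Sparse example as parallel index/value arrays plus a label.
    case class SparseExample(indices: Array[Int], values: Array[Double], label: Double) {
      // Dot product against a dense weight vector: only the nonzero
      // entries contribute, so the cost is O(nnz), not O(numFeatures).
      def dot(weights: Array[Double]): Double = {
        var sum = 0.0
        var i = 0
        while (i < indices.length) {
          sum += values(i) * weights(indices(i))
          i += 1
        }
        sum
      }
    }

    object SparseLR {
      // Per-example logistic-loss gradient = scale * example; the example
      // is sparse, so an SGD step touches only its nonzero indices.
      def gradientScale(weights: Array[Double], ex: SparseExample): Double =
        1.0 / (1.0 + math.exp(-ex.dot(weights))) - ex.label
    }

Because the per-example gradient is a scalar times the example itself, the SGD update is as sparse as the data, which is what makes this representation pay off.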
2, 3. I agree that choosing the learning rate can be tricky. We chose to start with a simple learning rate in MLlib, and it should be easy to extend the MLlib code to allow for other learning rates (see the sketch at the end of this message). Simplifying the task of setting a learning rate is exactly the type of question we're aiming to answer at the higher levels of MLbase: in the MLI, we're planning to implement different optimization methods and learning rate rules that are more robust, and at the ML Optimizer level we plan to automate the choice of this and other parameters completely.

-Ameet

On Wed, Sep 18, 2013 at 6:21 PM, Jianmin Wu <[email protected]> wrote:

> Hi all,
> I read the Logistic Regression (LR) implementation in Spark and have several
> questions. Could anyone here give some explanation?
>
> 1. The implementation uses a dense representation of the feature vectors,
> but feature vectors are highly sparse in most cases. Is there any plan for a
> version that handles sparse feature vectors, or was the dense version an
> intentional choice?
> 2. Does any experimental data exist on convergence performance? The setting
> of the learning rate is tricky, and the current implementation uses a fairly
> straightforward learning-rate update rule.
> 3. Is there any research work on practical learning rate settings? As a
> matter of fact, I implemented a Python version of LR with stochastic
> gradient descent for sparse feature vectors in Spark, and am facing some
> convergence issues. I failed to find clues in Tong's work "Solving Large
> Scale Linear Prediction Problems Using Stochastic Gradient Descent
> Algorithms" and in related papers like "Pegasos: Primal Estimated
> sub-GrAdient SOlver for SVM".
>
> Any suggestions and explanations are appreciated.
>
> Thanks in advance,
> Jianmin
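P.S. Here is the learning-rate sketch promised above. The trait is hypothetical -- MLlib does not currently expose a pluggable schedule -- but it shows where an extension would slot into the SGD loop:

    // A step-size schedule: maps the iteration counter to a learning rate.
    trait StepSchedule {
      def step(iter: Int): Double
    }

    // Roughly the simple rule MLlib starts with: stepSize / sqrt(t).
    class InverseSqrtSchedule(stepSize: Double) extends StepSchedule {
      def step(iter: Int): Double = stepSize / math.sqrt(iter)
    }

    // The Pegasos-style rule from the paper Jianmin cites: eta_t = 1 / (lambda * t),
    // for an objective with L2 regularization strength lambda.
    class PegasosSchedule(lambda: Double) extends StepSchedule {
      def step(iter: Int): Double = 1.0 / (lambda * iter)
    }

    // Inside the SGD loop, iteration t would then update
    //   weights(j) -= schedule.step(t) * gradient(j)
    // leaving everything else about the optimizer unchanged.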
