As Robin suggested, you may try the following new implementation:

https://github.com/apache/spark/commit/6a827d5d1ec520f129e42c3818fe7d0d870dcbef

Thanks.

Sincerely,

DB Tsai
----------------------------------------------------------
Blog: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
<https://pgp.mit.edu/pks/lookup?search=0x59DF55B8AF08DF8D>

On Tue, Jun 9, 2015 at 3:22 PM, Robin East <robin.e...@xense.co.uk> wrote:

> Hi Stephen
>
> How many is a very large number of iterations? SGD is notorious for
> requiring 100s or 1000s of iterations; also, you may need to spend some
> time tweaking the step-size. In 1.4 there is an implementation of
> ElasticNet Linear Regression which is supposed to compare favourably
> with an equivalent R implementation.
>
> On 9 Jun 2015, at 22:05, Stephen Carman <scar...@coldlight.com> wrote:
> >
> > Hi User group,
> >
> > We are using Spark Linear Regression with SGD as the optimization
> > technique and we are achieving very sub-optimal results.
> >
> > Can anyone shed some light on why this implementation seems to produce
> > such poor results vs our own implementation?
> >
> > We are using a very small dataset, but we have to use a very large
> > number of iterations to achieve similar results to our implementation.
> > We've tried normalizing the data, not normalizing the data, and tuning
> > every param. Our implementation is a closed form solution so we should
> > be guaranteed convergence but the Spark one is not, which is
> > understandable, but why is it so far off?
> >
> > Has anyone experienced this?
> >
> > Steve Carman, M.S.
> > Artificial Intelligence Engineer
> > Coldlight-PTC
> > scar...@coldlight.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
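
To illustrate the point quoted above that SGD can need hundreds or thousands of iterations and careful step-size tuning, here is a minimal pure-Python sketch. It is not Spark's implementation: the toy data (y = 3x + 50), the step sizes, and the helper names closed_form, sgd, and standardize are all assumptions made for illustration. It compares a closed-form least-squares fit with plain per-sample SGD on raw and on standardized features.

```python
# Toy comparison: closed-form least squares vs. plain per-sample SGD.
# Everything here (data, step sizes, function names) is hypothetical.

def closed_form(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sgd(xs, ys, step, epochs):
    """Per-sample SGD on squared error, fixed sample order, constant step."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = w * x + b - y
            w -= step * err * x
            b -= step * err
    return w, b


def standardize(xs):
    """Scale a feature to zero mean and unit variance; also return mean, std."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / std for x in xs], mean, std


# Noise-free toy data with a large feature scale.
xs = [100.0 * i for i in range(10)]
ys = [3.0 * x + 50.0 for x in xs]

w_exact, b_exact = closed_form(xs, ys)

# Raw features: the step must be tiny to stay stable, so the intercept
# barely moves in 100 passes over the data.
w_raw, b_raw = sgd(xs, ys, step=1e-6, epochs=100)

# Standardized features: a much larger step is stable.
zs, mean_x, std_x = standardize(xs)
w_z, b_z = sgd(zs, ys, step=0.1, epochs=100)
# Map the coefficients back to the original feature scale.
w_std = w_z / std_x
b_std = b_z - w_z * mean_x / std_x

print("closed form       :", w_exact, b_exact)
print("SGD, raw features :", w_raw, b_raw)
print("SGD, standardized :", w_std, b_std)
```

On this toy data, the raw-feature run leaves the intercept far from 50 after 100 passes because the large feature values force a tiny step size, while the standardized run tolerates a much larger step and lands close to the closed-form answer. Whether the same effect explains the results reported in the thread is not something this sketch can confirm.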