OK, so it's not class imbalance, and yes, 100 iterations is right. My next guess is that the default stepSize of 1.0 is far too large for your data.

I'd suggest you look at the weights and intercept of the resulting model to see whether they make any sense. You can also call clearThreshold on the model, after which it will 'predict' the raw SVM margin instead of a class label. That would at least tell you whether it's predicting the same value over and over, or just lots of very large values.
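For example, here's a rough sketch of both checks, assuming the Spark 1.x MLlib API from your snippet (the stepSize of 0.01 is just illustrative, not a recommendation):

// illustrative sketch -- adjust values for your data
import org.apache.spark.SparkContext._   // for .stats() on RDD[Double] in Spark 1.x
import org.apache.spark.mllib.classification.SVMWithSGD

val svm = new SVMWithSGD()
svm.optimizer
  .setNumIterations(100)
  .setRegParam(1.0)
  .setStepSize(0.01)   // much smaller than the default of 1.0
svm.setIntercept(true)
val model = svm.run(training)

// sanity-check the learned parameters
println(s"weights = ${model.weights}, intercept = ${model.intercept}")

// predict raw margins instead of 0/1 labels
model.clearThreshold()
val margins = model.predict(training.map(_.features))
println(margins.stats())   // count / mean / stdev / min / max of the margins

If the margins are all identical, the model has collapsed to a constant predictor; if they're just very large, shrinking stepSize is the first thing I'd try.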
On Wed, Nov 12, 2014 at 6:02 PM, Caron <caron.big...@gmail.com> wrote:
> Sean,
>
> Thanks a lot for your reply!
>
> A few follow-up questions:
> 1. numIterations should be 100, not 100 * trainingSetSize, right?
> 2. My training set has 90k positive data points (label 1) and 60k
> negative data points (label 0).
> I set numIterations to 100 as the default. I still got the same prediction
> result: everything was predicted as label 1.
> And I'm sure my dataset is linearly separable, because it has been run on
> other frameworks like scikit-learn.
>
> // code
> val numIterations = 100
> val regParam = 1
> val svm = new SVMWithSGD()
> svm.optimizer.setNumIterations(numIterations).setRegParam(regParam)
> svm.setIntercept(true)
> val model = svm.run(training)
>
> -----
> Thanks!
> -Caron