regParam=1.0 may penalize too heavily, because we use the average loss
instead of the total loss. I just sent a PR to lower the default:
https://github.com/apache/spark/pull/3232
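To see why averaging matters, here is a minimal sketch in plain Scala. The loss and weight-norm values are illustrative assumptions, not numbers from this thread; only the 150k example count matches the dataset described below (90k positive + 60k negative):

```scala
object RegPenaltyShare {
  // Fraction of the objective contributed by the L2 penalty,
  // under either the summed or the averaged data term.
  def penaltyShare(n: Int, perExampleLoss: Double, regParam: Double,
                   wNormSq: Double, average: Boolean): Double = {
    val penalty = regParam * wNormSq / 2.0
    val dataTerm = if (average) perExampleLoss else n * perExampleLoss
    penalty / (dataTerm + penalty)
  }

  def main(args: Array[String]): Unit = {
    // Summed loss: the penalty is a negligible fraction of the objective.
    println(penaltyShare(150000, 0.5, 1.0, 4.0, average = false))
    // Averaged loss: the same regParam=1.0 makes the penalty dominate,
    // which can shrink the weights toward zero.
    println(penaltyShare(150000, 0.5, 1.0, 4.0, average = true))
  }
}
```

With the same regParam, switching the data term from a sum to an average effectively multiplies the regularization strength by n, which is why the default needed lowering.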

You can try LogisticRegressionWithLBFGS (and configure parameters
through its optimizer), which should converge faster than SGD. It uses
line search, so you don't need to worry about stepSize.
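To illustrate why line search removes the stepSize knob, here is a toy backtracking (Armijo) line search on a one-dimensional quadratic. This is only a sketch of the idea, not Spark's L-BFGS implementation:

```scala
object LineSearchDemo {
  // f(w) = (w - 3)^2, with gradient 2(w - 3); minimum at w = 3.
  def f(w: Double): Double = (w - 3) * (w - 3)
  def grad(w: Double): Double = 2 * (w - 3)

  // Backtracking (Armijo) line search: start from a unit step and
  // shrink it until it yields sufficient decrease, so no fixed
  // stepSize has to be tuned by the user.
  def armijoStep(w: Double, c: Double = 1e-4, shrink: Double = 0.5): Double = {
    val g = grad(w)
    var t = 1.0
    while (f(w - t * g) > f(w) - c * t * g * g) t *= shrink
    w - t * g
  }

  def minimize(w0: Double, iters: Int): Double =
    (0 until iters).foldLeft(w0)((w, _) => armijoStep(w))
}
```

With a fixed stepSize of 1, plain gradient descent on this function would oscillate forever between w0 and 6 - w0; the line search finds a stable step automatically.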

We recently added pipeline features with tuning. You can take a look
at the example code here:
https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala
Note that these features are experimental, as this is an alpha
component.
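For intuition, the k-fold splitting that cross-validation relies on can be sketched in plain Scala. This is a hypothetical helper illustrating the idea, not Spark's CrossValidator API:

```scala
object KFoldIdea {
  // Split the indices 0 until n into k (train, validation) pairs.
  // Cross-validation fits on each training part, scores on the
  // held-out part, and keeps the parameter setting with the best
  // average score.
  def kFolds(n: Int, k: Int): Seq[(Seq[Int], Seq[Int])] = {
    val idx = 0 until n
    (0 until k).map { i =>
      val valFold = idx.filter(_ % k == i)
      val trainFold = idx.filterNot(_ % k == i)
      (trainFold, valFold)
    }
  }
}
```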

Best,
Xiangrui

On Wed, Nov 12, 2014 at 10:08 AM, Sean Owen <so...@cloudera.com> wrote:
> OK, it's not class imbalance. Yes, 100 iterations.
> My other guess is that the stepSize of 1 is way too big for your data.
>
> I'd suggest you look at the weights / intercept of the resulting model to
> see if it makes any sense.
>
> You can call clearThreshold on the model, and then it will 'predict' the SVM
> margin instead of a class. That could at least tell you whether it's
> predicting the same value over and over or just lots of very big values.
>
> On Wed, Nov 12, 2014 at 6:02 PM, Caron <caron.big...@gmail.com> wrote:
>>
>> Sean,
>>
>> Thanks a lot for your reply!
>>
>> A few follow up questions:
>> 1. numIterations should be 100, not 100*trainingSetSize, right?
>> 2. My training set has 90k positive data points (with label 1) and 60k
>> negative data points (with label 0).
>> I set my numIterations to 100 as the default. I still got the same prediction
>> result: everything was predicted as label 1.
>> And I'm sure my dataset is linearly separable, because it has been
>> classified successfully by other frameworks such as scikit-learn.
>>
>> // code
>> val numIterations = 100
>> val regParam = 1.0
>> val svm = new SVMWithSGD()
>> svm.optimizer.setNumIterations(numIterations).setRegParam(regParam)
>> svm.setIntercept(true)
>> val model = svm.run(training)
>> -----
>> Thanks!
>> -Caron
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/SVMWithSGD-default-threshold-tp18645p18741.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>
