Hello,
I am trying to run LinearRegression on the dummy dataset given below. I have
tried many different settings, but I still cannot reproduce the expected
coefficients.
Please help me out, as I am facing the same problem with my actual dataset.
Thank you.
This dataset is generated from the simple equation:
Y = 4 + (2 * x1) + (3 * x2)
*Data:*
y,x1,x2
6.3,1,0.1
8.6,2,0.2
10.9,3,0.3
13.8,4,0.6
16.4,5,0.8
19.6,6,1.2
22.8,7,1.6
25.7,8,1.9
28.3,9,2.1
31.2,10,2.4
34.1,11,2.7
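As a quick sanity check that the rows above really follow Y = 4 + (2 * x1) + (3 * x2), here is a minimal plain-Scala sketch (no Spark needed). The object name `DataCheck` and the field `maxResidual` are made up for illustration; the rows are copied by hand from the data above.

```scala
object DataCheck {
  // Rows copied from the data above as (y, x1, x2).
  val rows: Seq[(Double, Double, Double)] = Seq(
    (6.3, 1.0, 0.1), (8.6, 2.0, 0.2), (10.9, 3.0, 0.3), (13.8, 4.0, 0.6),
    (16.4, 5.0, 0.8), (19.6, 6.0, 1.2), (22.8, 7.0, 1.6), (25.7, 8.0, 1.9),
    (28.3, 9.0, 2.1), (31.2, 10.0, 2.4), (34.1, 11.0, 2.7)
  )

  // Largest absolute gap between the listed y and 4 + 2*x1 + 3*x2.
  val maxResidual: Double = rows.map { case (y, x1, x2) =>
    math.abs(y - (4.0 + 2.0 * x1 + 3.0 * x2))
  }.max

  def main(args: Array[String]): Unit =
    println(f"max |residual| = $maxResidual%.2e")
}
```

If the residual is at floating-point noise level, the data itself is consistent with the equation, so any mismatch in the fitted coefficients would have to come from the training setup rather than from the data.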
*Spark Code:*
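import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.SimpleUpdater
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

val data = sc.textFile("Data/tempData_1.csv")
// Drop the header from the first partition only; dropping one line from every
// partition would silently discard data rows if the file is split.
val parsedData = data.mapPartitionsWithIndex { (idx, lines) =>
  if (idx == 0) lines.drop(1) else lines
}.map { line =>
  val parts = line.split(',')
  // The leading 1.0 is a bias feature, since setIntercept(false) is used below.
  LabeledPoint(parts(0).toDouble, Vectors.dense(Array(1.0, parts(1).toDouble, parts(2).toDouble)))
}.cache()
val numIterations = 400
val step = 0.01
val algorithm = new LinearRegressionWithSGD()
// Also tried setIntercept(true) with only (x1, x2) as features.
algorithm.setIntercept(false)
algorithm.optimizer.setStepSize(step)
algorithm.optimizer.setNumIterations(numIterations)
  .setUpdater(new SimpleUpdater())
  //.setRegParam(0.1)
  .setMiniBatchFraction(1.0)
val initialWeights = Vectors.dense(Array.fill(3)(scala.util.Random.nextDouble()))
val model = algorithm.run(parsedData, initialWeights)
println(s">>>> Model intercept: ${model.intercept}, weights: ${model.weights}")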
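For comparison, here is a sketch of the result I would expect, computed without SGD by solving the normal equations (X^T X) w = X^T y directly in plain Scala. This is not what LinearRegressionWithSGD does internally; it is only a reference point, and the names `ClosedFormCheck`, `solve` and `weights` are made up for illustration. The feature layout (1.0, x1, x2) mirrors the LabeledPoint used above.

```scala
object ClosedFormCheck {
  // Rows copied from the data above as (y, x1, x2).
  val rows: Seq[(Double, Double, Double)] = Seq(
    (6.3, 1.0, 0.1), (8.6, 2.0, 0.2), (10.9, 3.0, 0.3), (13.8, 4.0, 0.6),
    (16.4, 5.0, 0.8), (19.6, 6.0, 1.2), (22.8, 7.0, 1.6), (25.7, 8.0, 1.9),
    (28.3, 9.0, 2.1), (31.2, 10.0, 2.4), (34.1, 11.0, 2.7)
  )

  // Design matrix with a leading 1.0 column for the intercept.
  val xs: Seq[Array[Double]] = rows.map { case (_, x1, x2) => Array(1.0, x1, x2) }
  val ys: Seq[Double] = rows.map(_._1)

  // Solve a * w = b by Gaussian elimination with partial pivoting.
  def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val n = b.length
    // Augmented matrix [a | b], copied so the inputs are left untouched.
    val m = Array.tabulate(n, n + 1)((i, j) => if (j < n) a(i)(j) else b(i))
    for (col <- 0 until n) {
      // Swap in the row with the largest entry in this column.
      val pivot = (col until n).maxBy(r => math.abs(m(r)(col)))
      val tmp = m(col); m(col) = m(pivot); m(pivot) = tmp
      // Eliminate this column from all rows below.
      for (r <- col + 1 until n) {
        val f = m(r)(col) / m(col)(col)
        for (c <- col to n) m(r)(c) -= f * m(col)(c)
      }
    }
    // Back substitution.
    val w = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) {
      val s = (i + 1 until n).map(j => m(i)(j) * w(j)).sum
      w(i) = (m(i)(n) - s) / m(i)(i)
    }
    w
  }

  // Normal equations: (X^T X) w = X^T y.
  val xtx: Array[Array[Double]] =
    Array.tabulate(3, 3)((i, j) => xs.map(r => r(i) * r(j)).sum)
  val xty: Array[Double] =
    Array.tabulate(3)(i => xs.zip(ys).map { case (r, y) => r(i) * y }.sum)
  val weights: Array[Double] = solve(xtx, xty)

  def main(args: Array[String]): Unit =
    println(weights.mkString("weights (intercept, x1, x2): ", ", ", ""))
}
```

Since the data is generated exactly from the equation, this should give weights close to 4, 2 and 3, which are the values I am hoping the SGD model will approach.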
Regards,
Arun