I am testing a decision tree on the iris.scale data set
(http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#iris).
The data set has three class labels: 1, 2, and 3. However, in the
following code I have to set numClasses = 4. If I set numClasses = 3, I get an
ArrayIndexOutOfBoundsException. Why?
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.util.MLUtils
val conf = new SparkConf().setAppName("DecisionTree")
val sc = new SparkContext(conf)
// Load the LIBSVM-format iris data (class labels 1, 2, 3)
val data = MLUtils.loadLibSVMFile(sc, "data/iris.scale.txt")
// Setting this to 3 throws ArrayIndexOutOfBoundsException
val numClasses = 4
val categoricalFeaturesInfo = Map[Int, Int]()  // all features are continuous
val impurity = "gini"
val maxDepth = 5
val maxBins = 100
val model = DecisionTree.trainClassifier(data, numClasses,
  categoricalFeaturesInfo, impurity, maxDepth, maxBins)
// Pair each training label with the model's prediction
val labelAndPreds = data.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / data.count
println("Training Error = " + trainErr)
println("Learned classification tree model:\n" + model)
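The behavior is consistent with the MLlib documentation for DecisionTree.trainClassifier, which says labels should take values {0, 1, ..., numClasses-1}. With labels 1, 2, 3 the largest label is 3, so numClasses must be at least 4. The sketch below is a simplified model, not MLlib's actual source. It assumes per-class statistics are kept in an array of length numClasses that is indexed by label.toInt. The object name LabelIndexing and its helpers are hypothetical.

```scala
// Simplified model (an assumption, not MLlib's real code): per-class counts
// live in an array of length numClasses, indexed directly by label.toInt.
object LabelIndexing {
  def classCounts(labels: Seq[Double], numClasses: Int): Array[Long] = {
    val counts = new Array[Long](numClasses)
    // A label >= numClasses indexes past the end of the array.
    labels.foreach(label => counts(label.toInt) += 1)
    counts
  }

  // Hypothetical helper: shift 1-based labels {1, 2, 3} down to {0, 1, 2}.
  def toZeroBased(labels: Seq[Double]): Seq[Double] = labels.map(_ - 1.0)

  def main(args: Array[String]): Unit = {
    val irisLabels = Seq(1.0, 2.0, 3.0, 3.0)

    // numClasses = 4 works because index 3 exists, but slot 0 is never used.
    println(classCounts(irisLabels, 4).mkString(","))

    // numClasses = 3 fails on label 3.0 ...
    val threw =
      try { classCounts(irisLabels, 3); false }
      catch { case _: ArrayIndexOutOfBoundsException => true }
    println(s"numClasses = 3 with 1-based labels throws: $threw")

    // ... but succeeds once the labels are zero-based.
    println(classCounts(toZeroBased(irisLabels), 3).mkString(","))
  }
}
```

Under this assumption, the equivalent change in the Spark code would be to remap each point before training, e.g. `data.map(p => LabeledPoint(p.label - 1, p.features))` (with `org.apache.spark.mllib.regression.LabeledPoint` imported), and then pass numClasses = 3.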
-Yao