I am testing decision trees using the iris.scale data set
(http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#iris).
The data set has three class labels: 1, 2, and 3. However, in the code below I
have to set numClasses = 4; if I set numClasses = 3, I get an
ArrayIndexOutOfBoundsException. Why?
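
For reference, each line of iris.scale is in LIBSVM format, roughly like the
lines below (the feature values here are only illustrative, not copied from
the file):

    1 1:-0.56 2:0.25 3:-0.86 4:-0.92
    2 1:0.11 2:-0.58 3:0.42 4:0.40
    3 1:0.22 2:-0.17 3:0.53 4:0.58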

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.tree.DecisionTree
    import org.apache.spark.mllib.util.MLUtils

    val conf = new SparkConf().setAppName("DecisionTree")
    val sc = new SparkContext(conf)

    // Load the data set in LIBSVM format (labels in the file are 1, 2, 3)
    val data = MLUtils.loadLibSVMFile(sc, "data/iris.scale.txt")

    // Training parameters
    val numClasses = 4   // numClasses = 3 throws ArrayIndexOutOfBoundsException
    val categoricalFeaturesInfo = Map[Int, Int]()
    val impurity = "gini"
    val maxDepth = 5
    val maxBins = 100

    val model = DecisionTree.trainClassifier(data, numClasses,
      categoricalFeaturesInfo, impurity, maxDepth, maxBins)

    // Evaluate the model on the training data
    val labelAndPreds = data.map { point =>
      val prediction = model.predict(point.features)
      (point.label, prediction)
    }

    val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / data.count
    println("Training Error = " + trainErr)
    println("Learned classification tree model:\n" + model)

-Yao
