lichenglin created SPARK-15497:
----------------------------------

             Summary: DecisionTreeClassificationModel can't be saved within a Pipeline because it does not implement Writable
                 Key: SPARK-15497
                 URL: https://issues.apache.org/jira/browse/SPARK-15497
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 1.6.1
            Reporter: lichenglin
Here is my code:

{code}
SQLContext sqlContext = getSQLContext();
DataFrame data = sqlContext.read().format("libsvm")
    .load("file:///E:/workspace-mars/bigdata/sparkjob/data/mllib/sample_libsvm_data.txt");

// Index labels, adding metadata to the label column.
// Fit on whole dataset to include all labels in index.
StringIndexerModel labelIndexer = new StringIndexer()
    .setInputCol("label")
    .setOutputCol("indexedLabel")
    .fit(data);

// Automatically identify categorical features, and index them.
VectorIndexerModel featureIndexer = new VectorIndexer()
    .setInputCol("features")
    .setOutputCol("indexedFeatures")
    .setMaxCategories(4) // features with > 4 distinct values are treated as continuous
    .fit(data);

// Split the data into training and test sets (30% held out for testing).
DataFrame[] splits = data.randomSplit(new double[]{0.7, 0.3});
DataFrame trainingData = splits[0];
DataFrame testData = splits[1];

// Train a DecisionTree model.
DecisionTreeClassifier dt = new DecisionTreeClassifier()
    .setLabelCol("indexedLabel")
    .setFeaturesCol("indexedFeatures");

// Convert indexed labels back to original labels.
IndexToString labelConverter = new IndexToString()
    .setInputCol("prediction")
    .setOutputCol("predictedLabel")
    .setLabels(labelIndexer.labels());

// Chain indexers and tree in a Pipeline.
Pipeline pipeline = new Pipeline()
    .setStages(new PipelineStage[]{labelIndexer, featureIndexer, dt, labelConverter});

// Train model. This also runs the indexers.
PipelineModel model = pipeline.fit(trainingData);
model.save("file:///e:/tmpmodel");
{code}

and here is the exception:

{code}
Exception in thread "main" java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: dtc_7bdeae1c4fb8 of type class org.apache.spark.ml.classification.DecisionTreeClassificationModel
	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:218)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:215)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:215)
	at org.apache.spark.ml.PipelineModel$PipelineModelWriter.<init>(Pipeline.scala:325)
	at org.apache.spark.ml.PipelineModel.write(Pipeline.scala:309)
	at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:131)
	at org.apache.spark.ml.PipelineModel.save(Pipeline.scala:280)
	at com.bjdv.spark.job.Testjob.main(Testjob.java:142)
{code}

sample_libsvm_data.txt is included in the 1.6.1 release tar.
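As a side note, the check that throws here (Pipeline$SharedReadWrite.validateStages) simply tests each fitted stage against MLWritable, so the offending stages can be listed up front before calling save(). Below is a minimal diagnostic sketch, assuming Spark 1.6.1 on the classpath; the class name WritableStageCheck and the method reportNonWritableStages are just illustrative, not part of the Spark API:

{code}
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.util.MLWritable;

public class WritableStageCheck {

    // Prints every fitted stage of the PipelineModel that does not implement
    // MLWritable, so the failing stage can be spotted before save() throws.
    public static void reportNonWritableStages(PipelineModel model) {
        for (Transformer stage : model.stages()) {
            if (!(stage instanceof MLWritable)) {
                System.out.println("Non-writable stage: " + stage.uid()
                        + " of type " + stage.getClass().getName());
            }
        }
    }
}
{code}

Calling reportNonWritableStages(model) right before model.save(...) should print the dtc_* DecisionTreeClassificationModel stage named in the exception above.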