Andrew Davidson created SPARK-12606:
---------------------------------------

             Summary: Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?
                 Key: SPARK-12606
                 URL: https://issues.apache.org/jira/browse/SPARK-12606
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 1.5.2
         Environment: Java 8, Mac OS, Spark-1.5.2
            Reporter: Andrew Davidson


Hi Andy,

I suspect that you hit the Scala/Java compatibility issue. I can also reproduce it, so could you file a JIRA to track it?

Yanbo

2016-01-02 3:38 GMT+08:00 Andy Davidson <a...@santacruzintegration.com>:

I am trying to write a trivial transformer I can use in my pipeline. I am using Java and Spark 1.5.2. It was suggested that I use the Tokenizer.scala class as an example. This should be very easy; however, I do not understand Scala, and I am having trouble debugging the following exception.

Any help would be greatly appreciated.

Happy New Year

Andy

java.lang.IllegalArgumentException: requirement failed: Param null__inputCol does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c.
	at scala.Predef$.require(Predef.scala:233)
	at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557)
	at org.apache.spark.ml.param.Params$class.set(params.scala:436)
	at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
	at org.apache.spark.ml.param.Params$class.set(params.scala:422)
	at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
	at org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83)
	at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30)


public class StemmerTest extends AbstractSparkTest {
    @Test
    public void test() {
        Stemmer stemmer = new Stemmer()
                .setInputCol("raw") //line 30
                .setOutputCol("filtered");
    }
}


/**
 * @see spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
 * @see https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
 * @see http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/
 *
 * @author andrewdavidson
 *
 */
public class Stemmer extends UnaryTransformer<List<String>, List<String>, Stemmer> implements Serializable {
    static Logger logger = LoggerFactory.getLogger(Stemmer.class);
    private static final long serialVersionUID = 1L;
    private static final ArrayType inputType = DataTypes.createArrayType(DataTypes.StringType, true);
    private final String uid = Stemmer.class.getSimpleName() + "_" + UUID.randomUUID().toString();

    @Override
    public String uid() {
        return uid;
    }

    /*
       override protected def validateInputType(inputType: DataType): Unit = {
         require(inputType == StringType, s"Input type must be string type but got $inputType.")
       }
     */
    @Override
    public void validateInputType(DataType inputTypeArg) {
        String msg = "inputType must be " + inputType.simpleString() + " but got " + inputTypeArg.simpleString();
        assert (inputType.equals(inputTypeArg)) : msg;
    }

    @Override
    public Function1<List<String>, List<String>> createTransformFunc() {
        // http://stackoverflow.com/questions/6545066/using-scala-from-java-passing-functions-as-parameters
        Function1<List<String>, List<String>> f = new AbstractFunction1<List<String>, List<String>>() {
            public List<String> apply(List<String> words) {
                for (String word : words) {
                    logger.error("AEDWIP input word: {}", word);
                }
                return words;
            }
        };

        return f;
    }

    @Override
    public DataType outputDataType() {
        return DataTypes.createArrayType(DataTypes.StringType, true);
    }
}
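
A possible reading of the "null__inputCol" in the exception, offered as an assumption and not as a diagnosis confirmed anywhere in this thread: the name suggests the param recorded its owner's uid as null. In Java, a field initializer such as the one on uid in Stemmer only runs after the superclass constructor has returned, so any superclass code that calls uid() during construction would see null. The sketch below reproduces that ordering in plain Java with no Spark dependency. ToyParam, ToyBase, FieldUid and LazyUid are hypothetical names made up for illustration. They are not Spark classes and do not claim to mirror Spark's internals.

```java
import java.util.UUID;

public class InitOrderDemo {

    /** Hypothetical stand-in for a param that remembers its owner's uid. */
    static final class ToyParam {
        final String parent;
        final String name;

        ToyParam(String parent, String name) {
            this.parent = parent;
            this.name = name;
        }

        @Override
        public String toString() {
            return parent + "__" + name;
        }
    }

    /** Hypothetical base class that creates a param while it is being constructed. */
    abstract static class ToyBase {
        final ToyParam inputCol;

        ToyBase() {
            // uid() dispatches to the subclass before the subclass's
            // field initializers have run.
            this.inputCol = new ToyParam(uid(), "inputCol");
        }

        abstract String uid();

        /** True when the param was created with this object's current uid. */
        boolean owns(ToyParam p) {
            return uid().equals(p.parent);
        }
    }

    /** Same pattern as the report: uid is assigned by a field initializer. */
    static final class FieldUid extends ToyBase {
        private final String uid = "Stemmer_" + UUID.randomUUID();

        @Override
        String uid() {
            return uid;
        }
    }

    /** Variant that creates the uid on first use inside the method. */
    static final class LazyUid extends ToyBase {
        // Deliberately no initializer: an explicit "= null" would run after
        // the superclass constructor and wipe the value set during it.
        private String uid;

        @Override
        String uid() {
            if (uid == null) {
                uid = "Stemmer_" + UUID.randomUUID();
            }
            return uid;
        }
    }

    public static void main(String[] args) {
        FieldUid eager = new FieldUid();
        System.out.println(eager.inputCol);             // prints "null__inputCol"
        System.out.println(eager.owns(eager.inputCol)); // prints "false"

        LazyUid lazy = new LazyUid();
        System.out.println(lazy.owns(lazy.inputCol));   // prints "true"
    }
}
```

Whether the lazy variant is an acceptable workaround for a real UnaryTransformer subclass is untested here. The sketch only demonstrates the Java initialization order that could produce a param named "null__inputCol".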