In your case, I would suggest extending UnaryTransformer, which is much easier.
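A minimal sketch of the per-value logic such a transformer would wrap, assuming the goal is the same as the Python UDF in the quoted message (keep "noise", map every other label to "signal"). The class and method names here are hypothetical and not part of Spark. As far as I know, a UnaryTransformer subclass supplies this kind of function through createTransformFunc (as a scala.Function1) and declares the output column type through outputDataType; that wiring is left out because it needs Spark on the classpath.

```java
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Spark): the per-value logic of the
 * Python UDF from the quoted message, written as plain Java.
 */
final class LabelToBinary {
    private LabelToBinary() {}

    /** Keeps "noise" unchanged and maps every other label (including null) to "signal". */
    static String toBinaryLabel(String label) {
        return "noise".equals(label) ? "noise" : "signal";
    }

    /**
     * The same logic as a function object. In Spark this would be adapted to
     * a scala.Function1 and returned from createTransformFunc (assumption:
     * that adaptation is not shown here).
     */
    static Function<String, String> asFunction() {
        return LabelToBinary::toBinaryLabel;
    }
}
```

With the logic isolated like this, the transformer itself only has to describe the input column, the output column, and the output type (a string type here), instead of building a Column by hand.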
Yeah, I have to admit that there is no documentation on how to write a custom Transformer. I think we need to add that, since writing a custom Transformer is a very common task in machine learning.

On Tue, Dec 22, 2015 at 9:54 AM, Andy Davidson <a...@santacruzintegration.com> wrote:

> I am trying to port the following python function to Java 8. I would like
> my java implementation to implement Transformer so I can use it in a
> pipeline.
>
> I am having a heck of a time trying to figure out how to create a Column
> variable I can pass to DataFrame.withColumn(). As far as I know
> withColumn() is the only way to append a column to a data frame.
>
> Any comments or suggestions would be greatly appreciated
>
> Andy
>
>
> def convertMultinomialLabelToBinary(dataFrame):
>     newColName = "binomialLabel"
>     binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
>     ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
>     return ret
> trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)
>
>
> public class LabelToBinaryTransformer extends Transformer {
>     private static final long serialVersionUID = 4202800448830968904L;
>     private final UUID uid = UUID.randomUUID();
>     public String inputCol;
>     public String outputCol;
>
>     @Override
>     public String uid() {
>         return uid.toString();
>     }
>
>     @Override
>     public Transformer copy(ParamMap pm) {
>         Params xx = defaultCopy(pm);
>         return ???;
>     }
>
>     @Override
>     public DataFrame transform(DataFrame df) {
>         MyUDF myUDF = new MyUDF(myUDF, null, null);
>         Column c = df.col(inputCol);
>         ??? UDF apply does not take a col????
>         Column col = myUDF.apply(df.col(inputCol));
>         DataFrame ret = df.withColumn(outputCol, col);
>         return ret;
>     }
>
>     @Override
>     public StructType transformSchema(StructType arg0) {
>         *??? What is this function supposed to do???*
>         ???Is this the type of the new output column????
>     }
>
>     class MyUDF extends UserDefinedFunction {
>         public MyUDF(Object f, DataType dataType, Seq<DataType> inputTypes) {
>             super(f, dataType, inputTypes);
>             ??? Why do I have to implement this constructor ???
>             ??? What are the arguments ???
>         }
>
>         @Override
>         public Column apply(scala.collection.Seq<Column> exprs) {
>             What do you do with a scala seq?
>             return ???;
>         }
>     }
> }

--
Best Regards

Jeff Zhang