In your case, I would suggest extending UnaryTransformer, which is much

Yeah, I have to admit that there's no documentation on how to write a
custom Transformer. I think we need to add that, since writing a custom
Transformer is a very common task in machine learning.
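
To make the UnaryTransformer suggestion a bit more concrete: as far as I
know, a UnaryTransformer subclass mainly supplies a function from one input
value to one output value (createTransformFunc) plus the output type
(outputDataType), and the base class takes care of the withColumn()
plumbing. Below is a minimal plain-Java sketch of just that per-value
function for your label mapping, with no Spark dependency. The class name
LabelToBinary and the method toBinaryLabel are made up for illustration;
wiring the function into an actual UnaryTransformer is not shown here.

```java
public class LabelToBinary {

    // Hypothetical helper (not part of Spark): mirrors the lambda in the
    // quoted Python udf. "noise" is kept as is; any other label, including
    // null, is mapped to "signal".
    public static String toBinaryLabel(String label) {
        return "noise".equals(label) ? "noise" : "signal";
    }

    public static void main(String[] args) {
        // Small demonstration on a few sample labels.
        String[] samples = {"noise", "chirp", null};
        for (String s : samples) {
            System.out.println(s + " -> " + toBinaryLabel(s));
        }
    }
}
```

Keeping the mapping in a plain static method like this means it can be
unit tested on its own before it is hooked into a pipeline stage.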

On Tue, Dec 22, 2015 at 9:54 AM, Andy Davidson wrote:

> I am trying to port the following Python function to Java 8. I would like
> my Java implementation to implement Transformer so I can use it in a
> pipeline.
> I am having a heck of a time trying to figure out how to create a Column
> variable I can pass to DataFrame.withColumn(). As far as I know,
> withColumn() is the only way to append a column to a data frame.
> Any comments or suggestions would be greatly appreciated.
> Andy
> from pyspark.sql.functions import udf
> from pyspark.sql.types import StringType
>
> def convertMultinomialLabelToBinary(dataFrame):
>     newColName = "binomialLabel"
>     # Keep "noise" as is; map every other label to "signal".
>     binomial = udf(lambda labelStr: labelStr if (labelStr == "noise")
>                    else "signal", StringType())
>     ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
>     return ret
>
> trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)
> public class LabelToBinaryTransformer extends Transformer {
>     private static final long serialVersionUID = 4202800448830968904L;
>     private  final UUID uid = UUID.randomUUID();
>     public String inputCol;
>     public String outputCol;
>     @Override
>     public String uid() {
>         return uid.toString();
>     }
>     @Override
>     public Transformer copy(ParamMap pm) {
>         Params xx = defaultCopy(pm);
>         return ???;
>     }
>     @Override
>     public DataFrame transform(DataFrame df) {
>         MyUDF myUDF = new MyUDF(myUDF, null, null);
>         Column c = df.col(inputCol);
>         // ??? UDF apply does not take a col ???
>         Column col = myUDF.apply(df.col(inputCol));
>         DataFrame ret = df.withColumn(outputCol, col);
>         return ret;
>     }
>     @Override
>     public StructType transformSchema(StructType arg0) {
>         // ??? What is this function supposed to do ???
>         // ??? Is this the type of the new output column ???
>     }
>     class MyUDF extends UserDefinedFunction {
>         public MyUDF(Object f, DataType dataType, Seq<DataType> inputTypes) {
>             super(f, dataType, inputTypes);
>             // ??? Why do I have to implement this constructor ???
>             // ??? What are the arguments ???
>         }
>         @Override
>         public Column apply(scala.collection.Seq<Column> exprs) {
>             // ??? What do you do with a scala seq ???
>             return ???;
>         }
>     }
> }

Best Regards

Jeff Zhang
