In your case, I would suggest extending UnaryTransformer, which is much
easier.
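
To make that concrete: with UnaryTransformer the base class takes care of
reading the input column and appending the output column, so the part you
write yourself is essentially a one-argument function (createTransformFunc)
plus the output type (outputDataType). Below is a minimal sketch of just that
function in plain Java, ported from your Python lambda, with no Spark classes
involved. The names BinaryLabels, toBinaryLabel and toBinaryLabels are made up
for illustration, and wiring the function into createTransformFunc is left out
because it needs a scala.Function1 wrapper that I have not tested from Java.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Plain-Java port of the Python lambda; hypothetical helper, not a Spark API. */
final class BinaryLabels {
    static final String NOISE = "noise";
    static final String SIGNAL = "signal";

    private BinaryLabels() {}

    /** Keeps "noise" as is and maps every other label (including null) to "signal". */
    static String toBinaryLabel(String label) {
        return NOISE.equals(label) ? NOISE : SIGNAL;
    }

    /** Convenience helper for trying the mapping on a small in-memory list. */
    static List<String> toBinaryLabels(List<String> labels) {
        return labels.stream()
                .map(BinaryLabels::toBinaryLabel)
                .collect(Collectors.toList());
    }
}
```

Keeping the mapping in a static method like this means you can unit test it
without a SparkContext, and the Transformer subclass only has to delegate to it.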

I have to admit that there is no documentation on how to write a custom
Transformer. I think we need to add it, since writing a custom Transformer
is a very common task in machine learning.

On Tue, Dec 22, 2015 at 9:54 AM, Andy Davidson <
a...@santacruzintegration.com> wrote:

>
> I am trying to port the following Python function to Java 8. I would like
> my Java implementation to implement Transformer so I can use it in a
> pipeline.
>
> I am having a heck of a time trying to figure out how to create a Column
> variable I can pass to DataFrame.withColumn(). As far as I know,
> withColumn() is the only way to append a column to a data frame.
>
> Any comments or suggestions would be greatly appreciated.
>
> Andy
>
>
> def convertMultinomialLabelToBinary(dataFrame):
>     newColName = "binomialLabel"
>     binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
>     ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
>     return ret
> trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)
>
>
>
> public class LabelToBinaryTransformer extends Transformer {
>
>     private static final long serialVersionUID = 4202800448830968904L;
>
>     private final UUID uid = UUID.randomUUID();
>
>     public String inputCol;
>     public String outputCol;
>
>     @Override
>     public String uid() {
>         return uid.toString();
>     }
>
>     @Override
>     public Transformer copy(ParamMap pm) {
>         Params xx = defaultCopy(pm);
>         return ???;
>     }
>
>     @Override
>     public DataFrame transform(DataFrame df) {
>         MyUDF myUDF = new MyUDF(myUDF, null, null);
>         Column c = df.col(inputCol);
>
>         // ??? UDF apply does not take a col????
>         Column col = myUDF.apply(df.col(inputCol));
>         DataFrame ret = df.withColumn(outputCol, col);
>         return ret;
>     }
>
>     @Override
>     public StructType transformSchema(StructType arg0) {
>         // ??? What is this function supposed to do???
>         // ??? Is this the type of the new output column????
>     }
>
>     class MyUDF extends UserDefinedFunction {
>
>         public MyUDF(Object f, DataType dataType, Seq<DataType> inputTypes) {
>             super(f, dataType, inputTypes);
>             // ??? Why do I have to implement this constructor ???
>             // ??? What are the arguments ???
>         }
>
>         @Override
>         public Column apply(scala.collection.Seq<Column> exprs) {
>             // What do you do with a scala seq?
>             return ???;
>         }
>     }
> }
>
>
>


-- 
Best Regards

Jeff Zhang
