Hi Jeff

I took a look at Tokenizer.scala, UnaryTransformer.scala, and
Transformer.scala. However, I cannot figure out how to implement
createTransformFunc() in Java 8.

It would be nice to be able to use this transformer in my pipeline, but that is
not required. The real problem is that I cannot figure out how to create a Column
I can pass to dataFrame.withColumn() in my Java code. Here is my original python:

binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else
               "signal", StringType())
ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
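
For reference, here is a rough, untested sketch of one way to build such a Column
from Java 8, by registering a Java UDF and wrapping it with callUDF(). The
sqlContext variable, the UDF name "binomial", and the column names are assumptions
carried over from the python above:

import org.apache.spark.sql.Column;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

public class BinomialLabelExample {
    public static DataFrame convertMultinomialLabelToBinary(SQLContext sqlContext,
                                                            DataFrame dataFrame) {
        // register the UDF once; UDF1 is a functional interface, so a lambda works
        sqlContext.udf().register("binomial",
                (UDF1<String, String>) labelStr ->
                        "noise".equals(labelStr) ? labelStr : "signal",
                DataTypes.StringType);

        // callUDF() returns a Column that withColumn() accepts
        Column binomial = callUDF("binomial", col("label"));
        return dataFrame.withColumn("binomialLabel", binomial);
    }
}

If the logic stays this simple, functions.when(col("label").equalTo("noise"),
"noise").otherwise("signal") should give the same Column without registering a
UDF at all.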

Any suggestions would be greatly appreciated.

Andy

import java.util.UUID;
import org.apache.spark.ml.UnaryTransformer;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import scala.Function1;

public class LabelToBinaryTransformer
        extends UnaryTransformer<String, String, LabelToBinaryTransformer> {

    private static final long serialVersionUID = 4202800448830968904L;

    private final UUID uid = UUID.randomUUID();

    @Override
    public String uid() {
        return uid.toString();
    }

    @Override
    public Function1<String, String> createTransformFunc() {
        // original python code:
        // binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else
        //                "signal", StringType())
        //
        // The Function1 interface is not easy to implement; it declares a lot of methods.
        ???
    }

    @Override
    public DataType outputDataType() {
        // DataTypes.StringType is the singleton instance meant to be used from Java
        return DataTypes.StringType;
    }
}
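
For what it's worth, scala.Function1 declares many methods, but
scala.runtime.AbstractFunction1 (part of the Scala library Spark already depends on)
implements all of them except apply(). A rough, untested sketch of how
createTransformFunc() might be filled in that way (the helper class name is made up):

import java.io.Serializable;
import scala.Function1;
import scala.runtime.AbstractFunction1;

// Serializable so Spark can ship the function to the executors
class BinomialLabelFunction extends AbstractFunction1<String, String>
        implements Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public String apply(String labelStr) {
        // same rule as the python lambda: keep "noise", everything else becomes "signal"
        return "noise".equals(labelStr) ? labelStr : "signal";
    }
}

// then, inside LabelToBinaryTransformer:
// @Override
// public Function1<String, String> createTransformFunc() {
//     return new BinomialLabelFunction();
// }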



From:  Jeff Zhang <zjf...@gmail.com>
Date:  Monday, December 21, 2015 at 6:43 PM
To:  Andrew Davidson <a...@santacruzintegration.com>
Cc:  "user @spark" <user@spark.apache.org>
Subject:  Re: trouble implementing Transformer and calling
DataFrame.withColumn()

> In your case, I would suggest you extend UnaryTransformer, which is much
> easier.
> 
> Yeah, I have to admit that there's no documentation on how to write a custom
> Transformer. I think we need to add that, since writing a custom Transformer is
> a very typical task in machine learning.
> 
> On Tue, Dec 22, 2015 at 9:54 AM, Andy Davidson <a...@santacruzintegration.com>
> wrote:
>> 
>> I am trying to port the following python function to Java 8. I would like my
>> java implementation to implement Transformer so I can use it in a pipeline.
>> 
>> I am having a heck of a time trying to figure out how to create a Column
>> variable I can pass to DataFrame.withColumn(). As far as I know, withColumn() is
>> the only way to append a column to a data frame.
>> 
>> Any comments or suggestions would be greatly appreciated
>> 
>> Andy
>> 
>> 
>> def convertMultinomialLabelToBinary(dataFrame):
>>     newColName = "binomialLabel"
>>     binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else
>>                    "signal", StringType())
>>     ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
>>     return ret
>>
>> trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)
>> 
>> 
>> public class LabelToBinaryTransformer extends Transformer {
>>
>>     private static final long serialVersionUID = 4202800448830968904L;
>>
>>     private final UUID uid = UUID.randomUUID();
>>
>>     public String inputCol;
>>
>>     public String outputCol;
>>
>>     @Override
>>     public String uid() {
>>         return uid.toString();
>>     }
>>
>>     @Override
>>     public Transformer copy(ParamMap pm) {
>>         Params xx = defaultCopy(pm);
>>         return ???;
>>     }
>>
>>     @Override
>>     public DataFrame transform(DataFrame df) {
>>         MyUDF myUDF = new MyUDF(myUDF, null, null);
>>         Column c = df.col(inputCol);
>>         // ??? UDF apply does not take a Column ???
>>         Column col = myUDF.apply(df.col(inputCol));
>>         DataFrame ret = df.withColumn(outputCol, col);
>>         return ret;
>>     }
>>
>>     @Override
>>     public StructType transformSchema(StructType arg0) {
>>         // ??? What is this function supposed to do ???
>>         // ??? Is this the type of the new output column ???
>>     }
>>
>>     class MyUDF extends UserDefinedFunction {
>>
>>         public MyUDF(Object f, DataType dataType, Seq<DataType> inputTypes) {
>>             super(f, dataType, inputTypes);
>>             // ??? Why do I have to implement this constructor ???
>>             // ??? What are the arguments ???
>>         }
>>
>>         @Override
>>         public Column apply(scala.collection.Seq<Column> exprs) {
>>             // What do you do with a scala Seq?
>>             return ???;
>>         }
>>     }
>> }
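
As a side note on the transformSchema() question in the quoted code above: a rough,
untested sketch of what that method is typically expected to do, namely describe the
output schema without touching any data, i.e. return the input schema plus the new
output column (the outputCol field is taken from the quoted class; imports shown for
completeness):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// inside the Transformer subclass
@Override
public StructType transformSchema(StructType schema) {
    // keep all existing columns and append the new string label column
    List<StructField> fields = new ArrayList<>(Arrays.asList(schema.fields()));
    fields.add(DataTypes.createStructField(outputCol, DataTypes.StringType, true));
    return DataTypes.createStructType(fields);
}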
>> 
>> 
>> 
> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang

