hi,all
     I define a udf with multiple parameters  ,but I don't know how to call it 
with DataFrame
 
UDF:

def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean, 
minTermLen: Int) =>
    val terms = HanLP.segment(sentence).asScala
.....

Call :

scala> val output = input.select(ssplit2($"text",true,true,2).as('words))
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                 ^
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                      ^
<console>:40: error: type mismatch;
 found   : Int(2)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                           ^

scala> val output = 
input.select(ssplit2($"text",$"true",$"true",$"2").as('words))
org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given input 
columns: [id, text];;
'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
+- Project [_1#2 AS id#5, _2#3 AS text#6]
   +- LocalRelation [_1#2, _2#3]


I need help!!


2017-06-16


lk_spark 

Reply via email to