Re: How to use registered Hive UDF in Spark DataFrame?

Umesh Kacha Sun, 04 Oct 2015 10:24:05 -0700

Hi, I tried to use callUDF in the following way, but it throws an exception
saying it can't recognise MyUDF even though I registered it.


List<Column> colList = new ArrayList<Column>();
colList.add(col("myColumn").as("modifiedColumn"));
// I need this conversion because the following call won't accept just one
// col(); it needs a Seq<Column>.
Seq<Column> colSeq = JavaConversions.asScalaBuffer(colList);
DataFrame resultFrame =
sourceFrame.select(callUDF("MyUDF").toString(), colSeq);

The above call fails, saying it can't recognise 'MyUDF myColumn as
modifiedColumn' in the given columns ...
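
For reference, a minimal sketch of the call shape the reply quoted below seems to intend, assuming Spark 1.5.x, where `callUDF(String, Column...)` and `DataFrame.select(Column...)` are available to Java callers. The class name `CallUdfSketch` and the upper-casing `UpperCaseUdf` body are hypothetical stand-ins, since the real UDF body is not shown in this thread. The column is passed as an argument to `callUDF` itself, and the alias is applied to the Column that `callUDF` returns, so no `toString()` or `Seq<Column>` conversion is involved.

```java
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.types.DataTypes;

public class CallUdfSketch {

    // Hypothetical UDF body (upper-casing); the thread elides the real one.
    static final class UpperCaseUdf implements UDF1<String, String> {
        @Override
        public String call(String value) throws Exception {
            return value == null ? null : value.toUpperCase();
        }
    }

    // Registers the UDF as "MyUDF" and applies it to the "myColumn" column.
    static DataFrame applyMyUdf(HiveContext hiveContext, DataFrame sourceFrame) {
        hiveContext.udf().register("MyUDF", new UpperCaseUdf(), DataTypes.StringType);
        // The input column is an argument of callUDF; the alias names the result.
        return sourceFrame.select(
                callUDF("MyUDF", col("myColumn")).as("modifiedColumn"));
    }
}
```

Calling `CallUdfSketch.applyMyUdf(hiveContext, sourceFrame)` would then return a frame with a single column named `modifiedColumn`, provided `sourceFrame` has a string column called `myColumn`.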

On Sat, Oct 3, 2015 at 2:36 AM, Michael Armbrust <mich...@databricks.com>
wrote:

> callUDF("MyUDF", col("col1")).as("name")
>
> or
>
> callUDF("MyUDF", col("col1")).alias("name")
>
> On Fri, Oct 2, 2015 at 3:29 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
>
>> Hi Michael,
>>
>> Thanks much. How do we give an alias name to the resultant columns? E.g.
>> when using
>>
>> hiveContext.sql("select MyUDF('test') as mytest from myTable");
>>
>> how do we do that with DataFrame callUDF?
>>
>> callUDF("MyUDF", col("col1"))???
>>
>> On Fri, Oct 2, 2015 at 8:23 PM, Michael Armbrust <mich...@databricks.com>
>> wrote:
>>
>>> import static org.apache.spark.sql.functions.*;
>>>
>>> callUDF("MyUDF", col("col1"), col("col2"))
>>>
>>> On Fri, Oct 2, 2015 at 6:25 AM, unk1102 <umesh.ka...@gmail.com> wrote:
>>>
>>>> Hi, I have registered my Hive UDF using the following code:
>>>>
>>>> hiveContext.udf().register("MyUDF", new UDF1<String, String>() {
>>>> public String call(String o) throws Exception {
>>>> //bla bla
>>>> }
>>>> }, DataTypes.StringType);
>>>>
>>>> Now I want to use the above MyUDF in a DataFrame. How do we use it? I
>>>> know how to use it in SQL, and it works fine:
>>>>
>>>> hiveContext.sql("select MyUDF('test') from myTable");
>>>>
>>>> My hiveContext.sql() query involves a group by on multiple columns, so
>>>> for scaling purposes I am trying to convert this query to the DataFrame
>>>> API:
>>>>
>>>> dataframe.select("col1","col2","coln").groupBy("col1","col2","coln").count();
>>>>
>>>> Can we do the following: dataframe.select(MyUDF("col1"))? Please guide.
>>>>
>>>>
>>>>
