Re: slow performance when using udf

wd Mon, 15 Aug 2011 19:47:49 -0700

Thanks for all your advice, I'll try it out.


On Mon, Aug 15, 2011 at 9:02 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>
> On Monday, August 15, 2011, Carl Steinbach <c...@cloudera.com> wrote:
>> Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF)
>> should help some with performance.
>> On Mon, Aug 15, 2011 at 1:49 AM, wd <w...@wdicc.com> wrote:
>>>
>>> hi,
>>>
>>> I created a UDF to decode URL-encoded strings, but the MapReduce job now
>>> takes about 3 times as long as before (73 sec -> 213 sec). How can I
>>> optimize it?
>>>
>>> package com.test.hive.udf;
>>>
>>> import org.apache.hadoop.hive.ql.exec.UDF;
>>> import java.net.URLDecoder;
>>>
>>> public final class urldecode extends UDF {
>>>
>>>    public String evaluate(final String s) {
>>>        if (s == null) { return null; }
>>>        return getString(s);
>>>    }
>>>
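>>>    public static String getString(String s) {
>>>        String a;
>>>        try {
>>>            // Name the charset explicitly; the one-argument decode(String)
>>>            // is deprecated and uses the platform default encoding.
>>>            a = URLDecoder.decode(s, "UTF-8");
>>>        } catch (Exception e) {
>>>            // Malformed escapes decode to an empty string.
>>>            a = "";
>>>        }
>>>        return a;
>>>    }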
>>>
>>>    public static void main(String args[]) {
>>>        String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
>>>        System.out.println( getString(t) );
>>>    }
>>> }
>>
>>
>
> Also, you should use class-level private members to save on object
> instantiation and garbage collection.
>
> You can also benefit from matching the argument types to what normally
> arrives from upstream. Hive converts Text to String when needed, but if the
> data coming into the method is normally Text, you could try declaring the
> argument as Text and see if it is any faster.
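
A minimal sketch of the "class-level private members" advice, written against plain Java so it runs without Hive on the classpath. The class name ReusableUrlDecoder and the hand-rolled percent-decoding loop are illustrative assumptions, not code from this thread: the loop stands in for URLDecoder so that a single byte buffer held in a private field can be reused across rows instead of allocating scratch objects on every call. It keeps the original behaviour of returning "" for malformed input.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the UDF class; it does not extend Hive's UDF
// so that the sketch compiles with the JDK alone.
final class ReusableUrlDecoder {

    // Class-level scratch buffer, reused for every call on this instance.
    private byte[] buf = new byte[256];

    public String evaluate(final String s) {
        if (s == null) { return null; }
        // Fast path: nothing to decode, so return the input untouched.
        if (s.indexOf('%') < 0 && s.indexOf('+') < 0) { return s; }

        final byte[] in = s.getBytes(StandardCharsets.UTF_8);
        // Decoded output is never longer than the input; grow only if needed.
        if (buf.length < in.length) { buf = new byte[in.length]; }

        int len = 0;
        for (int i = 0; i < in.length; i++) {
            final byte b = in[i];
            if (b == '%') {
                // A '%' must be followed by two hex digits.
                if (i + 2 >= in.length) { return ""; }
                final int hi = hex(in[i + 1]);
                final int lo = hex(in[i + 2]);
                if (hi < 0 || lo < 0) { return ""; }
                buf[len++] = (byte) ((hi << 4) | lo);
                i += 2;
            } else if (b == '+') {
                buf[len++] = (byte) ' ';
            } else {
                buf[len++] = b;
            }
        }
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }

    // Returns the value of one ASCII hex digit, or -1 if it is not one.
    private static int hex(final byte b) {
        if (b >= '0' && b <= '9') { return b - '0'; }
        if (b >= 'a' && b <= 'f') { return b - 'a' + 10; }
        if (b >= 'A' && b <= 'F') { return b - 'A' + 10; }
        return -1;
    }
}
```

In an actual Hive UDF the same idea would presumably go one step further and accept a Text argument, reading its bytes directly rather than calling getBytes on a String; that part is left out here because it needs the Hadoop classes. Whether any of this recovers the lost time is something only a benchmark on the real data can tell.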
