Thanks for all your advice. I'll try it out.

On Mon, Aug 15, 2011 at 9:02 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
> On Monday, August 15, 2011, Carl Steinbach <c...@cloudera.com> wrote:
>> Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF)
>> should help some with performance.
>>
>> On Mon, Aug 15, 2011 at 1:49 AM, wd <w...@wdicc.com> wrote:
>>>
>>> hi,
>>>
>>> I created a UDF to decode URL-encoded strings, but the MapReduce job
>>> now takes about 3 times as long as before (73 sec -> 213 sec). How can
>>> I optimize it?
>>>
>>> package com.test.hive.udf;
>>>
>>> import org.apache.hadoop.hive.ql.exec.UDF;
>>> import java.net.URLDecoder;
>>>
>>> public final class urldecode extends UDF {
>>>
>>>     public String evaluate(final String s) {
>>>         if (s == null) { return null; }
>>>         return getString(s);
>>>     }
>>>
>>>     public static String getString(String s) {
>>>         String a;
>>>         try {
>>>             a = URLDecoder.decode(s);
>>>         } catch ( Exception e) {
>>>             a = "";
>>>         }
>>>         return a;
>>>     }
>>>
>>>     public static void main(String args[]) {
>>>         String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
>>>         System.out.println( getString(t) );
>>>     }
>>> }
>>
>
> Also, you should use class-level private members to save on object
> instantiation and garbage collection.
>
> You also get benefits by matching the args with what you would normally
> expect from upstream. Hive converts Text to String when needed, but if the
> data normally coming into the method is Text, you could try to match the
> argument type and see if it is any faster.
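
For reference, here is a minimal sketch of the decoding step on its own, written in plain Java so it runs without Hive on the classpath. It is not the GenericUDF conversion suggested above. The class name UrlDecodeSketch and the method decode are hypothetical. Two things in it are my assumptions rather than anything from the thread: that the input is UTF-8 (the one-argument URLDecoder.decode is deprecated and uses the platform default encoding), and that skipping the decoder for strings with no '%' or '+' is worth it. Whether that shortcut helps depends on the data, so it would need to be measured.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public final class UrlDecodeSketch {

    // Assumption: input is UTF-8. Kept as a class-level constant so it is
    // not re-created on every call.
    private static final String CHARSET = "UTF-8";

    private UrlDecodeSketch() { }

    // Decodes a URL-encoded string. Returns null for null input and ""
    // for malformed input, mirroring the behaviour of the original UDF.
    public static String decode(final String s) {
        if (s == null) {
            return null;
        }
        // Assumed shortcut: with no '%' and no '+' there is nothing to
        // decode, so return the input unchanged without allocating.
        if (s.indexOf('%') < 0 && s.indexOf('+') < 0) {
            return s;
        }
        try {
            return URLDecoder.decode(s, CHARSET);
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            // IllegalArgumentException covers malformed escapes like "%zz".
            return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A"));
    }
}
```

Inside a UDF, evaluate() could delegate to a method like this; the Text-versus-String and GenericUDF suggestions from the thread would still have to be tried separately.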