Hello Experts,

I am facing performance problem when I use the UDF function call. Please help 
me to tune the query.
Please find the details below

shark> select count(*) from table1;
OK
151096
Time taken: 7.242 seconds
shark> select count(*) from table2; 
OK
938
Time taken: 1.273 seconds

Without UDF:
shark> SELECT
     >         count(pvc1.time)
     >   FROM table2 pvc2 JOIN table1 pvc1
     >   WHERE pvc1.col1 = pvc2.col2
     >   AND  unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > 
unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
328
Time taken: 200.487 seconds


shark> 
     > SELECT
     >         count(pvc1.time)
     >   FROM table2 pvc2 JOIN table1 pvc1
     >   WHERE (pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2)
     >   AND  unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > 
unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
331
Time taken: 292.86 seconds

With UDF:
shark>  
     >  SELECT
     >         count(pvc1.time)
     >   FROM table2 pvc2 JOIN table1 pvc1
     >   WHERE testCompare(pvc1.col1,pvc1.col2, pvc2.col1,pvc2.col2)
     >   AND  unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > 
unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');

OK
331
Time taken: 3718.23 seconds

The above UDF query takes more time to run. 


Where testCompare is an udf function, The function just does the pvc1.col1 = 
pvc2.col1 OR pvc1.col1 = pvc2.col2

Please let me know what is the issue here?

 
Thanks and Regards,
Sankar S.  

Reply via email to