It's likely that since your UDF is a black box to hive's query optimizer that it must choose a less efficient join algorithm that passes all possible matches to your function for comparison. This will happen any time your UDF touches attributes from both sides of the join.
In general you can learn more about the chosen execution strategy by running explain. On Jul 18, 2014 12:04 PM, "S Malligarjunan" <[email protected]> wrote: > Hello Experts, > > Appreciate your input highly, please suggest/ give me hint, what would be > the issue here? > > > Thanks and Regards, > Malligarjunan S. > > > > On Thursday, 17 July 2014, 22:47, S Malligarjunan < > [email protected]> wrote: > > > Hello Experts, > > I am facing performance problem when I use the UDF function call. Please > help me to tune the query. > Please find the details below > > shark> select count(*) from table1; > OK > 151096 > Time taken: 7.242 seconds > shark> select count(*) from table2; > OK > 938 > Time taken: 1.273 seconds > > > *Without UDF:*shark> SELECT > > count(pvc1.time) > > FROM table2 pvc2 JOIN table1 pvc1 > > WHERE pvc1.col1 = pvc2.col2 > > AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > > unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS'); > OK > 328 > Time taken: 200.487 seconds > > > shark> > > SELECT > > count(pvc1.time) > > FROM table2 pvc2 JOIN table1 pvc1 > > WHERE (pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2) > > AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > > unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS'); > OK > 331 > Time taken: 292.86 seconds > > *With UDF:* > shark> > > SELECT > > count(pvc1.time) > > FROM table2 pvc2 JOIN table1 pvc1 > > WHERE testCompare(pvc1.col1, pvc1.col2, pvc2.col1,pvc2.col2) > > AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > > unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS'); > > OK > 331 > Time taken: 3718.23 seconds > The above UDF query takes more time to run. > > Where testCompare is an udf function, The function just does the pvc1.col1 > = pvc2.col1 OR pvc1.col1 = pvc2.col2 > > Please let me know what is the issue here? > > > Thanks and Regards, > Sankar S. > > > >
