[ https://issues.apache.org/jira/browse/SPARK-29694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964571#comment-16964571 ]
Xuedong Luan commented on SPARK-29694: -------------------------------------- hi [~yumwang] I will work on this Jira > Execute UDF only once when there are multiple identical UDF usages > ------------------------------------------------------------------ > > Key: SPARK-29694 > URL: https://issues.apache.org/jira/browse/SPARK-29694 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.0.0 > Reporter: Yuming Wang > Priority: Major > > Example: > {code:sql} > SELECT > CASE > WHEN udf1(col1, 'swd') = '2' THEN 'Facebook' > WHEN udf1(col1, 'swd') = '3' THEN 'Twitter' > WHEN udf1(col1, 'swd') = '11' THEN 'Pinterest' > WHEN col2 IN (28,29) THEN > WHEN col2 IN (10,16,18) AND udf1(col1, 'rd') IS NULL AND > udf1(col1, 'rot') IN ('71188223167180', '14361105000167180') THEN 'Yandex' > WHEN col2 IN (10,16,18) AND udf1(col1, 'rd') IS NULL AND > udf1(col1, 'rot') IN ('4686145537108740', '7055082982390', '7113399718530') > THEN 'Yahoo' > WHEN col2 IN (10,16,18) AND udf1(col1, 'rd') IS NULL THEN > 'Google' > WHEN udf1(col1, 'rd') LIKE '%google%' OR udf1(col1, 'rd') LIKE > '%gmail%' THEN 'Google' > WHEN udf1(col1, 'rd') LIKE '%yahoo%' THEN 'Yahoo' > WHEN udf1(col1, 'rd') LIKE '%bing%' THEN 'Bing' > WHEN udf1(col1, 'rd') LIKE '%facebook%' THEN 'Facebook' > WHEN udf1(col1, 'rd') LIKE '%pinterest%' THEN 'Pinterest' > WHEN udf1(col1, 'rd') LIKE '%twitter%' OR udf1(col1, 'rd') LIKE > '%t.co' THEN 'Twitter' > WHEN udf1(col1, 'rd') LIKE '%baidu%' THEN 'Baidu' > WHEN udf1(col1, 'rd') LIKE '%yandex%' THEN 'Yandex' > WHEN udf1(col1, 'rd') LIKE '%aol.%' THEN 'AOL' > WHEN udf1(col1, 'rd') LIKE '%ask.%' THEN 'Ask' > WHEN udf1(col1, 'rd') LIKE '%duckduckgo.%' THEN 'DuckDuckGo' > WHEN udf1(col1, 'rd') LIKE '%t-online.de' THEN 'T-Online' > WHEN udf1(col1, 'rd') LIKE '%com-kleinanzeigen.%' OR udf1(col1, > 'rd') LIKE '%kleinanzeigen%' THEN 'Kleinanzeigen' > WHEN udf1(col1, 'rd') LIKE '%com.%' OR udf1(col1, 'rd') LIKE > '%comdesc.%' THEN 'com' > WHEN udf1(col1, 'rd') LIKE '%paypal.%' THEN 'PayPal' > WHEN udf1(col1, 'rd') IS NULL THEN 'None' > ELSE 'Other' END AS source_domain, > COUNT(*) AS cnt > FROM > tbl s > GROUP BY > 1 > {code} > We can rewrite it to: > {code:sql} > SELECT > CASE > WHEN udf1(col1, 'swd') = '2' THEN 'Facebook' > WHEN udf1(col1, 'swd') = '3' THEN 'Twitter' > WHEN udf1(col1, 'swd') = '11' THEN 'Pinterest' > WHEN col2 IN (28,29) THEN 'Google' > WHEN col2 IN (10,16,18) AND col1 IS NULL AND udf1(col1, 'rot') > IN ('71188223167180', '14361105000167180') THEN 'Yandex' > WHEN col2 IN (10,16,18) AND col1 IS NULL AND udf1(col1, 'rot') > IN ('4686145537108740', '7055082982390', '7113399718530') THEN 'Yahoo' > WHEN col2 IN (10,16,18) AND col1 IS NULL THEN 'Google' > WHEN col1 LIKE '%google%' OR col1 LIKE '%gmail%' THEN 'Google' > WHEN col1 LIKE '%yahoo%' THEN 'Yahoo' > WHEN col1 LIKE '%bing%' THEN 'Bing' > WHEN col1 LIKE '%facebook%' THEN 'Facebook' > WHEN col1 LIKE '%pinterest%' THEN 'Pinterest' > WHEN col1 LIKE '%twitter%' OR col1 LIKE '%t.co' THEN 'Twitter' > WHEN col1 LIKE '%baidu%' THEN 'Baidu' > WHEN col1 LIKE '%yandex%' THEN 'Yandex' > WHEN col1 LIKE '%aol.%' THEN 'AOL' > WHEN col1 LIKE '%ask.%' THEN 'Ask' > WHEN col1 LIKE '%duckduckgo.%' THEN 'DuckDuckGo' > WHEN col1 LIKE '%t-online.de' THEN 'T-Online' > WHEN col1 LIKE '%com-kleinanzeigen.%' OR col1 LIKE > '%kleinanzeigen%' THEN 'Kleinanzeigen' > WHEN col1 LIKE '%com.%' OR col1 LIKE '%comdesc.%' THEN 'com' > WHEN col1 LIKE '%paypal.%' THEN 'PayPal' > WHEN col1 IS NULL THEN 'None' > ELSE 'Other' END AS source_domain, > COUNT(*) AS cnt > FROM > (SELECT *, udf1(col1, 'rd') as col1 FROM tbl) s > GROUP BY > 1 > {code} > It would be great if we could optimize it by the framework. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org