HyukjinKwon commented on a change in pull request #30203:
URL: https://github.com/apache/spark/pull/30203#discussion_r516423241



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##########
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with 
PredicateHelper {
     }
   }
 
+  private def canonicalizeDeterministic(u: PythonUDF) = {

Review comment:
       The fix itself is logically making sense. But my concern is that we set 
this deterministic flag true by default for Python UDFs (and Scala UDFs too). 
When users just use UDFs actually without knowing it (which I believe arguably 
pretty common), they will get the same answer from UDFs users intended to work 
as non-deterministically after this fix.
   
   @gatorsmile and @cloud-fan, actually, shouldn't we set it as `false` by 
default? - I am reading 
https://github.com/apache/spark/commit/ebc24a9b7fde273ee4912f9bc1c5059703f7b31e.
 It's an arbitrary user-defined function so I think it makes sense to have the 
most loose condition.

##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##########
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with 
PredicateHelper {
     }
   }
 
+  private def canonicalizeDeterministic(u: PythonUDF) = {

Review comment:
       The fix itself is logically making sense. But my concern is that we set 
this deterministic flag true by default for Python UDFs (and Scala UDFs too). 
When users just use UDFs actually without knowing it (which I believe is 
arguably pretty common), they will get the same answer from UDFs users intended 
to work as non-deterministically after this fix.
   
   @gatorsmile and @cloud-fan, actually, shouldn't we set it as `false` by 
default? - I am reading 
https://github.com/apache/spark/commit/ebc24a9b7fde273ee4912f9bc1c5059703f7b31e.
 It's an arbitrary user-defined function so I think it makes sense to have the 
most loose condition by default.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to