Davies Liu created SPARK-15392:
----------------------------------

             Summary: The default value of size estimation is not good
                 Key: SPARK-15392
                 URL: https://issues.apache.org/jira/browse/SPARK-15392
             Project: Spark
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: Davies Liu


We use autoBroadcastJoinThreshold + 1L as the default value for size 
estimation. That does not work well in 2.0, because we now calculate the size 
based on the size of the schema, so the estimate can fall below 
autoBroadcastJoinThreshold if you have a SELECT on top of a DataFrame created 
from an RDD.

We should use a much bigger default value, for example, Long.MaxValue.
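
To illustrate the problem, here is a minimal sketch in plain Python. It is a simplified model, not Spark's actual code. It assumes that the estimate for a projection is the child's estimate scaled by the ratio of output row width to child row width. The function names, the four-column relation and the column widths are all hypothetical. The 10 MB threshold is the documented default of spark.sql.autoBroadcastJoinThreshold.

```python
# Simplified model of schema-based size estimation (not Spark's real code).

# Documented default of spark.sql.autoBroadcastJoinThreshold: 10 MB.
AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024

# Largest signed 64-bit value, standing in for a "very large" default.
LONG_MAX = 2**63 - 1


def projected_size(child_size, child_row_width, output_row_width):
    """Assumed rule: scale the child's size by the ratio of row widths."""
    return max(1, child_size * output_row_width // child_row_width)


def would_broadcast(size, threshold=AUTO_BROADCAST_JOIN_THRESHOLD):
    """A relation is a broadcast candidate if its estimate fits the threshold."""
    return size <= threshold


# Hypothetical relation with four 8-byte columns; the SELECT keeps one.
child_width = 4 * 8
output_width = 1 * 8

# Current default: just above the threshold, so the base relation is safe...
small_default = AUTO_BROADCAST_JOIN_THRESHOLD + 1
# ...but the projection shrinks the estimate below the threshold.
shrunk = projected_size(small_default, child_width, output_width)

# Proposed default: large enough that scaling cannot bring it under.
still_large = projected_size(LONG_MAX, child_width, output_width)

print(would_broadcast(small_default))  # base relation with current default
print(would_broadcast(shrunk))         # after the SELECT, current default
print(would_broadcast(still_large))    # after the SELECT, very large default
```

Under these assumptions, the relation of unknown size becomes a broadcast candidate only because the SELECT narrowed its schema. With a very large default, it stays above the threshold.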



