[jira] [Commented] (SPARK-26532) repartitionByRange reads source files twice

2019-01-06 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735452#comment-16735452 ] Hyukjin Kwon commented on SPARK-26532: It has sample option {{spark.sql.execution

[jira] [Commented] (SPARK-26532) repartitionByRange reads source files twice

2019-01-06 Thread Mike Dias (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735451#comment-16735451 ] Mike Dias commented on SPARK-26532: But does it need to read the entire dataset in o

[jira] [Commented] (SPARK-26532) repartitionByRange reads source files twice

2019-01-06 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735443#comment-16735443 ] Hyukjin Kwon commented on SPARK-26532: It's by design. It should run a job to esti