[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-07-02 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150566#comment-17150566 ] Apache Spark commented on SPARK-32130: User 'MaxGekk' has created a pull request for this issue:

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-07-02 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150565#comment-17150565 ] Apache Spark commented on SPARK-32130: User 'MaxGekk' has created a pull request for this issue:

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-07-01 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149348#comment-17149348 ] Apache Spark commented on SPARK-32130: User 'MaxGekk' has created a pull request for this issue:

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-07-01 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149347#comment-17149347 ] Apache Spark commented on SPARK-32130: User 'MaxGekk' has created a pull request for this issue:

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-07-01 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149334#comment-17149334 ] Hyukjin Kwon commented on SPARK-32130: Yeah, we can disable it back by default considering the

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-07-01 Thread Bart Samwel (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149263#comment-17149263 ] Bart Samwel commented on SPARK-32130: +1 to what [~cloud_fan] said. We should just keep the default

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-07-01 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149246#comment-17149246 ] Wenchen Fan commented on SPARK-32130: I'm not sure about 1. It's good to have but not necessary to

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-07-01 Thread Maxim Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149235#comment-17149235 ] Maxim Gekk commented on SPARK-32130: I would like to propose: # Add the SQL config 

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-07-01 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149225#comment-17149225 ] Wenchen Fan commented on SPARK-32130: Even in Spark 2.4, the type inference takes much more time

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-06-30 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149019#comment-17149019 ] Jungtaek Lim commented on SPARK-32130: There might be some tricks to make type inference for

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-06-30 Thread Sean R. Owen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148936#comment-17148936 ] Sean R. Owen commented on SPARK-32130: So, is the issue that it's trying and failing to parse

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-06-30 Thread Sanjeev Mishra (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148663#comment-17148663 ] Sanjeev Mishra commented on SPARK-32130: I tried to load entire dataset using above suggestions

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-06-30 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148570#comment-17148570 ] Jungtaek Lim commented on SPARK-32130: Looks like we already saw the difference but we missed to

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-06-30 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148559#comment-17148559 ] Jungtaek Lim commented on SPARK-32130: So anyone can reproduce it just by running spark-shell on

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-06-30 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148546#comment-17148546 ] Jungtaek Lim commented on SPARK-32130: For me it's reproduced consistently. Please make sure you

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-06-30 Thread JinxinTang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148501#comment-17148501 ] JinxinTang commented on SPARK-32130: [~gourav.sengupta] Nice notebook, it seems the row count() is

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-06-30 Thread Gourav (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148459#comment-17148459 ] Gourav commented on SPARK-32130: [~JinxinTang] and [~lotus2you] I think that it is only the first time

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-06-30 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148416#comment-17148416 ] Jungtaek Lim commented on SPARK-32130: I have a feeling that the "opt-in" approach is correct as it brings

[jira] [Commented] (SPARK-32130) Spark 3.0 json load performance is unacceptable in comparison of Spark 2.4

2020-06-30 Thread JinxinTang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148393#comment-17148393 ] JinxinTang commented on SPARK-32130: [~lotus2you] Thank you for your feedback. Lots of time is used