[ https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marco Gaido resolved SPARK-24928. --------------------------------- Resolution: Duplicate > spark sql cross join running time too long > ------------------------------------------ > > Key: SPARK-24928 > URL: https://issues.apache.org/jira/browse/SPARK-24928 > Project: Spark > Issue Type: Bug > Components: Optimizer > Affects Versions: 1.6.2 > Reporter: LIFULONG > Priority: Minor > > spark sql running time is too long while input left table and right table is > small hdfs text format data, > the sql is: select * from t1 cross join t2 > the line of t1 is 499999, three column > the line of t2 is 1, one column only > running more than 30mins and then failed > > > spark CartesianRDD also has the same problem, example test code is: > val ones = sc.textFile("hdfs://host:port/data/cartesian_data/t1b") //1 line > 1 column > val twos = sc.textFile("hdfs://host:port/data/cartesian_data/t2b") //499999 > line 3 column > val cartesian = new CartesianRDD(sc, twos, ones) > cartesian.count() > running more than 5 mins,while use CartesianRDD(sc, ones, twos) , it only use > less than 10 seconds -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org