[ https://issues.apache.org/jira/browse/SPARK-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell resolved SPARK-7393.
------------------------------------
    Resolution: Invalid

Hi - thanks for giving feedback on your use of Spark SQL. This type of discussion should take place on the mailing list rather than our feature issue tracker.

> How to improve Spark SQL performance?
> -------------------------------------
>
>                 Key: SPARK-7393
>                 URL: https://issues.apache.org/jira/browse/SPARK-7393
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Liang Lee
>
> We want to use Spark SQL in our project, but we found that Spark SQL
> performance is not as good as we expected. The details are as follows:
> 1. We save data as a Parquet file on HDFS.
> 2. We select just one or a few rows from the Parquet file using Spark SQL.
> 3. When the total record count is 61 million, it takes about 3 seconds to
> get the result, which is unacceptably long for our scenario.
> 4. When the total record count is 2 million, it takes about 93 ms to get the
> result, which is still a little long for us.
> 5. The query statement is like: SELECT * FROM DBA WHERE COLA=? AND COLB=?
> The table is not complex: it has fewer than 10 columns, and the content of
> each column is less than 100 bytes.
> 6. Does anyone know how to improve the performance, or have any other ideas?
> 7. Can Spark SQL support microsecond-level response?
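
A rough sanity check on the two timings reported in the issue, as a minimal sketch. It assumes each query scans every row of the table, which the issue does not state; the names `reported` and `rows_per_second` are hypothetical and exist only for this illustration. If the implied rows-per-second figures come out similar for both table sizes, that would be consistent with latency growing roughly in proportion to the total row count.

```python
# Figures reported in the issue: (total rows in the table, query latency in seconds).
reported = {
    "61 million rows": (61_000_000, 3.0),
    "2 million rows": (2_000_000, 0.093),
}


def rows_per_second(total_rows, latency_s):
    """Implied throughput IF the query touched every row (an assumption)."""
    return total_rows / latency_s


# Compute the implied scan rate for each reported case.
rates = {label: rows_per_second(n, t) for label, (n, t) in reported.items()}

for label, rate in rates.items():
    print(f"{label}: about {rate / 1e6:.1f} million rows/s")
```

Both cases work out to roughly 20 million rows per second, so under the full-scan assumption the two measurements are in line with each other. This says nothing about what Spark SQL actually does internally for this query; it only relates the two numbers the reporter gave.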