lichenglin created SPARK-13999:
----------------------------------

             Summary: Run 'group by' before building cube
                 Key: SPARK-13999
                 URL: https://issues.apache.org/jira/browse/SPARK-13999
             Project: Spark
          Issue Type: Improvement
            Reporter: lichenglin

I am trying to build a cube on a data set that has about 1 billion rows. The cube has 7 dimensions. The job takes a whole day to finish with 16 cores.

Then I ran 'select count (1) from table group by A,B,C,D,E,F,G' first and built the cube on the 'group by' result set. The cube uses the same dimensions as the 'group by' and does a sum on 'count'. This takes just 45 minutes.

The 'group by' reduces the data set from billions of rows to millions. The reduction depends on the number of dimensions. We could try this in a new version.

Averages may be more complex to handle: the 'group by' should collect both the sum and the count.
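
To illustrate why the two approaches give the same answer, here is a minimal plain-Python sketch (not Spark code). It builds a cube directly on raw rows and again on the 'group by' result, and shows that an average can be rebuilt from the carried sum and count. The function names `cube` and `pre_aggregate`, the columns `A`, `B`, `v`, and the sample rows are all made up for this example.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measures):
    """Sum `measures` for every subset of `dims`; None marks a rolled-up dimension."""
    out = defaultdict(lambda: [0] * len(measures))
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):
            for row in rows:
                key = tuple(row[d] if d in kept else None for d in dims)
                acc = out[key]
                for i, m in enumerate(measures):
                    acc[i] += row[m]
    return {k: tuple(v) for k, v in out.items()}


def pre_aggregate(rows, dims, value):
    """GROUP BY all dims, keeping both count and sum so averages can be rebuilt."""
    groups = defaultdict(lambda: [0, 0])
    for row in rows:
        g = groups[tuple(row[d] for d in dims)]
        g[0] += 1
        g[1] += row[value]
    return [dict(zip(dims, k), cnt=c, total=s) for k, (c, s) in groups.items()]


# Hypothetical sample data: two dimensions (A, B) and one measure (v).
rows = [
    {"A": "x", "B": 1, "v": 10},
    {"A": "x", "B": 1, "v": 20},
    {"A": "x", "B": 2, "v": 30},
    {"A": "y", "B": 1, "v": 40},
    {"A": "y", "B": 1, "v": 60},
]
dims = ["A", "B"]

# Cube built directly on the raw rows (each raw row counts as 1).
raw = [dict(r, cnt=1, total=r["v"]) for r in rows]
direct = cube(raw, dims, ["cnt", "total"])

# Cube built on the much smaller 'group by' result, summing the partial counts.
grouped = pre_aggregate(rows, dims, "v")
via_groups = cube(grouped, dims, ["cnt", "total"])

# The average of each cube cell is rebuilt from the carried sum and count.
averages = {k: total / cnt for k, (cnt, total) in via_groups.items()}
```

Both cubes contain the same cells, but the second one iterates over one row per distinct dimension combination instead of one row per input record. Averaging the pre-aggregated averages directly would give wrong results, which is why the sketch carries the sum and the count separately.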