[ https://issues.apache.org/jira/browse/SPARK-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-8712. ---------------------------- Resolution: Not A Problem > Hive's Parser does not support distinct aggregations with OVER clause > --------------------------------------------------------------------- > > Key: SPARK-8712 > URL: https://issues.apache.org/jira/browse/SPARK-8712 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 1.4.0 > Reporter: Yin Huai > > Hive's parser ignores Window spec when a distinct aggregation is used. > {code} > scala> Seq((1, 2, 3)).toDF("i", "j", "k").registerTempTable("t") > scala> sql("select count(distinct j) over (partition by i) from > t").explain(true) > == Parsed Logical Plan == > 'Project [UnresolvedAlias > CountDistinct > UnresolvedAttribute [j] > ] > 'UnresolvedRelation [t], None > == Analyzed Logical Plan == > _c0: bigint > Aggregate [COUNT(DISTINCT j#23) AS _c0#27L] > Subquery t > Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24] > LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]] > == Optimized Logical Plan == > Aggregate [COUNT(DISTINCT j#23) AS _c0#27L] > LocalRelation [j#23], [[2]] > == Physical Plan == > GeneratedAggregate false, [CombineAndCount(partialSets#28) AS _c0#27L], false > Exchange SinglePartition > GeneratedAggregate true, [AddToHashSet(j#23) AS partialSets#28], false > LocalTableScan [j#23], [[2]] > Code Generation: true > == RDD == > scala> sql("select count(j) over (partition by i) from t").explain(true) > == Parsed Logical Plan == > 'Project [UnresolvedAlias > WindowExpression > UnresolvedWindowFunction count > UnresolvedAttribute [j] > WindowSpecDefinition UnspecifiedFrame > UnresolvedAttribute [i] > ] > 'UnresolvedRelation [t], None > == Analyzed Logical Plan == > _c0: bigint > Project [_c0#31L] > Project [j#23,i#22,_c0#31L,_c0#31L] > Window [j#23,i#22], > [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) > WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED > FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING > AND UNBOUNDED FOLLOWING > Project [j#23,i#22] > Subquery t > Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24] > LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]] > == Optimized Logical Plan == > Project [_c0#31L] > Window [j#23,i#22], > [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) > WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED > FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING > AND UNBOUNDED FOLLOWING > LocalRelation [j#23,i#22], [[2,1]] > == Physical Plan == > Project [_c0#31L] > Window [j#23,i#22], > [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) > WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED > FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING > AND UNBOUNDED FOLLOWING > ExternalSort [i#22 ASC], false > Exchange (HashPartitioning 200) > LocalTableScan [j#23,i#22], [[2,1]] > Code Generation: true > == RDD == > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org