Re: Exception when using some aggregate operators

2015-10-28 Thread Shagun Sodhani
Ohh great! Thanks for the clarification. On Wed, Oct 28, 2015 at 4:21 PM, Reynold Xin wrote: > No those are just functions for the DataFrame programming API. > > On Wed, Oct 28, 2015 at 11:49 AM, Shagun Sodhani > wrote: > >> @Reynold I seem to be missing something. Aren't the functions listed h

Re: Exception when using some aggregate operators

2015-10-28 Thread Reynold Xin
No those are just functions for the DataFrame programming API. On Wed, Oct 28, 2015 at 11:49 AM, Shagun Sodhani wrote: > @Reynold I seem to be missing something. Aren't the functions listed here > > to >

Re: Exception when using some aggregate operators

2015-10-28 Thread Shagun Sodhani
@Reynold I seem to be missing something. Aren't the functions listed here to be treated as sql operators as well? I do see that these are mentioned as Functions available for DataFrame

Re: Exception when using some aggregate operators

2015-10-28 Thread Ted Yu
Created SPARK-11371 with a patch. Will create PR soon. On Wed, Oct 28, 2015 at 3:42 AM, Reynold Xin wrote: > I don't think these are bugs. The SQL standard for average is "avg", not > "mean". Similarly, a distinct count is supposed to be written as > "count(distinct col)", not "countDistinct(co

Re: Exception when using some aggregate operators

2015-10-28 Thread Reynold Xin
I don't think these are bugs. The SQL standard for average is "avg", not "mean". Similarly, a distinct count is supposed to be written as "count(distinct col)", not "countDistinct(col)". We can, however, make "mean" an alias for "avg" to improve compatibility between DataFrame and SQL. On Wed, O
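
To make the distinction above concrete, here is a minimal sketch contrasting the two spellings. It assumes a Spark 1.5-era setup (SQLContext, registerTempTable) with Spark on the classpath; the table name t, the column names a and b, and the sample rows are made up for illustration and do not come from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{countDistinct, mean}

object AggregateSyntax {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode and a throwaway app name, for illustration only.
    val sc = new SparkContext(
      new SparkConf().setAppName("agg-syntax").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical sample data: columns "a" (Int) and "b" (String).
    val df = Seq((1, "x"), (2, "x"), (3, "y")).toDF("a", "b")
    df.registerTempTable("t")

    // DataFrame API: mean and countDistinct are Scala functions
    // from org.apache.spark.sql.functions.
    df.agg(mean("a"), countDistinct("b")).show()

    // SQL: the same aggregates use the standard SQL spellings.
    sqlContext.sql("SELECT avg(a), count(DISTINCT b) FROM t").show()

    sc.stop()
  }
}
```

Both calls compute the same aggregates; only the surface syntax differs between the DataFrame API and the SQL dialect.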

Re: Exception when using some aggregate operators

2015-10-28 Thread Shagun Sodhani
Wouldn't it be: +expression[Max]("avg"), On Wed, Oct 28, 2015 at 4:06 PM, Ted Yu wrote: > Since there is already Average, the simplest change is the following: > > $ git diff > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala > diff --git > a/sql/cata

Re: Exception when using some aggregate operators

2015-10-28 Thread Shagun Sodhani
Also are the other aggregate functions to be treated as bugs or not? On Wed, Oct 28, 2015 at 4:08 PM, Shagun Sodhani wrote: > Wouldnt it be: > > +expression[Max]("avg"), > > On Wed, Oct 28, 2015 at 4:06 PM, Ted Yu wrote: > >> Since there is already Average, the simplest change is the follow

Re: Exception when using some aggregate operators

2015-10-28 Thread Ted Yu
Since there is already Average, the simplest change is the following: $ git diff sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/
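
The diff is cut off in this archive, so the following is not Ted's patch. It is only a toy model, in plain Scala, of what registering a second name for an existing aggregate looks like. ToyRegistry, Aggregate, and lookup are hypothetical names and are not part of Spark's FunctionRegistry.

```scala
// Toy model only: NOT Spark's FunctionRegistry. All names here are hypothetical.
object ToyRegistry {
  type Aggregate = Seq[Double] => Double

  private val average: Aggregate = xs => xs.sum / xs.size
  private val maximum: Aggregate = xs => xs.max

  // Several SQL-visible names may point at the same implementation.
  private val functions: Map[String, Aggregate] = Map(
    "avg"  -> average,
    "mean" -> average, // alias: a second name for the same implementation
    "max"  -> maximum
  )

  // Look a function up by name, case-insensitively; unknown names fail loudly.
  def lookup(name: String): Aggregate =
    functions.getOrElse(
      name.toLowerCase,
      throw new NoSuchElementException(s"undefined function $name"))
}
```

In this sketch, ToyRegistry.lookup("mean") and ToyRegistry.lookup("avg") return the same function, while an unregistered name such as "countDistinct" raises an exception, loosely mirroring the "undefined function" error reported in this thread.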

Re: Exception when using some aggregate operators

2015-10-28 Thread Shagun Sodhani
I tried adding the aggregate functions in the registry and they work, other than mean, for which Ted has forwarded some code changes. I will try out those changes and update the status here. On Wed, Oct 28, 2015 at 9:03 AM, Shagun Sodhani wrote: > Yup avg works good. So we have alternate functio

Re: Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
Yup, avg works fine. So we have alternate functions to use in place of the functions pointed out earlier. But my question stands: are those original aggregate functions not supposed to be used, am I using them the wrong way, or is it a bug, as I asked in my first mail? On Wed, Oct 28, 2015 at 3:20

Re: Exception when using some aggregate operators

2015-10-27 Thread Ted Yu
Have you tried using avg in place of mean? (1 to 5).foreach { i => val df = (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i") } sqlContext.sql(""" CREATE TEMPORARY TABLE partitionedParquet USING org.apache.spark.sql.parquet OPTIONS ( path '/tm

Re: Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
So I tried @Reynold's suggestion. I could get countDistinct and sumDistinct running but mean and approxCountDistinct do not work. (I guess I am using the wrong syntax for approxCountDistinct) For mean, I think the registry entry is missing. Can someone clarify that as well? On Tue, Oct 27, 2015 a

Re: Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
Will try in a while when I get back. I assume this applies to all functions other than mean. Also, countDistinct is defined along with all other SQL functions, so I don't get the "distinct is not part of function name" part. On 27 Oct 2015 19:58, "Reynold Xin" wrote: > Try > > count(distinct columnane

Re: Exception when using some aggregate operators

2015-10-27 Thread Reynold Xin
Try count(distinct columnname). In SQL, distinct is not part of the function name. On Tuesday, October 27, 2015, Shagun Sodhani wrote: > Oops seems I made a mistake. The error message is : Exception in thread > "main" org.apache.spark.sql.AnalysisException: undefined function > countDistinct > On

Re: Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
Oops, it seems I made a mistake. The error message is: Exception in thread "main" org.apache.spark.sql.AnalysisException: undefined function countDistinct. On 27 Oct 2015 15:49, "Shagun Sodhani" wrote: > Hi! I was trying out some aggregate functions in SparkSql and I noticed > that certain aggregate