Thanks for reporting: https://issues.apache.org/jira/browse/SPARK-11032
You can probably work around this by aliasing the count and doing a filter
on that value afterwards.

On Thu, Oct 8, 2015 at 8:47 PM, Jeff Thompson <
jeffreykeatingthomp...@gmail.com> wrote:

> After upgrading from 1.4.1 to 1.5.1 I found some of my Spark SQL queries
> no longer worked. This seems to be related to using count(1) or count(*)
> in a nested query. I can reproduce the issue in a pyspark shell with the
> sample code below. The 'people' table is from
> spark-1.5.1-bin-hadoop2.4/examples/src/main/resources/people.json.
>
> Environment details: Hadoop 2.5.0-cdh5.3.0, YARN
>
> *Test code:*
>
> from pyspark.sql import SQLContext
> print(sc.version)
> sqlContext = SQLContext(sc)
>
> df = sqlContext.read.json("/user/thj1pal/people.json")
> df.show()
>
> sqlContext.registerDataFrameAsTable(df, "PEOPLE")
>
> result = sqlContext.sql("SELECT MIN(t0.age) FROM (SELECT * FROM PEOPLE
> WHERE age > 0) t0 HAVING(COUNT(1) > 0)")
> result.show()
>
> *spark 1.4.1 output*
>
> 1.4.1
> +----+-------+
> | age|   name|
> +----+-------+
> |null|Michael|
> |  30|   Andy|
> |  19| Justin|
> +----+-------+
>
> +--+
> |c0|
> +--+
> |19|
> +--+
>
> *spark 1.5.1 output*
>
> 1.5.1
> +----+-------+
> | age|   name|
> +----+-------+
> |null|Michael|
> |  30|   Andy|
> |  19| Justin|
> +----+-------+
>
> ---------------------------------------------------------------------------
> Py4JJavaError                             Traceback (most recent call last)
> <ipython-input-1-342b585498f7> in <module>()
>       9
>      10 result = sqlContext.sql("SELECT MIN(t0.age) FROM (SELECT * FROM PEOPLE WHERE age > 0) t0 HAVING(COUNT(1) > 0)")
> ---> 11 result.show()
>
> /home/thj1pal/spark-1.5.1-bin-hadoop2.4/python/pyspark/sql/dataframe.pyc in show(self, n, truncate)
>     254         +---+-----+
>     255         """
> --> 256         print(self._jdf.showString(n, truncate))
>     257
>     258     def __repr__(self):
>
> /home/thj1pal/spark-1.5.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
>     536         answer = self.gateway_client.send_command(command)
>     537         return_value = get_return_value(answer, self.gateway_client,
> --> 538                 self.target_id, self.name)
>     539
>     540         for temp_arg in temp_args:
>
> /home/thj1pal/spark-1.5.1-bin-hadoop2.4/python/pyspark/sql/utils.pyc in deco(*a, **kw)
>      34     def deco(*a, **kw):
>      35         try:
> ---> 36             return f(*a, **kw)
>      37         except py4j.protocol.Py4JJavaError as e:
>      38             s = e.java_exception.toString()
>
> /home/thj1pal/spark-1.5.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>     298                 raise Py4JJavaError(
>     299                     'An error occurred while calling {0}{1}{2}.\n'.
> --> 300                     format(target_id, '.', name), value)
>     301             else:
>     302                 raise Py4JError(
>
> Py4JJavaError: An error occurred while calling o33.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 9, pal-bd-n06-ib): java.lang.UnsupportedOperationException: Cannot evaluate expression: count(1)
>     at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:188)
>     at org.apache.spark.sql.catalyst.expressions.Count.eval(aggregates.scala:156)
>     at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:327)
>     ….