This is a bug; I have created an issue to track it: https://issues.apache.org/jira/browse/SPARK-3500
Also, there is a PR to fix it: https://github.com/apache/spark/pull/2369

Until the next bugfix release, you can work around it by calling coalesce on the wrapped Java SchemaRDD and re-wrapping the result (note that the shuffle flag must be Python's False, not false):

    from pyspark.sql import SchemaRDD

    srdd = sqlCtx.jsonRDD(rdd)
    srdd2 = SchemaRDD(srdd._schema_rdd.coalesce(N, False, None), sqlCtx)

On Thu, Sep 11, 2014 at 6:12 PM, Brad Miller <bmill...@eecs.berkeley.edu> wrote:
> Hi All,
>
> I'm having some trouble with the coalesce and repartition functions for
> SchemaRDD objects in pyspark. When I run:
>
>     sqlCtx.jsonRDD(sc.parallelize(['{"foo":"bar"}',
>                                    '{"foo":"baz"}'])).coalesce(1)
>
> I get this error:
>
>     Py4JError: An error occurred while calling o94.coalesce. Trace:
>     py4j.Py4JException: Method coalesce([class java.lang.Integer, class
>     java.lang.Boolean]) does not exist
>
> For context, I have a dataset stored in a Parquet file, and I'm using
> SQLContext to make several queries against the data. I then register the
> results of these queries as new tables in the SQLContext. Unfortunately,
> each new table has the same number of partitions as the original (despite
> being much smaller). Hence my interest in coalesce and repartition.
>
> Has anybody else encountered this bug? Is there an alternate workflow I
> should consider?
>
> I am running the 1.1.0 binaries released today.
>
> best,
> -Brad

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
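The workaround above relies on a general pattern: when a Python wrapper class fails to forward a method to the object it wraps, call the method on the wrapped object directly and re-wrap the result. A minimal sketch of that pattern, in plain Python with no Spark required (class names InnerRDD and WrapperRDD are hypothetical stand-ins for the Java-backed RDD and the SchemaRDD wrapper, and the partition arithmetic is simplified for illustration):

```python
class InnerRDD:
    """Stand-in for the wrapped object that actually implements coalesce."""
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def coalesce(self, n, shuffle=False):
        # Return a new inner object with at most n partitions.
        return InnerRDD(min(n, self.num_partitions))


class WrapperRDD:
    """Stand-in for SchemaRDD: wraps an inner RDD plus a context."""
    def __init__(self, inner, ctx):
        self._inner = inner
        self.ctx = ctx

    # The piece missing from SchemaRDD in 1.1.0: forward coalesce to
    # the wrapped object, then re-wrap the result in the same class.
    def coalesce(self, n, shuffle=False):
        return WrapperRDD(self._inner.coalesce(n, shuffle), self.ctx)


srdd = WrapperRDD(InnerRDD(8), ctx="sqlCtx")
srdd2 = srdd.coalesce(1)
print(srdd2._inner.num_partitions)  # prints 1
```

The one-line workaround does exactly this re-wrapping by hand; the linked PR adds the forwarding method to SchemaRDD itself.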