Hurshal Patel created SPARK-12348:
-------------------------------------

             Summary: PySpark _inferSchema crashes with incorrect exception on an empty RDD
                 Key: SPARK-12348
                 URL: https://issues.apache.org/jira/browse/SPARK-12348
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.5.0
            Reporter: Hurshal Patel
            Priority: Minor

{code:python}
>>> rdd = sc.emptyRDD()
>>> df = sqlContext.createDataFrame(rdd)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/memsql/spark/python/pyspark/sql/context.py", line 404, in createDataFrame
    rdd, schema = self._createFromRDD(data, schema, samplingRatio)
  File "/home/memsql/spark/python/pyspark/sql/context.py", line 285, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/home/memsql/spark/python/pyspark/sql/context.py", line 229, in _inferSchema
    first = rdd.first()
  File "/home/memsql/spark/python/pyspark/rdd.py", line 1320, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty
{code}

Calling createDataFrame on an empty RDD raises "RDD is empty" from rdd.first() instead of the intended message "The first row in RDD is empty, can not infer schema" from sqlContext._inferSchema.
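
One possible way to get the intended message is to check for emptiness before calling first(). The sketch below is not the actual PySpark code or a proposed patch: StubRDD and infer_schema_first_row are hypothetical names, and StubRDD stands in for a real RDD only so that the example runs without a Spark installation. It copies the first() error from the traceback above and assumes an isEmpty() method like the one on PySpark RDDs.

```python
class StubRDD:
    """Hypothetical stand-in for an RDD; not part of PySpark."""

    def __init__(self, items=()):
        self._items = list(items)

    def isEmpty(self):
        # True when the stub holds no rows.
        return not self._items

    def first(self):
        # Same error as rdd.first() in the traceback above.
        if not self._items:
            raise ValueError("RDD is empty")
        return self._items[0]


EMPTY_ROW_MESSAGE = "The first row in RDD is empty, can not infer schema"


def infer_schema_first_row(rdd):
    """Return the first row, or raise the schema-inference message.

    Illustrative helper only: it checks for an empty RDD up front so
    that the caller never sees the generic "RDD is empty" error.
    """
    if rdd.isEmpty():
        raise ValueError(EMPTY_ROW_MESSAGE)
    first = rdd.first()
    if not first:
        # A first row with no fields cannot be used for inference either.
        raise ValueError(EMPTY_ROW_MESSAGE)
    return first
```

With this guard, an empty input fails with the schema-inference message, while a non-empty input still returns its first row for inference.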