[ https://issues.apache.org/jira/browse/SPARK-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew updated SPARK-20035:
---------------------------
    Description:
When there is no record in a dataset, writing it with spark-csv creates an empty file (i.e. with no header line):
```
dataset.write().format("com.databricks.spark.csv").option("header", "true").save("... file name here ...");
or
dataset.write().option("header", "true").csv("... file name here ...");
```

The same file then cannot be read back with the same format (i.e. spark-csv), since it is empty, as shown below. The same call works if the dataset has at least one record.
```
sqlCtx.read().format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("... file name here ...");
or
sparkSession.read().option("header", "true").option("inferSchema", "true").csv("... file name here ...");
```

This is not right: you should always be able to read a file that you wrote.

  was:
When there is no record in a dataset, writing it with spark-csv creates an empty file (i.e. with no header line):
```
dataset.write().format("com.databricks.spark.csv").option("header", "true").save("... file name here ...");
```

The same file then cannot be read back with the same format (i.e. spark-csv), since it is empty, as shown below. The same call works if the dataset has at least one record.
```
sqlCtx.read().format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("... file name here ...");
```

This is not right: you should always be able to read a file that you wrote.

        Summary: Spark 2.0.2 writes empty file if no record is in the dataset  (was: spark-csv writes empty file if no record is in the dataset)

> Spark 2.0.2 writes empty file if no record is in the dataset
> ------------------------------------------------------------
>
>                 Key: SPARK-20035
>                 URL: https://issues.apache.org/jira/browse/SPARK-20035
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.0.2
>         Environment: Spark 2.0.2
> Linux/Windows
>            Reporter: Andrew
>
> When there is no record in a dataset, writing it with spark-csv creates an
> empty file (i.e. with no header line):
> ```
> dataset.write().format("com.databricks.spark.csv").option("header", "true").save("... file name here ...");
> or
> dataset.write().option("header", "true").csv("... file name here ...");
> ```
> The same file then cannot be read back with the same format (i.e. spark-csv),
> since it is empty, as shown below. The same call works if the dataset has at
> least one record.
> ```
> sqlCtx.read().format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("... file name here ...");
> or
> sparkSession.read().option("header", "true").option("inferSchema", "true").csv("... file name here ...");
> ```
> This is not right: you should always be able to read a file that you wrote.
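
For illustration only, the round-trip property the report asks for can be sketched without Spark. The class below is a hypothetical helper (HeaderPreservingCsv and its methods are made-up names) built on java.nio alone. It is not the Spark or spark-csv writer, and it assumes values need no quoting or escaping. It always writes the header line, so a file produced from zero records can still be read back with its column names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Spark or spark-csv.
// Assumption: no value contains a comma, quote, or newline.
final class HeaderPreservingCsv {

    // Writes the header line unconditionally, then one line per record.
    static void write(Path file, List<String> columns, List<List<String>> rows)
            throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(String.join(",", columns));
        for (List<String> row : rows) {
            lines.add(String.join(",", row));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Reads the column names back from the first line.
    // A zero-byte file (the situation described in the report) is rejected.
    static List<String> readColumns(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            throw new IllegalStateException("no header line in " + file);
        }
        return Arrays.asList(lines.get(0).split(",", -1));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("empty-dataset", ".csv");
        // Zero records: the file still contains the header line.
        write(file, Arrays.asList("id", "name"), new ArrayList<List<String>>());
        System.out.println(readColumns(file)); // prints [id, name]
        Files.delete(file);
    }
}
```

With a writer that behaves this way, a reader configured with header=true has a first line to take column names from even when there are no records. Whether Spark should do the same is exactly what this issue raises.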