[jira] [Updated] (SPARK-20035) Spark 2.0.2 writes empty file if no record is in the dataset

Andrew (JIRA) Mon, 20 Mar 2017 10:59:13 -0700

     [ https://issues.apache.org/jira/browse/SPARK-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andrew updated SPARK-20035:
---------------------------
    Description: 
When a dataset contains no records, writing it with spark-csv creates an 
empty file (i.e. with no header line):

```
dataset.write().format("com.databricks.spark.csv").option("header", "true")
    .save("... file name here ...");

// or

dataset.write().option("header", "true").csv("... file name here ...");
```

The same file then cannot be read back using the same format (i.e. spark-csv) 
because it is empty, as shown below. The same read call works if the dataset 
had at least one record.

```
sqlCtx.read().format("com.databricks.spark.csv").option("header", "true")
    .option("inferSchema", "true").load("... file name here ...");

// or

sparkSession.read().option("header", "true").option("inferSchema", "true")
    .csv("... file name here ...");
```

This is not right; you should always be able to read a file that you wrote.

  was:
When a dataset contains no records, writing it with spark-csv creates an 
empty file (i.e. with no header line):

```
dataset.write().format("com.databricks.spark.csv").option("header", "true")
    .save("... file name here ...");
```

The same file then cannot be read back using the same format (i.e. spark-csv) 
because it is empty, as shown below. The same read call works if the dataset 
had at least one record.

```
sqlCtx.read().format("com.databricks.spark.csv").option("header", "true")
    .option("inferSchema", "true").load("... file name here ...");
```

This is not right; you should always be able to read a file that you wrote.

        Summary: Spark 2.0.2 writes empty file if no record is in the dataset  
(was: spark-csv writes empty file if no record is in the dataset)

> Spark 2.0.2 writes empty file if no record is in the dataset
> ------------------------------------------------------------
>
>                 Key: SPARK-20035
>                 URL: https://issues.apache.org/jira/browse/SPARK-20035
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.0.2
>         Environment: Spark 2.0.2
> Linux/Windows
>            Reporter: Andrew
>
> When a dataset contains no records, writing it with spark-csv creates an 
> empty file (i.e. with no header line):
> ```
> dataset.write().format("com.databricks.spark.csv").option("header", "true")
>     .save("... file name here ...");
> // or
> dataset.write().option("header", "true").csv("... file name here ...");
> ```
> The same file then cannot be read back using the same format (i.e. spark-csv) 
> because it is empty, as shown below. The same read call works if the dataset 
> had at least one record.
> ```
> sqlCtx.read().format("com.databricks.spark.csv").option("header", "true")
>     .option("inferSchema", "true").load("... file name here ...");
> // or
> sparkSession.read().option("header", "true").option("inferSchema", "true")
>     .csv("... file name here ...");
> ```
> This is not right; you should always be able to read a file that you wrote.
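
For illustration only, the round-trip expectation stated in the report can be 
sketched in plain Java, without Spark. This is not Spark's or spark-csv's 
code: the class HeaderRoundTrip and its methods write and readHeader are 
hypothetical names, and the CSV handling is deliberately naive (no quoting or 
escaping). The point it shows is that a writer which always emits the header 
line leaves a file that can still be read when the dataset has no records.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Spark: illustrates the expected behaviour only.
public class HeaderRoundTrip {

    // Always writes the header line first, even when `rows` is empty.
    public static void write(Path file, List<String> header, List<List<String>> rows)
            throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(String.join(",", header));
        for (List<String> row : rows) {
            lines.add(String.join(",", row));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Reads the column names back; fails only if the file has no lines at all,
    // which is the situation described in this issue.
    public static List<String> readHeader(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            throw new IllegalStateException("no header line in " + file);
        }
        return Arrays.asList(lines.get(0).split(",", -1));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("empty-dataset", ".csv");
        // Empty dataset: zero rows, but the column names are still known.
        write(file, Arrays.asList("id", "name"), Collections.emptyList());
        System.out.println(readHeader(file)); // prints [id, name]
    }
}
```

With the header present, a reader can recover at least the column names from 
an empty dataset, whereas a zero-byte file gives it nothing to work with.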



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

