RE: Problem with CSV line break data in PySpark 2.1.0

JG Perrin Tue, 05 Sep 2017 12:06:56 -0700

Have you tried the built-in CSV parser instead of the Databricks one (which is 
not really used anymore)?
What does your original CSV look like?
What does your code look like? There are quite a few options for reading a CSV…
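
To show why the answers matter, here is a minimal sketch of the underlying issue in plain Python's csv module rather than Spark. The sample data is made up, and it assumes the field containing the line break is wrapped in double quotes in the file. A reader that splits on every physical line breaks the record in two, while a quote-aware reader keeps it whole.

```python
import csv
import io

# Made-up sample: the quoted value in column F contains a line break.
raw = 'E,F,G\n1,"2017-09-03\ncontinued",x\n2,2017-09-04,y\n'

# Naive approach: every physical line is treated as one record,
# so the record with the embedded line break is cut in two.
naive_rows = [line.split(",") for line in raw.splitlines()]

# Quote-aware approach: csv.reader keeps a quoted line break inside the field.
parsed_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))   # 4 rows: the header plus three fragments
print(len(parsed_rows))  # 3 rows: the header plus two complete records
print(parsed_rows[1])    # ['1', '2017-09-03\ncontinued', 'x']
```

If the line break is not inside a quoted field, no parser can tell it apart from a real end of record, and the file would need to be fixed upstream. As far as I know, the built-in Spark reader only gained a `multiLine` option for this case in Spark 2.2.0, so on 2.1.0 it may not be available.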

From: Aakash Basu [mailto:aakash.spark....@gmail.com]
Sent: Sunday, September 03, 2017 5:16 AM
To: user <user@spark.apache.org>
Subject: Problem with CSV line break data in PySpark 2.1.0

Hi,

I have a dataset in which a few rows of column F, as shown below, contain line 
breaks in the CSV file.

[Inline image 1]

When Spark reads it, the result is as shown below: the text after the line break starts a completely new row.

[Inline image 2]

I want PySpark 2.1.0 to read the file without treating the line break after 
the date as the end of the record. That is not happening with the 
com.databricks.csv reader I am using: for line 2, nulls are created after the 
date for the rest of the columns, from G to the end.

Could someone please help me handle this?

Thanks,
Aakash.

