Murphy's Law strikes: right after asking the question, I found the solution. The JDBC URL should set the zeroDateTimeBehavior option (for example, zeroDateTimeBehavior=convertToNull with Connector/J 5.1), which tells the driver what to do with all-zero DATETIME values instead of throwing.

https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-properties.html
https://stackoverflow.com/questions/11133759/0000-00-00-000000-can-not-be-represented-as-java-sql-timestamp-error
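For anyone who finds this thread later, here is a rough sketch of how the option could be applied once for every table. It assumes PySpark and Connector/J 5.1; the helper names, host, database, and output layout are made up for illustration and are not from our actual job. Because the setting lives in the URL, the same reader code should work unchanged across hundreds of tables.

```python
def with_zero_datetime_behavior(jdbc_url, behavior="convertToNull"):
    """Append zeroDateTimeBehavior to a MySQL JDBC URL unless it is already set."""
    if "zeroDateTimeBehavior=" in jdbc_url:
        return jdbc_url  # already configured; leave the caller's choice alone
    separator = "&" if "?" in jdbc_url else "?"
    return f"{jdbc_url}{separator}zeroDateTimeBehavior={behavior}"


def dump_table_as_json(spark, jdbc_url, table, out_dir, user, password):
    """Read one table over JDBC and write it out as JSON files (hypothetical helper)."""
    df = (
        spark.read.format("jdbc")
        .option("url", with_zero_datetime_behavior(jdbc_url))
        .option("driver", "com.mysql.jdbc.Driver")  # Connector/J 5.1 class name
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .load()
    )
    df.write.mode("overwrite").json(f"{out_dir}/{table}")


# Hypothetical URLs, only to show the strings the helper produces.
plain_url = with_zero_datetime_behavior("jdbc:mysql://db.example.com:3306/legacy")
url_with_opts = with_zero_datetime_behavior(
    "jdbc:mysql://db.example.com:3306/legacy?useSSL=false"
)
```

With convertToNull, the zero dates should come back as nulls in the DataFrame. On Connector/J 8.x the documented value is spelled CONVERT_TO_NULL and the driver class is com.mysql.cj.jdbc.Driver, so adjust both if you are on the newer driver.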
On Wed, Jun 5, 2019 at 6:29 PM Anthony May <anthony...@gmail.com> wrote:

> Hi,
>
> We have a legacy process of scraping a MySQL Database. The Spark job uses
> the DataFrame API and MySQL JDBC driver to read the tables and save them as
> JSON files. One table has DateTime columns that contain values invalid for
> java.sql.Timestamp so it's throwing the exception:
> java.sql.SQLException: Value '0000-00-00 00:00:00' can not be represented
> as java.sql.Timestamp
>
> Unfortunately, I can't edit the values in the table to make them valid.
> There doesn't seem to be a way to specify row level exception handling in
> the DataFrame API. Is there a way to handle this that would scale for
> hundreds of tables?
>
> Any help is appreciated.
>
> Anthony