[ https://issues.apache.org/jira/browse/SPARK-20387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navya Krishnappa reopened SPARK-20387:
--------------------------------------

Source file:

'Col1','Col2','Col3','Col4','Col5','Col6',
'1000','abc','10yui000','400','20.8','2003-03-04',
'1001','xyz','30000','4000','20.8','2003-03-04',
'1002','abc','40000','40,000','20.8','2003-03-04'
'1003','xyz','50000','40,0000','20.8','2003-03-04'
'1004','abc','60000','40,000','20.8','2003-03-04'

User_defined_Schema:

[{ "dataType": "integer", "type": "Measure", "name": "Col1" },
 { "dataType": "string", "type": "Dimension", "name": "Col2" },
 { "dataType": "float", "type": "Measure", "name": "Col3" },
 { "dataType": "string", "type": "Dimension", "name": "Col4" },
 { "dataType": "double", "type": "Measure", "name": "Col5" },
 { "dataType": "date", "type": "Dimension", "name": "Col6" },
 { "dataType": "string", "type": "Dimension", "name": "_c6" }]

Source code:

Dataset<Row> dataset = sparkSession.read().schema(User_defined_Schema)
    .option(PARSER_LIB, "commons")
    .option(DELIMITER, ",")
    .option(QUOTE, "\"")
    .option(MODE, Mode.PERMISSIVE)
    .csv(sourceFile);
dataset.collect();

Result: "10yui000" is parsed as 10.
Row: '1000','abc','10','400','20.8','2003-03-04',

Expected: in PERMISSIVE mode, the corrupt value "10yui000" should be replaced with null.

> Permissive mode is not replacing corrupt record with null
> ---------------------------------------------------------
>
> Key: SPARK-20387
> URL: https://issues.apache.org/jira/browse/SPARK-20387
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.1.0
> Reporter: Navya Krishnappa
>
> When reading the below-mentioned values with "mode" set to PERMISSIVE:
> Source file:
>
> String,int,f1,bool1
> abc,23111,23.07738,true
> abc,23111,23.07738,true
> abc,23111,true,true
>
> Source code:
>
> Dataset<Row> dataset = getSqlContext().read()
>     .option(PARSER_LIB, "commons")
>     .option(INFER_SCHEMA, "true")
>     .option(DELIMITER, ",")
>     .option(QUOTE, "\"")
>     .option(MODE, Mode.PERMISSIVE)
>     .csv(sourceFile);
> dataset.collect();
>
> Result: an error is thrown.
>
> Stack trace:
>
> ERROR Executor: Exception in task 0.0 in stage 15.0 (TID 15)
> java.lang.IllegalArgumentException: For input string: "23.07738"
>     at scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:290)
>     at scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:260)
>     at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:29)
>     at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:270)
>     at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:125)
>     at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:94)
>     at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:167)
>     at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:166)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
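Editor's note: both symptoms above can be reproduced outside Spark with a minimal Java sketch. The "10yui000" -> 10 result is consistent with a cast layer that falls back to java.text.NumberFormat, which stops parsing at the first character it cannot consume instead of failing; the permissiveBoolean helper below is hypothetical (not Spark's actual code) and only illustrates the expected PERMISSIVE behavior of turning an uncastable value into null rather than throwing.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class PermissiveCastSketch {

    // Lenient numeric parse: NumberFormat.parse consumes leading digits and
    // silently ignores the rest, so "10yui000" yields 10 with no error.
    static Integer lenientInt(String datum) {
        try {
            return NumberFormat.getInstance(Locale.US).parse(datum).intValue();
        } catch (ParseException e) {
            return null;
        }
    }

    // Hypothetical sketch of the expected PERMISSIVE behavior: a value that
    // cannot be cast to boolean becomes null instead of raising an exception
    // (unlike Scala's strict StringOps.toBoolean seen in the stack trace).
    static Boolean permissiveBoolean(String datum) {
        if ("true".equalsIgnoreCase(datum)) return Boolean.TRUE;
        if ("false".equalsIgnoreCase(datum)) return Boolean.FALSE;
        return null; // corrupt value replaced with null
    }

    public static void main(String[] args) {
        System.out.println(lenientInt("10yui000"));        // 10, not null
        System.out.println(permissiveBoolean("23.07738")); // null, no exception
    }
}
```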