Mark Andreev created SPARK-49044:
------------------------------------

             Summary: Improve error message in ValidateExternalType
                 Key: SPARK-49044
                 URL: https://issues.apache.org/jira/browse/SPARK-49044
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Mark Andreev

When a row contains a value whose type does not match the schema, the error message "\{actual} is not a valid external type for schema of \{expected}" does not help identify which column caused the problem. I suggest adding information about the source column.

h2. How to reproduce

{code:java}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{BinaryType, LongType, StringType, StructType}
import org.scalatest.funsuite.AnyFunSuite

// SharedSparkContext and sqlContext are assumed to come from the test setup
// in the linked repository.
class ErrorMsgSuite extends AnyFunSuite with SharedSparkContext {
  test("shouldThrowSchemaError") {
    // The second value of each row is a byte array, but column f3 is a StringType.
    val seq: Seq[Row] = Seq(
      Row(
        toBytes("0"),
        toBytes(""),
        1L,
      ),
      Row(
        toBytes("0"),
        toBytes(""),
        1L,
      ),
    )
    val schema: StructType = new StructType()
      .add("f1", BinaryType)
      .add("f3", StringType)
      .add("f2", LongType)
    val df = sqlContext.createDataFrame(sqlContext.sparkContext.parallelize(seq), schema)

    val exception = intercept[RuntimeException] {
      df.show()
    }

    // The current message names the types but not the offending column.
    assert(
      exception.getCause.getMessage
        .contains("[B is not a valid external type for schema of string")
    )
    assertResult(
      "[B is not a valid external type for schema of string"
    )(exception.getCause.getMessage)
  }

  def toBytes(x: String): Array[Byte] = x.toCharArray.map(_.toByte)
}
{code}

After the fix, the error message may contain extra information:

{code:java}
[B is not a valid external type for schema of string at getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, f3)
{code}

Code: [https://github.com/mrk-andreev/example-spark-schema/blob/main/spark_4.0.0/src/test/scala/ErrorMsgSuite.scala]
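
The idea of attaching the source column to the message can be illustrated without Spark. The following is only a minimal sketch under stated assumptions: {{FieldType}}, {{Field}}, and {{ExternalTypeCheck}} are hypothetical stand-ins, not Spark classes, and the message suffix is illustrative rather than the format proposed above.

```scala
// Hypothetical stand-ins for schema types; these are NOT Spark classes.
sealed trait FieldType { def name: String }
case object BinaryT extends FieldType { val name = "binary" }
case object StringT extends FieldType { val name = "string" }
case object LongT extends FieldType { val name = "bigint" }

final case class Field(name: String, dataType: FieldType)

object ExternalTypeCheck {
  // True when the runtime value is an acceptable external type for the field type.
  private def matches(value: Any, dataType: FieldType): Boolean =
    (value, dataType) match {
      case (_: Array[Byte], BinaryT) => true
      case (_: String, StringT)      => true
      case (_: Long, LongT)          => true
      case _                         => false
    }

  // Returns Left(message) for the first mismatching field, Right(()) otherwise.
  // The message includes the field index and name (illustrative format).
  def validate(row: Seq[Any], schema: Seq[Field]): Either[String, Unit] =
    row.zip(schema).zipWithIndex.collectFirst {
      case ((value, field), index) if !matches(value, field.dataType) =>
        val actual = Option(value).map(_.getClass.getName).getOrElse("null")
        s"$actual is not a valid external type for schema of " +
          s"${field.dataType.name} at field $index ('${field.name}')"
    }.toLeft(())
}

object ErrorMsgSketch {
  def main(args: Array[String]): Unit = {
    // Same shape as the reproduction: f3 is declared as a string but receives bytes.
    val schema = Seq(Field("f1", BinaryT), Field("f3", StringT), Field("f2", LongT))
    val row = Seq[Any](Array[Byte](48), Array[Byte](), 1L)
    println(ExternalTypeCheck.validate(row, schema))
  }
}
```

With the row above, the check stops at index 1 and reports field {{f3}}, which is the piece of information missing from the current message.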