Mark Andreev created SPARK-49044:
------------------------------------

             Summary: Improve error message in ValidateExternalType
                 Key: SPARK-49044
                 URL: https://issues.apache.org/jira/browse/SPARK-49044
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Mark Andreev


When the values in a row do not match the declared schema, the error message 
"\{actual} is not a valid external type for schema of \{expected}" does not 
help identify which column caused the problem. I suggest adding information 
about the source column to the message.
h2. How to reproduce
{code:java}
class ErrorMsgSuite extends AnyFunSuite with SharedSparkContext {
  test("shouldThrowSchemaError") {
    val seq: Seq[Row] = Seq(
      Row(
        toBytes("0"),
        toBytes(""),
        1L,
      ),
      Row(
        toBytes("0"),
        toBytes(""),
        1L,
      ),
    )

    val schema: StructType = new StructType()
      .add("f1", BinaryType)
      .add("f3", StringType)
      .add("f2", LongType)

    val df =
      sqlContext.createDataFrame(sqlContext.sparkContext.parallelize(seq), schema)

    val exception = intercept[RuntimeException] {
      df.show()
    }

    assert(
      exception.getCause.getMessage
        .contains("[B is not a valid external type for schema of string")
    )
    assertResult(
      "[B is not a valid external type for schema of string"
    )(exception.getCause.getMessage)
  }

  def toBytes(x: String): Array[Byte] = x.toCharArray.map(_.toByte)
}
{code}
After the fix, the error message could include extra information about the source column:
{code:java}
[B is not a valid external type for schema of string at getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, f3)
{code}
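To illustrate the idea without depending on Spark, here is a minimal sketch of a validator that reports the offending field alongside the type mismatch. It is not Spark's implementation: every name in it (FieldType, Field, ExternalTypeCheck, validate) is hypothetical, the supported types are limited to the three used in the reproduction, and the message suffix is only one possible format.

```scala
// Hypothetical, Spark-free sketch. None of these names exist in Spark.
sealed trait FieldType { def name: String }
case object BinaryField extends FieldType { val name = "binary" }
case object StringField extends FieldType { val name = "string" }
case object LongField extends FieldType { val name = "bigint" }

final case class Field(name: String, dataType: FieldType)

object ExternalTypeCheck {
  // True when the runtime value is an acceptable external
  // representation of the declared field type.
  private def matches(value: Any, dataType: FieldType): Boolean =
    (value, dataType) match {
      case (_: Array[Byte], BinaryField) => true
      case (_: String, StringField)      => true
      case (_: Long, LongField)          => true
      case _                             => false
    }

  // Returns one message per mismatching field. Each message names the
  // field and its ordinal, which is the extra context the issue asks for.
  def validate(row: Seq[Any], schema: Seq[Field]): Seq[String] =
    row.zip(schema).zipWithIndex.collect {
      case ((value, field), ordinal) if !matches(value, field.dataType) =>
        val actual = Option(value).map(_.getClass.getName).getOrElse("null")
        s"$actual is not a valid external type for schema of " +
          s"${field.dataType.name} at field '${field.name}' (ordinal $ordinal)"
    }
}
```

With the schema from the reproduction (f1 binary, f3 string, f2 bigint) and a row whose second value is a byte array, validate returns a single message that points at f3, so the reader no longer has to guess which column is wrong.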
Code: 
[https://github.com/mrk-andreev/example-spark-schema/blob/main/spark_4.0.0/src/test/scala/ErrorMsgSuite.scala]
 


