Hello guys, I'm using Spark 2.2.0, and from time to time my job fails, printing the following errors into the log:
scala.MatchError: profiles.total^@^@f2-a733-9304fda722ac^@^@^@^@profiles.10361.10005^@^@^@^@.total^@^@0075^@^@^@^@
scala.MatchError: pr^?files.10056.10040 (of class java.lang.String)
scala.MatchError: pr^?files.10056.10040 (of class java.lang.String)
scala.MatchError: pr^?files.10056.10040 (of class java.lang.String)
scala.MatchError: pr^?files.10056.10040 (of class java.lang.String)

The job itself looks like the following and contains a few shuffles and UDAFs:

val df = spark.read.avro(...).as[...]
  .groupBy(...)
  .agg(collect_list(...).as(...))
  .select(explode(...).as(...))
  .groupBy(...)
  .agg(sum(...).as(...))
  .groupBy(...)
  .agg(collectMetrics(...).as(...))

The errors occur in the collectMetrics UDAF, in the following snippet:

key match {
  case "profiles.total" => updateMetrics(...)
  case "profiles.biz" => updateMetrics(...)
  case ProfileAttrsRegex(...) => updateMetrics(...)
}

I'm absolutely fine with the scala.MatchError itself, since there is no "catch all" case in the pattern matching expression, but the strings containing corrupted characters seem very strange.

I've found the following JIRA issues, but it's hard to say whether they are related to my case:

- https://issues.apache.org/jira/browse/SPARK-22092
- https://issues.apache.org/jira/browse/SPARK-23512

So I'm wondering: has anybody ever seen this kind of behaviour, and what could be the problem?
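For reference, here is a minimal, self-contained sketch of what I mean (the names ProfileAttrsRegex and classify are hypothetical stand-ins, not my actual UDAF code): a key containing corrupted bytes falls through every case and raises scala.MatchError, and a temporary diagnostic catch-all can dump the raw code points of the unmatched key instead of crashing, which is how I inspected the garbage characters.

```scala
// Hypothetical stand-in for the real UDAF's match logic, for illustration only.
object MatchDebug {
  // Assumed shape of the key pattern, e.g. "profiles.10056.10040"
  val ProfileAttrsRegex = """profiles\.(\d+)\.(\d+)""".r

  def classify(key: String): String = key match {
    case "profiles.total"        => "total"
    case "profiles.biz"          => "biz"
    case ProfileAttrsRegex(a, b) => s"attr:$a.$b"
    // Diagnostic catch-all: log the key's code points in hex so that
    // corrupted bytes (e.g. NUL characters) become visible in the log.
    case other =>
      val codePoints = other.map(c => f"${c.toInt}%04x").mkString(" ")
      s"UNMATCHED [$codePoints]"
  }

  def main(args: Array[String]): Unit = {
    println(classify("profiles.10056.10040"))      // matches the regex case
    println(classify("pr\u0000files.10056.10040")) // corrupted key hits the catch-all
  }
}
```

With the catch-all in place the job no longer fails, but of course that only hides the symptom; the real question is where the corruption comes from.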