I find that if the input to a method is a Set, Spark doesn't try to find an encoder for it, but if the output of a method is a Set, it does try to find an encoder and errors out if none is found. My understanding is that the input Set also has to be transferred to the executors, so why doesn't it need an encoder?
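As I understand it, the difference is that values captured in a map closure (like the input Array[Set[String]]) are shipped to executors with plain Java serialization, while the element type of the resulting Dataset needs an Encoder so Spark can store it in its internal binary format. A minimal sketch, outside Spark, showing that a Set[String] survives the Java-serialization round trip a closure goes through (the roundTrip helper here is my own, not a Spark API):

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

    object ClosureSerDemo {
      // Round-trip a value through Java serialization, the mechanism Spark
      // uses to ship captured closure arguments to executors.
      def roundTrip[T](value: T): T = {
        val buf = new ByteArrayOutputStream()
        val out = new ObjectOutputStream(buf)
        out.writeObject(value)
        out.close()
        val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
        in.readObject().asInstanceOf[T]
      }

      def main(args: Array[String]): Unit = {
        val input: Set[String] = Set("aaaa", "bbbb")
        // Scala's immutable Set is java.io.Serializable, so the input
        // side never needs an Encoder at all.
        println(roundTrip(input) == input)
      }
    }
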
The first method, testMethodSet, takes an Array[Set[String]] and returns a Set[String]:

    def testMethodSet(info: Row, arrSetString: Array[Set[String]]): Set[String] = {
      val assignments = info.getAs[Seq[String]](0).toSet
      for (setString <- arrSetString) {
        if (setString.subsetOf(assignments)) {
          return setString
        }
      }
      Set[String]()
    }

Another method takes the same arguments as the first one but returns a Seq[String]:

    def testMethodSeq(info: Row, arrSetString: Array[Set[String]]): Seq[String] = {
      val assignments = info.getAs[Seq[String]](0).toSet
      for (setString <- arrSetString) {
        if (setString.subsetOf(assignments)) {
          return setString.toSeq
        }
      }
      Seq[String]()
    }

Calling testMethodSet throws an error:

    testRows.map(s => testMethodSet(s, Array(Set("aaaa","bbbb"))))
    <console>:20: error: Unable to find encoder for type stored in a Dataset.
    Primitive types (Int, String, etc) and Product types (case classes) are
    supported by importing spark.implicits._  Support for serializing other
    types will be added in future releases.

Calling testMethodSeq works fine:

    testRows.map(s => testMethodSeq(s, Array(Set("aaaa","bbbb"))))
    res12: org.apache.spark.sql.Dataset[Seq[String]] = [value: array<string>]

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
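For what it's worth, two workarounds seem to apply here: register a Kryo encoder for the result type (implicit val setEnc = org.apache.spark.sql.Encoders.kryo[Set[String]], which stores the Set as binary) or convert the Set to a Seq before it becomes the Dataset element type, as testMethodSeq already does. A sketch of the conversion approach with the Spark-specific pieces stubbed out; findSubset is a hypothetical standalone stand-in for testMethodSet that takes the column value directly instead of a Row:

    object SubsetDemo {
      // Same subsetOf logic as testMethodSet, over a plain Seq[String]
      // instead of info.getAs[Seq[String]](0).
      def findSubset(assignmentsCol: Seq[String],
                     arrSetString: Array[Set[String]]): Set[String] = {
        val assignments = assignmentsCol.toSet
        arrSetString.find(_.subsetOf(assignments)).getOrElse(Set.empty[String])
      }

      def main(args: Array[String]): Unit = {
        val sets = Array(Set("aaaa", "bbbb"))
        // Calling .toSeq at the call site is what keeps the Dataset
        // element type at Seq[String], which has a built-in encoder.
        println(findSubset(Seq("aaaa", "bbbb", "cccc"), sets).toSeq)
        println(findSubset(Seq("zzzz"), sets).toSeq)
      }
    }

In Spark this would look like testRows.map(s => testMethodSet(s, sets).toSeq), trading the Set semantics for an encodable element type.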