I find that if the input is a Set then Spark doesn't try to find an encoder
for the Set but at the same time if the output of a method is a Set it does
try to find an encoder and if not found errors out. My understanding is that
even the input set has to be transferred to the executors?

The first method testMethodSet takes an Array[Set[String]] and returns a
Set[String]
def testMethodSet(info: Row, arrSetString: Array[Set[String]]): Set[String]
= {
  val assignments = info.getAs[Seq[String]](0).toSet
  for (setString <- arrSetString) {
    if (setString.subsetOf(assignments)) {
      return setString
    }
  }
  Set[String]()
}

Another method takes the same arguments as the first one but returns a
Seq[String]
 def testMethodSeq(info: Row,  arrSetString: Array[Set[String]]):
Seq[String] = {
  val assignments = info.getAs[Seq[String]](0).toSet

  for (setString <- arrSetString) {
    if (setString.subsetOf(assignments)) {
      return setString.toSeq
    }
  }

  Seq[String]()
}

The testMethodSet method throws an error
testRows.map(s => testMethodSet(s,"test",Array((Set("aaaa","bbbb")))))
<console>:20: error: Unable to find encoder for type stored in a Dataset. 
Primitive types (Int, String, etc) and Product types (case classes) are
supported by importing spark.implicits._  Support for serializing other
types will be added in future releases.
       testRows.map(s =>
testMethodSet(s,"test",Array((Set("aaaa","bbbb")))))

The testMethodSeq method works fine

testRows.map(s => testMethodSeq(s, Array((Set("aaaa","bbbb")))))
res12: org.apache.spark.sql.Dataset[Seq[String]] = [value: array<string>]

   



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Reply via email to