Xiangrui Meng created SPARK-3572:
------------------------------------

             Summary: Support register UserType in SQL
                 Key: SPARK-3572
                 URL: https://issues.apache.org/jira/browse/SPARK-3572
             Project: Spark
          Issue Type: New Feature
          Components: SQL
            Reporter: Xiangrui Meng

If a user knows how to map a class to a struct type in Spark SQL, they should be able to register this mapping through sqlContext, so that SQL can figure out the schema automatically.

{code}
trait RowSerializer[T] {
  def dataType: StructType
  def serialize(obj: T): Row
  def deserialize(row: Row): T
}

// Proposed method on sqlContext
def registerUserType[T](clazz: Class[T], serializer: Class[_ <: RowSerializer[T]]): Unit
{code}

In sqlContext, we can maintain a class-to-serializer map and use it for conversion. The serializer class can be embedded into the metadata, so when `select` is called, we know we want to deserialize the result.

{code}
sqlContext.registerUserType(classOf[Vector], classOf[VectorRowSerializer])
val points: RDD[LabeledPoint] = ...
val features: RDD[Vector] = points.select('features).map { case Row(v: Vector) => v }
{code}
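
The class-to-serializer map described above could look roughly like the following. This is only a sketch under stated assumptions, not the proposed implementation: `Row`, `StructType`, and `StructField` are simplified stand-ins so that it runs without Spark on the classpath, and `UserTypeRegistry`, `Point`, `PointRowSerializer`, and `serializerFor` are hypothetical names introduced for illustration. It also assumes each serializer is a top-level class with a public no-argument constructor, since it is instantiated reflectively from its `Class`.

```scala
import scala.collection.mutable

// Simplified stand-ins for Spark SQL's Row and StructType (assumptions, not the real classes).
final case class Row(values: Any*)
final case class StructField(name: String, dataType: String)
final case class StructType(fields: Seq[StructField])

// The trait from the proposal.
trait RowSerializer[T] {
  def dataType: StructType
  def serialize(obj: T): Row
  def deserialize(row: Row): T
}

// Hypothetical user class and its serializer.
final case class Point(x: Double, y: Double)

class PointRowSerializer extends RowSerializer[Point] {
  val dataType: StructType =
    StructType(Seq(StructField("x", "double"), StructField("y", "double")))

  def serialize(p: Point): Row = Row(p.x, p.y)

  def deserialize(row: Row): Point = row match {
    case Row(x: Double, y: Double) => Point(x, y)
    case other => throw new IllegalArgumentException(s"Unexpected row: $other")
  }
}

// Hypothetical registry playing the role of the map kept inside sqlContext.
class UserTypeRegistry {
  private val serializers = mutable.Map.empty[Class[_], RowSerializer[_]]

  // Instantiates the serializer reflectively; needs a public no-arg constructor.
  def registerUserType[T](clazz: Class[T], serializer: Class[_ <: RowSerializer[T]]): Unit =
    serializers(clazz) = serializer.getDeclaredConstructor().newInstance()

  // The cast is safe as long as entries are only added through registerUserType.
  def serializerFor[T](clazz: Class[T]): Option[RowSerializer[T]] =
    serializers.get(clazz).map(_.asInstanceOf[RowSerializer[T]])
}

object Demo extends App {
  val registry = new UserTypeRegistry
  registry.registerUserType(classOf[Point], classOf[PointRowSerializer])

  val serializer = registry.serializerFor(classOf[Point]).get
  val row = serializer.serialize(Point(1.0, 2.0))

  // Round trip: object -> Row -> object.
  assert(serializer.deserialize(row) == Point(1.0, 2.0))
}
```

Keying the map by `Class[_]` keeps lookup simple, at the cost of an unchecked cast in `serializerFor`. Registering a serializer instance instead of a class would avoid the reflection, but the class form matches the signature in the proposal.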