[ https://issues.apache.org/jira/browse/SPARK-24202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006214#comment-17006214 ]
Sam Hendley commented on SPARK-24202:
-------------------------------------

I agree that this would be a very valuable change. Was there a reason this was closed without comment?

> Separate SQLContext dependency from SparkSession.implicits
> ----------------------------------------------------------
>
>                 Key: SPARK-24202
>                 URL: https://issues.apache.org/jira/browse/SPARK-24202
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Gerard Maas
>            Priority: Major
>              Labels: bulk-closed
>
> The current implementation of the implicits in SparkSession passes the current active SQLContext to the SQLImplicits class. This implies that all usage of these (extremely helpful) implicits requires the prior creation of a SparkSession instance.
> Usage is typically done as follows:
>
> {code:java}
> val sparkSession = SparkSession.builder()
>   ...
>   .getOrCreate()
> import sparkSession.implicits._
> {code}
>
> This is OK in user code, but it burdens the creation of library code that uses Spark, where static imports for _Encoder_ support are required.
> A simple example would be:
>
> {code:java}
> abstract class SparkTransformation[In: Encoder, Out: Encoder] {
>   def transform(ds: Dataset[In]): Dataset[Out]
> }
> {code}
>
> Attempting to compile such code would result in the following error:
>
> {code:java}
> Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
> {code}
>
> The usage of the _SQLContext_ instance in _SQLImplicits_ is limited to two utilities that transform an _RDD_ or a local collection into a _Dataset_. These are 2 of the 46 implicit conversions offered by this class.
> The request is to separate the two implicit methods that depend on the SQLContext instance into a separate class:
>
> {code:java}
> // SQLImplicits, lines 214-229
> /**
>  * Creates a [[Dataset]] from an RDD.
>  *
>  * @since 1.6.0
>  */
> implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): DatasetHolder[T] = {
>   DatasetHolder(_sqlContext.createDataset(rdd))
> }
>
> /**
>  * Creates a [[Dataset]] from a local Seq.
>  *
>  * @since 1.6.0
>  */
> implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = {
>   DatasetHolder(_sqlContext.createDataset(s))
> }
> {code}
>
> By separating the static methods from these two methods that depend on _sqlContext_, we could provide static imports for all the other functionality and only require the instance-bound implicits for RDD and local-collection support (which is an uncommon use case these days).
> As this potentially breaks the current interface, it might be a candidate for Spark 3.0, although there's nothing stopping us from creating a separate hierarchy for the static encoders already.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
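The split the issue asks for can be illustrated without Spark at all, since it is really a question of where Scala implicits live: implicits defined on an object (or a trait an object extends) can be imported statically, while implicits defined inside a class body require an instance first. Below is a minimal sketch of that layering; `Enc`, `DsHolder`, `StaticImplicits` and `Session` are hypothetical stand-ins for `Encoder`, `DatasetHolder`, the proposed static hierarchy, and `SparkSession` — none of these names exist in Spark.

```scala
// Hypothetical stand-ins for the Spark types discussed above.
trait Enc[T]                           // plays the role of Encoder[T]
case class DsHolder[T](value: Seq[T])  // plays the role of DatasetHolder[T]

// Session-independent implicits: nothing here touches a session or
// context, so they can live in a trait with an importable companion.
trait StaticImplicits {
  implicit val intEnc: Enc[Int] = new Enc[Int] {}
  implicit val stringEnc: Enc[String] = new Enc[String] {}
}
object StaticImplicits extends StaticImplicits

// Only the conversions that need the session stay instance-bound,
// mirroring rddToDatasetHolder / localSeqToDatasetHolder.
class Session { self =>
  def createDataset[T: Enc](s: Seq[T]): DsHolder[T] = DsHolder(s)
  object implicits extends StaticImplicits {
    implicit def localSeqToDsHolder[T: Enc](s: Seq[T]): DsHolder[T] =
      self.createDataset(s)
  }
}

// Library-style code now compiles against the static import alone,
// with no Session instance in scope.
import StaticImplicits._
def transformLabel[T: Enc](t: T): String = t.toString
val label = transformLabel(42)
```

Under this layering, `import sparkSession.implicits._` would keep working unchanged (the instance object extends the static trait), while library authors writing `[T: Encoder]` context bounds would need only the static import.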