[ https://issues.apache.org/jira/browse/SPARK-24202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006214#comment-17006214 ]

Sam hendley commented on SPARK-24202:
-------------------------------------

I agree that this would be a very valuable change; was there a reason this was 
closed without comment?

> Separate SQLContext dependency from SparkSession.implicits
> ----------------------------------------------------------
>
>                 Key: SPARK-24202
>                 URL: https://issues.apache.org/jira/browse/SPARK-24202
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Gerard Maas
>            Priority: Major
>              Labels: bulk-closed
>
> The current implementation of the implicits in SparkSession passes the 
> currently active SQLContext to the SQLImplicits class. This implies that any 
> usage of these (extremely helpful) implicits requires the prior creation of a 
> SparkSession instance.
> Usage is typically done as follows:
>  
> {code:java}
> val sparkSession = SparkSession.builder()
>   ...
>   .getOrCreate()
> import sparkSession.implicits._
> {code}
>  
> This is fine in user code, but it burdens the creation of library code that 
> uses Spark, where static imports for _Encoder_ support are required.
> A simple example would be:
>  
> {code:java}
> abstract class SparkTransformation[In: Encoder, Out: Encoder] {
>   def transform(ds: Dataset[In]): Dataset[Out]
> }
> {code}
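> For illustration, this is the kind of boilerplate the current design forces 
> on library authors (the _CsvLoader_ and _Person_ names below are hypothetical, 
> purely for the sketch): every class must carry a SparkSession instance just to 
> bring the encoder implicits into scope.
> {code:java}
> import org.apache.spark.sql.{Dataset, SparkSession}
>
> case class Person(name: String, age: Int)
>
> // Hypothetical sketch: the SparkSession is needed only for its implicits.
> class CsvLoader(spark: SparkSession) {
>   import spark.implicits._ // instance-bound; cannot be a static import
>
>   def load(path: String): Dataset[Person] =
>     spark.read.option("header", "true").csv(path).as[Person]
> }
> {code}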
>  
> Attempting to compile such code results in the following error:
> {code:java}
> Unable to find encoder for type stored in a Dataset.  Primitive types (Int, 
> String, etc) and Product types (case classes) are supported by importing 
> spark.implicits._  Support for serializing other types will be added in 
> future releases.{code}
> The usage of the _SQLContext_ instance in _SQLImplicits_ is limited to two 
> utilities that transform an _RDD_ or a local collection into a _Dataset_.
> These are only two of the 46 implicit conversions offered by this class.
> The request is to separate the two implicit methods that depend on the 
> SQLContext instance creation into a separate class:
> {code:java}
> // SQLImplicits, lines 214-229
> /**
>  * Creates a [[Dataset]] from an RDD.
>  *
>  * @since 1.6.0
>  */
> implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): DatasetHolder[T] = {
>   DatasetHolder(_sqlContext.createDataset(rdd))
> }
>
> /**
>  * Creates a [[Dataset]] from a local Seq.
>  * @since 1.6.0
>  */
> implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = {
>   DatasetHolder(_sqlContext.createDataset(s))
> }{code}
> By moving these two _sqlContext_-dependent methods into a separate class, we 
> could provide static imports for all the other functionality and only require 
> the instance-bound implicits for the RDD and local-collection support (which 
> is an uncommon use case these days).
> As this potentially breaks the current interface, it might be a candidate for 
> Spark 3.0, although there's nothing stopping us from creating a separate 
> hierarchy for the static encoders already.
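> A possible shape for the separation (the names _StaticSQLImplicits_ and 
> _SessionBoundImplicits_ are hypothetical, not the actual Spark API):
> {code:java}
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.{DatasetHolder, Encoder, SQLContext}
>
> // Session-free conversions become statically importable.
> trait StaticSQLImplicits {
>   // ... the 44 implicits that do not touch the SQLContext ...
> }
> object staticImplicits extends StaticSQLImplicits
>
> // Only the two RDD/Seq conversions stay bound to a SQLContext instance.
> abstract class SessionBoundImplicits {
>   protected def _sqlContext: SQLContext
>
>   implicit def rddToDatasetHolder[T: Encoder](rdd: RDD[T]): DatasetHolder[T] =
>     DatasetHolder(_sqlContext.createDataset(rdd))
>
>   implicit def localSeqToDatasetHolder[T: Encoder](s: Seq[T]): DatasetHolder[T] =
>     DatasetHolder(_sqlContext.createDataset(s))
> }
> {code}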



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
