Re: Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

2020-03-20 Thread Ben Roling
Ah, yes, I clearly didn't read closely enough! It was right there in front of me... Thanks!

On Wed, Mar 18, 2020, 12:36 PM Maciej Szymkiewicz wrote:

> Hi Ben,
>
> Please note that `_sc` is not a SQLContext. It is a SparkContext, which
> is used primarily for internal calls.
>
> SQLContext is

Re: Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

2020-03-18 Thread Maciej Szymkiewicz
Hi Ben,

Please note that `_sc` is not a SQLContext. It is a SparkContext, which is used primarily for internal calls.

SQLContext is exposed through `sql_ctx` (https://github.com/apache/spark/blob/8bfaa62f2fcc942dd99a63b20366167277bce2a1/python/pyspark/sql/dataframe.py#L80)

On 3/17/20 5:53 PM,
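
For readers following the thread, here is a minimal sketch of the distinction Maciej describes. It assumes PySpark as of the linked commit, where a DataFrame carries `sql_ctx` (a SQLContext) and `_sc` (a SparkContext). The helper names `describe_contexts` and `demo` are hypothetical and not part of PySpark, and later releases may expose these attributes differently.

```python
def describe_contexts(df):
    """Report the type names of the two context attributes on a DataFrame.

    Assumes `df` has `sql_ctx` and `_sc` attributes, as in the PySpark
    source linked in the message above.
    """
    return {
        "sql_ctx": type(df.sql_ctx).__name__,  # SQLContext, per the reply above
        "_sc": type(df._sc).__name__,          # SparkContext, used mostly internally
    }


def demo():
    """Run the helper against a real DataFrame (requires PySpark and a JVM)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("ctx-demo")  # hypothetical app name
        .getOrCreate()
    )
    try:
        df = spark.range(3)
        print(describe_contexts(df))
    finally:
        spark.stop()
```

Calling `demo()` in an environment with PySpark installed prints the type names of both attributes, which should make it clear that `_sc` is not the counterpart of Scala's `sqlContext`.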

Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

2020-03-17 Thread Ben Roling
I tried this on the users mailing list but didn't get traction. It's probably more appropriate here anyway. I've noticed that DataSet.sqlContext is public in Scala but the equivalent (DataFrame._sc) in PySpark is named as if it should be treated as private. Is this intentional? If so, what's