Yes, I would just reuse the same function.
On Sun, Jul 8, 2018 at 5:01 AM Li Jin wrote:
Hi Linar,
This seems useful. But perhaps reusing the same function name is better?
http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.SparkSession.createDataFrame
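The "reuse the same function name" idea amounts to type-based dispatch at a single entry point. A minimal sketch of that shape (the name `create_dataframe` and the list standing in for an RDD are illustrative only, not Spark's actual implementation):

```python
import pandas as pd

# Toy dispatcher: one public name routes by input type, the way
# createDataFrame could also accept an RDD of pandas DataFrames.
# create_dataframe and the list-as-RDD stand-in are hypothetical.
def create_dataframe(data):
    if isinstance(data, pd.DataFrame):
        return "local pandas path"          # existing behavior
    if isinstance(data, list) and data and all(
        isinstance(d, pd.DataFrame) for d in data
    ):
        return "distributed pandas path"    # the proposed extension
    return "row/tuple path"                 # existing RDD-of-rows behavior

print(create_dataframe(pd.DataFrame({"a": [1]})))
# -> local pandas path
```

Callers keep the one familiar name, and the new input kind slips in as another accepted type rather than a second API.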
Currently createDataFrame takes an RDD of any kind of SQL data
representation (e.g. row, tuple, int, boolean,
We've created a snippet that creates a Spark DF from a RDD of many pandas
DFs in a distributed manner that does not require the driver to collect the
entire dataset.
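The snippet itself isn't shown in the thread, but the described approach can be sketched as follows, assuming each executor turns its local pandas DataFrame into plain rows so nothing is collected on the driver. The names `dfs_rdd` and `pandas_df_to_rows` are illustrative, not from the original code:

```python
import pandas as pd

def pandas_df_to_rows(pdf):
    """Yield each row of a pandas DataFrame as a plain tuple."""
    for row in pdf.itertuples(index=False, name=None):
        yield row

# With a live SparkSession, the distributed conversion would look like:
#
#   rows_rdd = dfs_rdd.flatMap(pandas_df_to_rows)
#   spark_df = spark.createDataFrame(rows_rdd, schema=schema)
#
# The flatMap runs on the executors, so the driver never materializes
# the full dataset, unlike a pandasDF -> collect -> createDataFrame path.

pdf = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(list(pandas_df_to_rows(pdf)))
# -> [(1, 'x'), (2, 'y')]
```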
Early tests show a 6x-10x performance improvement over the
pandasDF -> Rows -> sparkDF route.
I've seen that there are some open pull req