PySpark still uses Spark DataFrames underneath (it wraps the JVM code). Use PySpark when you have to deal with big-data ETL and analytics, so you can leverage Spark's distributed architecture. If your job is simple, the dataset is relatively small, and it doesn't require distributed processing, use Pandas.
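
For example, here is a minimal sketch of the same group-by aggregation in both (the file name sales.csv and the columns region/amount are made-up for illustration):

# Pandas: loads the whole file into local memory -- fine for small data.
import pandas as pd

pdf = pd.read_csv("sales.csv")
pandas_result = pdf.groupby("region")["amount"].sum()

# PySpark: the same logic, but the work is distributed across the cluster,
# and nothing is computed until an action (e.g. show) is called.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pandas-vs-pyspark").getOrCreate()
sdf = spark.read.csv("sales.csv", header=True, inferSchema=True)
sdf.groupBy("region").agg(F.sum("amount").alias("total")).show()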

-- ND

On 7/29/21 9:02 AM, ashok34...@yahoo.com.INVALID wrote:
Hello team

Someone asked me about well-developed Python code using Pandas dataframes, and how that compares to PySpark.

Under what situations would one choose PySpark instead of Python and Pandas?

Appreciate it.


AK

