Re: Well balanced Python code with Pandas compared to PySpark

2021-07-29 Thread Mich Talebzadeh
Yes indeed, very good points by Artemis User. Just to add, if I may: why choose Spark? Generally, a parallel architecture comes into play when the data is too large to be handled on a single machine; that is when using Spark becomes meaningful. In cases where (the

Re: Well balanced Python code with Pandas compared to PySpark

2021-07-29 Thread Artemis User
PySpark still uses the Spark DataFrame underneath (it wraps Java code). Use PySpark when you have to deal with big data ETL and analytics, so you can leverage Spark's distributed architecture. If your job is simple, the dataset is relatively small, and it doesn't require distributed processing, use

Well balanced Python code with Pandas compared to PySpark

2021-07-29 Thread ashok34...@yahoo.com.INVALID
Hello team, Someone asked me about well-developed Python code using a Pandas DataFrame and how it compares to PySpark. Under what situations would one choose PySpark instead of Python and Pandas? Appreciate it. AK
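
The rule of thumb given in the replies above (Pandas when the data fits on one machine, PySpark when it does not) can be written down as a small sketch. This is only an illustration, not something posted in the thread: the function name choose_engine, the headroom parameter, and its default of 5 are all assumptions, chosen because in-memory DataFrame operations often need several times the raw data size as working memory.

```python
def choose_engine(dataset_bytes: int, machine_memory_bytes: int,
                  headroom: float = 5.0) -> str:
    """Suggest "pandas" or "pyspark" from a rough size comparison.

    Hypothetical helper: `headroom` is an assumed safety factor for the
    working memory that in-memory operations need beyond the raw data.
    """
    if dataset_bytes < 0 or machine_memory_bytes <= 0 or headroom <= 0:
        raise ValueError("sizes must be non-negative and headroom positive")

    # The data (plus working copies) fits on a single machine:
    # a single-node tool such as Pandas is enough.
    if dataset_bytes * headroom <= machine_memory_bytes:
        return "pandas"

    # Otherwise the data is too large for one machine, which is the case
    # where Spark's distributed architecture becomes meaningful.
    return "pyspark"


GIB = 1024 ** 3
small_job = choose_engine(dataset_bytes=1 * GIB, machine_memory_bytes=16 * GIB)
large_job = choose_engine(dataset_bytes=100 * GIB, machine_memory_bytes=16 * GIB)
```

Size is only one input here. The replies also mention job complexity and whether distributed processing is required at all, which this sketch does not model.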