Re: PySpark syntax vs Pandas syntax

2019-03-26, Abdeali Kothari
Nice, will test it out +1

On Tue, Mar 26, 2019, 22:38 Reynold Xin wrote:
> We just made the repo public: https://github.com/databricks/spark-pandas
>
> On Tue, Mar 26, 2019 at 1:20 AM, Timothee Hunter wrote:
>> To add more details to what Reynold mentioned. As you said, there is
>> going

Re: PySpark syntax vs Pandas syntax

2019-03-26, Reynold Xin
We just made the repo public: https://github.com/databricks/spark-pandas

On Tue, Mar 26, 2019 at 1:20 AM, Timothee Hunter <timhun...@databricks.com> wrote:
>
> To add more details to what Reynold mentioned. As you said, there is going
> to be some slight differences in any case between

Re: PySpark syntax vs Pandas syntax

2019-03-26, Timothee Hunter
To add more details to what Reynold mentioned: as you said, there are going to be some slight differences between Pandas and Spark in any case, simply because Spark needs to know the return types of the functions. In your case, you would need to slightly refactor your apply method to
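
Timothee's point about return types can be illustrated with a toy model. This is plain Python, not Spark: `LazyApply` and `pandas_style_apply` are hypothetical names made up for illustration. The sketch assumes the essential difference is that a lazily planned column declares its output type before any row is processed, whereas a pandas-style eager apply only learns the type after running the function. In real PySpark, the analogous declaration is the `returnType` argument of `pyspark.sql.functions.udf` (which falls back to a string type if omitted).

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class LazyApply:
    """Hypothetical toy stand-in for a Spark UDF column: nothing runs until collect()."""

    data: List[Any]
    fn: Callable[[Any], Any]
    return_type: type  # declared up front, playing the role of a UDF's returnType

    @property
    def schema(self) -> type:
        # Known at "planning" time without calling fn on any row.
        return self.return_type

    def collect(self) -> List[Any]:
        # Execution: run fn on every row, then check results against the declaration.
        results = [self.fn(x) for x in self.data]
        for value in results:
            if not isinstance(value, self.return_type):
                raise TypeError(
                    f"declared {self.return_type.__name__}, got {type(value).__name__}"
                )
        return results


def pandas_style_apply(data: List[Any], fn: Callable[[Any], Any]) -> List[Any]:
    # Eager, pandas-like apply: the output type is only known after fn has run.
    return [fn(x) for x in data]


calls: List[int] = []


def square(x: int) -> int:
    calls.append(x)  # record each invocation so we can see when fn runs
    return x * x


col = LazyApply([1, 2, 3], square, int)
print(col.schema)     # <class 'int'>
print(calls)          # [] (square has not been called yet)
print(col.collect())  # [1, 4, 9]
print(calls)          # [1, 2, 3]
```

The type is available before any data is touched, which is what lets a planner build the output schema ahead of execution; the trade-off is that the function must be written against a declared type rather than whatever it happens to return.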

Re: PySpark syntax vs Pandas syntax

2019-03-26, Hyukjin Kwon
BTW, I am working on documentation related to this subject at https://issues.apache.org/jira/browse/SPARK-26022 to describe the difference.

On Tue, Mar 26, 2019 at 3:34 PM, Reynold Xin wrote:
> We have some early stuff there but not quite ready to talk about it in
> public yet (I hope soon

Re: PySpark syntax vs Pandas syntax

2019-03-26, Reynold Xin
We have some early stuff there, but it's not quite ready to talk about in public yet (I hope soon, though). Will shoot you a separate email on it.

On Mon, Mar 25, 2019 at 11:32 PM Abdeali Kothari wrote:
> Thanks for the reply Reynold - Has this shim project started ?
> I'd love to contribute to it

Re: PySpark syntax vs Pandas syntax

2019-03-26, Abdeali Kothari
Thanks for the reply, Reynold. Has this shim project started? I'd love to contribute to it, as I have already started writing a bunch of helper functions to do something similar for my current task and would prefer not to do it in isolation. I was considering making a git repo and pushing

Re: PySpark syntax vs Pandas syntax

2019-03-26, Reynold Xin
We have been thinking about some of these issues. Some of them are harder to do, e.g. Spark DataFrames are fundamentally immutable, and making the logical plan mutable is a significant deviation from the current paradigm that might confuse the hell out of some users. We are considering building a
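
The immutability point can be sketched the same way. This is a stdlib-only toy, not Spark's API: `ImmutableFrame` and its `with_column` method are hypothetical names. Each transformation returns a new frame and leaves the original untouched, mirroring how PySpark's `DataFrame.withColumn` returns a new DataFrame, whereas pandas-style `df["b"] = ...` assignment mutates the object in place.

```python
from types import MappingProxyType
from typing import Callable, Dict, List, Mapping, Sequence


class ImmutableFrame:
    """Hypothetical toy column store: transformations return a new frame."""

    def __init__(self, columns: Mapping[str, Sequence]) -> None:
        # Store columns as tuples behind a read-only mapping so they cannot be
        # changed after construction.
        self._columns = MappingProxyType({k: tuple(v) for k, v in columns.items()})

    @property
    def columns(self) -> List[str]:
        return list(self._columns)

    def __getitem__(self, name: str) -> tuple:
        return self._columns[name]

    def __setitem__(self, name: str, value: Sequence) -> None:
        # pandas allows df[name] = ...; Spark DataFrames have no item assignment.
        raise TypeError("frame is immutable; use with_column() instead")

    def with_column(
        self, name: str, fn: Callable[[Dict[str, object]], object]
    ) -> "ImmutableFrame":
        # Number of rows, taken from the first column (0 if there are none).
        n_rows = len(next(iter(self._columns.values()), ()))
        rows = [{k: col[i] for k, col in self._columns.items()} for i in range(n_rows)]
        new_columns = dict(self._columns)  # shallow copy; self is not modified
        new_columns[name] = [fn(row) for row in rows]
        return ImmutableFrame(new_columns)


df = ImmutableFrame({"a": [1, 2, 3]})
df2 = df.with_column("b", lambda row: row["a"] * 10)
print(df.columns)   # ['a']
print(df2.columns)  # ['a', 'b']
print(df2["b"])     # (10, 20, 30)
```

Because nothing is ever modified in place, a chain of `with_column` calls describes a sequence of new values rather than edits to one object; supporting pandas-style in-place assignment on top of that model is the "significant deviation" described above.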

PySpark syntax vs Pandas syntax

2019-03-26, Abdeali Kothari
Hi, I was doing some Spark-to-pandas (and vice versa) conversion because some of our pandas code doesn't work on huge data, and some of our Spark code runs very slowly on small data. It was nice to see that PySpark had some syntax similar to the common pandas operations that the Python community