Re: Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Punyashloka Biswal
Is there a foolproof way to access methods exclusively (instead of picking between columns and methods at runtime)? Here are two ideas, neither of which seems particularly Pythonic:
- pyspark.sql.methods(df).name()
- df.__methods__.name()
Punya
On Fri, May 8, 2015 at 10:06 AM Nicholas Chamm
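To make the first idea above concrete, here is a minimal sketch of what a methods-only accessor could look like. Everything in it is hypothetical: `Frame` is a toy stand-in for a DataFrame, and `methods` / `MethodsView` are illustrative names, not an existing PySpark API.

```python
class Frame:
    """Toy stand-in for a DataFrame (hypothetical, not PySpark code)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails: fall back to columns.
        try:
            return self.__dict__["_columns"][name]
        except KeyError:
            raise AttributeError(name) from None

    def name(self):
        return "frame"


class MethodsView:
    """Hypothetical accessor that exposes methods only, never columns."""

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        # Look the name up on the class so the column fallback is never consulted.
        attr = getattr(type(self._obj), name)
        if not callable(attr):
            raise AttributeError(name)
        # Bind the function to the wrapped instance.
        return attr.__get__(self._obj, type(self._obj))


def methods(obj):
    return MethodsView(obj)


df = Frame({"abcd": [1, 2, 3]})
result = methods(df).name()  # resolves to the method, regardless of column names
```

With this sketch, `methods(df).abcd` raises AttributeError because `abcd` is a column, not a method.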

Re: Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Nicholas Chammas
And a link to SPARK-7035 (which Xiangrui mentioned in his initial email) for the lazy.
On Fri, May 8, 2015 at 3:41 AM Xiangrui Meng wrote:
> On Fri, May 8, 2015 at 12:18 AM, Shivaram Venkataraman wrote:
> > I don't know much about Python style

Re: Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Xiangrui Meng
On Fri, May 8, 2015 at 12:18 AM, Shivaram Venkataraman wrote:
> I don't know much about Python style, but I think the point Wes made about usability on the JIRA is pretty powerful. IMHO the number of methods on a Spark DataFrame might not be much more compared to Pandas. Given that it looks l

Re: Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Shivaram Venkataraman
I don't know much about Python style, but I think the point Wes made about usability on the JIRA is pretty powerful. IMHO the number of methods on a Spark DataFrame might not be much more compared to Pandas. Given that it looks like users are okay with the possibility of collisions in Pandas, I think

Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Xiangrui Meng
Hi all, In PySpark, a DataFrame column can be referenced using df["abcd"] (__getitem__) and df.abcd (__getattr__). There is a discussion on SPARK-7035 on compatibility issues with the __getattr__ approach, and I want to collect more inputs on this. Basically, if in the future we introduce a new m
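The two access styles and the kind of name collision under discussion can be illustrated with a toy class. This is only a sketch under stated assumptions: `ToyFrame` is hypothetical and does not reproduce PySpark's implementation; it relies solely on the standard Python rule that `__getattr__` is consulted only after normal attribute lookup fails.

```python
class ToyFrame:
    """Toy stand-in for a DataFrame (hypothetical, not PySpark code)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        # df["abcd"]: always a column lookup.
        return self._columns[name]

    def __getattr__(self, name):
        # df.abcd: called only when normal attribute lookup fails,
        # so an existing method with the same name always wins.
        try:
            return self.__dict__["_columns"][name]
        except KeyError:
            raise AttributeError(name) from None

    def count(self):
        # Number of rows, taken from the first column.
        return len(next(iter(self._columns.values()), []))


df = ToyFrame({"abcd": [1, 2, 3], "count": [10, 20, 30]})

col_by_item = df["count"]  # the column named "count"
col_by_attr = df.abcd      # the column named "abcd"
shadowed = df.count        # the bound method, not the column
```

In this sketch, a column whose name matches a method is reachable only through `df["count"]`; attribute access silently returns the method instead.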