Re: createDataFrame allows column names as second param in Python not in Scala

2015-05-03 Thread Reynold Xin
We can't drop the existing createDataFrame one, since it breaks API compatibility, and the existing one also automatically infers the column name for case classes (in that case users most likely won't be declaring names directly). If this is really a problem, we should just create a new function

Re: createDataFrame allows column names as second param in Python not in Scala

2015-05-03 Thread Olivier Girardot
I have the perfect counterexample, where some of the data scientists prototype in Python and the production material is done in Scala. But I get your point; as a matter of fact, I realised the toDF method took parameters a little while after posting this. However, toDF still needs you to go

createDataFrame allows column names as second param in Python not in Scala

2015-05-02 Thread Olivier Girardot
Hi everyone, SQLContext.createDataFrame behaves differently in Scala and Python:

    l = [('Alice', 1)]
    sqlContext.createDataFrame(l).collect()
    [Row(_1=u'Alice', _2=1)]
    sqlContext.createDataFrame(l, ['name', 'age']).collect()
    [Row(name=u'Alice', age=1)]

and in Scala:

    scala val data =
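For readers without a Spark shell at hand, the Python naming behaviour quoted in this message can be modelled in a few lines of plain Python. This is only a rough sketch of the rule as described here (positional names _1, _2, ... unless a list of names is passed as the second argument). The helper name_columns is hypothetical, it is not part of Spark, and it says nothing about how createDataFrame is actually implemented.

```python
def name_columns(records, names=None):
    """Hypothetical stand-in for the column-naming rule; not Spark code."""
    rows = []
    for record in records:
        if names is None:
            # No names supplied: fall back to positional names _1, _2, ...
            fields = ["_%d" % (i + 1) for i in range(len(record))]
        else:
            fields = list(names)
        if len(fields) != len(record):
            raise ValueError(
                "expected %d column names, got %d" % (len(record), len(fields))
            )
        # Pair each field name with the value at the same position.
        rows.append(dict(zip(fields, record)))
    return rows


l = [("Alice", 1)]
default_rows = name_columns(l)                 # positional names
named_rows = name_columns(l, ["name", "age"])  # names given as second argument
print(default_rows)
print(named_rows)
```

The second call mirrors the two-argument Python form that this thread asks about for Scala.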

Re: createDataFrame allows column names as second param in Python not in Scala

2015-05-02 Thread Reynold Xin
Part of the reason is that it is really easy to just call toDF in Scala, and we already have a lot of createDataFrame functions. (You might find some of the cross-language differences confusing, but I'd argue most real users just stick to one language, and developers or trainers are the only ones