[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-24 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19027 Sure - I think there are a number of different situations reported in the JIRA that could be separated into different fixes. Let me know what I can help with! --- If your project is

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19027 Merged to master.

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19027 Will merge this one BTW. Sounds like we are fine.

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19027 That's fine, @ueshin and @felixcheung. Adding a few tests with `numpy` types might be an extra bit and (possibly) unrelated; on the other hand, it's easy to add a test and it might be a (possibly) common case users

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-24 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19027 @felixcheung I'm sorry if I'm missing something, but it sounds like it's a different problem from this PR?

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-24 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19027 It's not specific to it, but it is fairly common when people call numpy in a UDF and return its scalar type as-is. These scalars "look" like Python native types (numpy.float_ vs float).
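
To illustrate the point about NumPy scalars resembling native Python types, here is a minimal sketch of normalizing a value before returning it from a UDF. The helper name `to_native` is hypothetical and not part of PySpark or of this PR; it relies only on the `.item()` method that NumPy scalar types provide, and it passes any other value through unchanged.

```python
def to_native(value):
    """Return a plain Python value for NumPy scalars; pass others through.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    # NumPy scalar types are defined in the "numpy" module and expose
    # .item(), which converts the scalar to a native Python object.
    item = getattr(value, "item", None)
    if callable(item) and type(value).__module__ == "numpy":
        return item()
    return value
```

With NumPy installed, `to_native(numpy.float64(1.5))` would be expected to give a plain `float`; values that are already native Python objects are returned as they are.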

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19027 LGTM. Btw, I'm just curious why we need tests with `numpy` here.

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19027 I will probably look into the problem in the near future, including hard dependencies, etc. I took a quick look and I think I need more time, but yes, it looks like an apparently valid point.

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19027 I'm ok without the test since this is unlikely to break in the future. We do have tests that (optionally) depend on numpy (and Arrow) - seems like we should be able to take on dependencies

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19027 Merged build finished. Test PASSed.

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19027 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81056/ Test PASSed. ---

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19027 **[Test build #81056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81056/testReport)** for PR 19027 at commit

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19027 **[Test build #81056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81056/testReport)** for PR 19027 at commit

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19027 Oops, looks like I need to check if numpy is available. Let me rather take this one out here, as I am trying to whitelist `basestring`, if you don't mind. I tested it with numpy locally for your
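
A test that needs numpy can be guarded so it is skipped when numpy is not installed. The following is a minimal sketch using only the standard `unittest` module; the class, method, and helper names are hypothetical and are not taken from the PR, and the assertion is a placeholder rather than the check the PR actually made.

```python
import unittest

try:
    import numpy as np
    have_numpy = True
except ImportError:
    # NumPy is an optional dependency here, so record its absence
    # instead of failing at import time.
    np = None
    have_numpy = False


class NumpyScalarTest(unittest.TestCase):
    # Hypothetical test case; skipped entirely when NumPy is missing.
    @unittest.skipIf(not have_numpy, "NumPy is not installed")
    def test_numpy_float_is_not_a_string(self):
        value = np.float64(1.0)
        # A NumPy float is not a string, so a string-only whitelist
        # would not accept it.
        self.assertNotIsInstance(value, str)


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NumpyScalarTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` reports the test as skipped rather than failed on machines without numpy, which avoids the kind of CI failure that a hard import would cause.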

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19027 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81053/ Test FAILed. ---

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19027 **[Test build #81053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81053/testReport)** for PR 19027 at commit

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19027 Merged build finished. Test FAILed.

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19027 **[Test build #81053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81053/testReport)** for PR 19027 at commit

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19027 Thanks @felixcheung and @holdenk. I just added a simple test with numpy.float.

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/19027 I like this approach @HyukjinKwon :D!

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19027 Cool, this looks to me like a very reasonable fix. Could we perhaps add a test for numpy.bool_ or numpy.float_ (that it should fail)?

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19027 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81030/ Test PASSed. ---

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19027 Merged build finished. Test PASSed.

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19027 **[Test build #81030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81030/testReport)** for PR 19027 at commit

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19027 cc @zero323, @rdblue, @nchammas, @holdenk, @ueshin and @felixcheung. Could you take a look please? I think it is a small fix but the advantage is quite large.

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19027 **[Test build #81030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81030/testReport)** for PR 19027 at commit