[jira] [Comment Edited] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-08-19 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585286#comment-16585286 ] Leif Walsh edited comment on SPARK-21187 at 8/19/18 10:44 PM: [~bryanc] is

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-08-19 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585286#comment-16585286 ] Leif Walsh commented on SPARK-21187: [~bryanc] is there anything I can help elaborate on, or do you

[jira] [Commented] (SPARK-24258) SPIP: Improve PySpark support for ML Matrix and Vector types

2018-06-12 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510406#comment-16510406 ] Leif Walsh commented on SPARK-24258: I think for PySpark users, we could just make it easy to use

[jira] [Updated] (SPARK-24258) SPIP: Improve PySpark support for ML Matrix and Vector types

2018-05-12 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Walsh updated SPARK-24258: Description: h1. Background and Motivation: In Spark ML ({{pyspark.ml.linalg}}), there are four

[jira] [Created] (SPARK-24258) SPIP: Improve PySpark support for ML Matrix and Vector types

2018-05-12 Thread Leif Walsh (JIRA)
Leif Walsh created SPARK-24258: Summary: SPIP: Improve PySpark support for ML Matrix and Vector types Key: SPARK-24258 URL: https://issues.apache.org/jira/browse/SPARK-24258 Project: Spark

[jira] [Commented] (SPARK-22340) pyspark setJobGroup doesn't match java threads

2017-10-25 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219825#comment-16219825 ] Leif Walsh commented on SPARK-22340: By monkey-patching {{SCCallSiteSync}}, I'm able to inject a call

[jira] [Commented] (SPARK-22340) pyspark setJobGroup doesn't match java threads

2017-10-24 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217835#comment-16217835 ] Leif Walsh commented on SPARK-22340: Ok, this is fairly straightforward. The problem is that from

[jira] [Commented] (SPARK-22340) pyspark setJobGroup doesn't match java threads

2017-10-24 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217535#comment-16217535 ] Leif Walsh commented on SPARK-22340: This is less spooky than I initially thought, I will explain

[jira] [Updated] (SPARK-22340) pyspark setJobGroup doesn't match java threads

2017-10-24 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Walsh updated SPARK-22340: Description: With pyspark, {{sc.setJobGroup}}'s documentation says {quote} Assigns a group ID to

[jira] [Created] (SPARK-22340) pyspark setJobGroup doesn't match java threads

2017-10-23 Thread Leif Walsh (JIRA)
Leif Walsh created SPARK-22340: Summary: pyspark setJobGroup doesn't match java threads Key: SPARK-22340 URL: https://issues.apache.org/jira/browse/SPARK-22340 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-05 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153986#comment-16153986 ] Leif Walsh commented on SPARK-21190: I think the size parameter is confusing: if a

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-01 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151377#comment-16151377 ] Leif Walsh commented on SPARK-21190: You can also make a Series with no content and an index, but

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-01 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151376#comment-16151376 ] Leif Walsh commented on SPARK-21190: Yep, that's totally a thing: {noformat}In [1]: import pandas as

[jira] [Comment Edited] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-01 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151261#comment-16151261 ] Leif Walsh edited comment on SPARK-21190 at 9/1/17 10:55 PM: I'm not 100%

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-01 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151261#comment-16151261 ] Leif Walsh commented on SPARK-21190: I'm not 100% sure this is legal pandas but I think it might be.

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-07-24 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098524#comment-16098524 ] Leif Walsh commented on SPARK-21187: Also, if you're unfamiliar, {{object}} columns are rather slow

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-07-24 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098522#comment-16098522 ] Leif Walsh commented on SPARK-21187: [~rxin] [~bryanc], pandas does support array and map columns, it

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2017-07-11 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083345#comment-16083345 ] Leif Walsh commented on SPARK-13534: See SPARK-21190 for a case we're considering for using arrow to

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-07-05 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074857#comment-16074857 ] Leif Walsh commented on SPARK-21190: If the user specifies an int return type but produces floats in

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-07-03 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072429#comment-16072429 ] Leif Walsh commented on SPARK-21190: I believe we could also compute window indexes while we stream

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-07-03 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072424#comment-16072424 ] Leif Walsh commented on SPARK-21190: I figure we could address that by using shared memory, if we

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-06-30 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070809#comment-16070809 ] Leif Walsh commented on SPARK-21190: I think we can get away with doing windowing (deciding which

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-06-29 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069269#comment-16069269 ] Leif Walsh commented on SPARK-21190: I agree with [~icexelloss] that we should aim to provide an API