[jira] [Created] (SPARK-25493) CRLF Line Separators don't work in multiline CSVs

2018-09-20 Thread Justin Uang (JIRA)
Justin Uang created SPARK-25493: Summary: CRLF Line Separators don't work in multiline CSVs Key: SPARK-25493 URL: https://issues.apache.org/jira/browse/SPARK-25493 Project: Spark Issue Type:

[jira] [Commented] (SPARK-9850) Adaptive execution in Spark

2016-04-12 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237056#comment-15237056 ] Justin Uang commented on SPARK-9850: I like this idea a lot. One thing we encounter in our use cases

[jira] [Commented] (SPARK-2183) Avoid loading/shuffling data twice in self-join query

2016-04-05 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226141#comment-15226141 ] Justin Uang commented on SPARK-2183: Yup, we're hitting this as well

[jira] [Commented] (SPARK-9141) DataFrame recomputed instead of using cached parent.

2016-01-31 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125498#comment-15125498 ] Justin Uang commented on SPARK-9141: Does your explain() string grow exponentially w.r.t. the

[jira] [Commented] (SPARK-9301) collect_set and collect_list aggregate functions

2016-01-28 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121275#comment-15121275 ] Justin Uang commented on SPARK-9301: Do we have a plan on how to implement these in native Spark SQL?

[jira] [Commented] (SPARK-9301) collect_set and collect_list aggregate functions

2016-01-28 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121476#comment-15121476 ] Justin Uang commented on SPARK-9301: Yea, my workaround has been json'ifying the struct into a string

[jira] [Comment Edited] (SPARK-10915) Add support for UDAFs in Python

2015-12-15 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059055#comment-15059055 ] Justin Uang edited comment on SPARK-10915 at 12/15/15 11:07 PM: An

[jira] [Commented] (SPARK-10915) Add support for UDAFs in Python

2015-12-15 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059055#comment-15059055 ] Justin Uang commented on SPARK-10915: An abstract base class would be fine, or something like

[jira] [Created] (SPARK-12157) Support numpy types as return values of Python UDFs

2015-12-05 Thread Justin Uang (JIRA)
Justin Uang created SPARK-12157: Summary: Support numpy types as return values of Python UDFs Key: SPARK-12157 URL: https://issues.apache.org/jira/browse/SPARK-12157 Project: Spark Issue

[jira] [Commented] (SPARK-12157) Support numpy types as return values of Python UDFs

2015-12-05 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043594#comment-15043594 ] Justin Uang commented on SPARK-12157: Good question, Scala types would be good enough for this

[jira] [Created] (SPARK-10915) Add support for UDAFs in Python

2015-10-02 Thread Justin Uang (JIRA)
Justin Uang created SPARK-10915: Summary: Add support for UDAFs in Python Key: SPARK-10915 URL: https://issues.apache.org/jira/browse/SPARK-10915 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-9313) Enable a "docker run" invocation in place of PYSPARK_PYTHON

2015-09-14 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744731#comment-14744731 ] Justin Uang commented on SPARK-9313: This would be hugely helpful. I'm working on a platform that

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-09-08 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735406#comment-14735406 ] Justin Uang commented on SPARK-8632: Davies, what do you mean by upstream? I didn't quite understand

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-09-08 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735749#comment-14735749 ] Justin Uang commented on SPARK-8632: I set the batch mode to be 100, which is the same as before

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-09-08 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736022#comment-14736022 ] Justin Uang commented on SPARK-8632: Just pushed, any comments would be much appreciated. I didn't

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-09-07 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733867#comment-14733867 ] Justin Uang commented on SPARK-8632: Yea, I think that's the best solution for UDFs, since the number

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-09-07 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734160#comment-14734160 ] Justin Uang commented on SPARK-8632: I have a solution working on my computer. I'm going to clean it

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-09-07 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734000#comment-14734000 ] Justin Uang commented on SPARK-8632: I have started working on this. I hope to get a draft out soon.

[jira] [Created] (SPARK-10447) Upgrade pyspark to use py4j 0.9

2015-09-04 Thread Justin Uang (JIRA)
Justin Uang created SPARK-10447: Summary: Upgrade pyspark to use py4j 0.9 Key: SPARK-10447 URL: https://issues.apache.org/jira/browse/SPARK-10447 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-10447) Upgrade pyspark to use py4j 0.9

2015-09-04 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731164#comment-14731164 ] Justin Uang commented on SPARK-10447: Agreed, I'm pretty sure that this will break some APIs and

[jira] [Commented] (SPARK-10447) Upgrade pyspark to use py4j 0.9

2015-09-04 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731592#comment-14731592 ] Justin Uang commented on SPARK-10447: Sure, I wouldn't mind doing the code review. Can you add me?

[jira] [Commented] (SPARK-10447) Upgrade pyspark to use py4j 0.9

2015-09-04 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731598#comment-14731598 ] Justin Uang commented on SPARK-10447: Sounds good

[jira] [Commented] (SPARK-9141) DataFrame recomputed instead of using cached parent.

2015-07-31 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649437#comment-14649437 ] Justin Uang commented on SPARK-9141: (Taken from spark dev email:

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-07-02 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612608#comment-14612608 ] Justin Uang commented on SPARK-8632: Haven't gotten around to it yet. I'll let you

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-06-25 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601288#comment-14601288 ] Justin Uang commented on SPARK-8632: [~davies], my current plan is to switch to a

[jira] [Created] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-06-25 Thread Justin Uang (JIRA)
Justin Uang created SPARK-8632: Summary: Poor Python UDF performance because of RDD caching Key: SPARK-8632 URL: https://issues.apache.org/jira/browse/SPARK-8632 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-595) Document local-cluster mode

2015-06-18 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591817#comment-14591817 ] Justin Uang commented on SPARK-595: Sure, I'll get to it after I finish some tasks for

[jira] [Commented] (SPARK-595) Document local-cluster mode

2015-06-17 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590486#comment-14590486 ] Justin Uang commented on SPARK-595: +1 We are using it for internal testing to ensure that

[jira] [Commented] (SPARK-7899) PySpark sql/tests breaks pylint validation

2015-05-30 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566089#comment-14566089 ] Justin Uang commented on SPARK-7899: Can we get this backported into Spark 1.4 or is

[jira] [Commented] (SPARK-7899) PySpark sql/tests breaks pylint validation

2015-05-27 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561709#comment-14561709 ] Justin Uang commented on SPARK-7899: Building upon Michael's comment, the reason it

[jira] [Commented] (SPARK-7899) PySpark sql/tests breaks pylint validation

2015-05-27 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561732#comment-14561732 ] Justin Uang commented on SPARK-7899: Building upon Michael's comment, the reason it

[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public

2015-05-21 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555151#comment-14555151 ] Justin Uang commented on SPARK-7768: Agreed. For example, we wanted to add a

[jira] [Commented] (SPARK-6999) infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String])

2015-04-23 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509209#comment-14509209 ] Justin Uang commented on SPARK-6999: We might be able to use

[jira] [Commented] (SPARK-6999) infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String])

2015-04-20 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14502994#comment-14502994 ] Justin Uang commented on SPARK-6999: Looking at the source, it looks like one way to

[jira] [Updated] (SPARK-6999) infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String])

2015-04-19 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Uang updated SPARK-6999: Description: It looks like {code} def createDataFrame(rowRDD: JavaRDD[Row], columns:

[jira] [Created] (SPARK-6999) infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String])

2015-04-19 Thread Justin Uang (JIRA)
Justin Uang created SPARK-6999: Summary: infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String]) Key: SPARK-6999 URL: https://issues.apache.org/jira/browse/SPARK-6999 Project: