[jira] [Commented] (SPARK-22947) SPIP: as-of join in Spark SQL

2022-02-22 Thread Li Jin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496142#comment-17496142 ] Li Jin commented on SPARK-22947: For those who are interested, I will no longer work on

[jira] [Commented] (SPARK-33057) Cannot use filter with window operations

2020-10-16 Thread Li Jin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215468#comment-17215468 ] Li Jin commented on SPARK-33057: I agree this is an improvement rather than a bug. Alth

[jira] [Updated] (SPARK-33057) Cannot use filter with window operations

2020-10-02 Thread Li Jin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-33057: --- Description: Currently, trying to use filter with window operations fails: {code:java} df = spark.ran

[jira] [Created] (SPARK-33057) Cannot use filter with window operations

2020-10-02 Thread Li Jin (Jira)
Li Jin created SPARK-33057: -- Summary: Cannot use filter with window operations Key: SPARK-33057 URL: https://issues.apache.org/jira/browse/SPARK-33057 Project: Spark Issue Type: Bug Compon

[jira] [Comment Edited] (SPARK-28482) Data incomplete when using pandas udf in Python 3

2019-07-30 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896132#comment-16896132 ] Li Jin edited comment on SPARK-28482 at 7/30/19 1:28 PM: - Hi [~j

[jira] [Commented] (SPARK-28482) Data incomplete when using pandas udf in Python 3

2019-07-30 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896132#comment-16896132 ] Li Jin commented on SPARK-28482: Hi [~jiangyu1211], thank you for the bug report.   >Fro

[jira] [Commented] (SPARK-28502) Error with struct conversion while using pandas_udf

2019-07-26 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894134#comment-16894134 ] Li Jin commented on SPARK-28502: Hmm, I think this has something to do with timezone, can you

[jira] [Updated] (SPARK-28422) GROUPED_AGG pandas_udf doesn't work with spark.sql() without group by clause

2019-07-17 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28422: --- Description:   {code:java} @pandas_udf('double', PandasUDFType.GROUPED_AGG) def max_udf(v): return v.max

[jira] [Updated] (SPARK-28422) GROUPED_AGG pandas_udf doesn't work with spark.sql() without group by clause

2019-07-17 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28422: --- Summary: GROUPED_AGG pandas_udf doesn't work with spark.sql() without group by clause (was: GROUPED_AGG pandas_u

[jira] [Created] (SPARK-28422) GROUPED_AGG pandas_udf doesn't work with spark.sql without group by clause

2019-07-17 Thread Li Jin (JIRA)
Li Jin created SPARK-28422: -- Summary: GROUPED_AGG pandas_udf doesn't work with spark.sql without group by clause Key: SPARK-28422 URL: https://issues.apache.org/jira/browse/SPARK-28422 Project: Spark Is

[jira] [Comment Edited] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-13 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863186#comment-16863186 ] Li Jin edited comment on SPARK-28006 at 6/13/19 3:36 PM: - Hi [~v

[jira] [Commented] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-13 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863186#comment-16863186 ] Li Jin commented on SPARK-28006: Hi [~viirya] good questions: >> Can we use pandas agg

[jira] [Commented] (SPARK-27463) Support Dataframe Cogroup via Pandas UDFs

2019-06-13 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863177#comment-16863177 ] Li Jin commented on SPARK-27463: Yeah I think the exact spelling of the API can go eithe

[jira] [Updated] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-12 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28006: --- Description: Currently, pandas_udf supports "grouped aggregate" type that can be used with unbounded and un

[jira] [Commented] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-12 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862505#comment-16862505 ] Li Jin commented on SPARK-28006: Thanks [~hyukjin.kwon] for the comments! I updated the

[jira] [Commented] (SPARK-27463) Support Dataframe Cogroup via Pandas UDFs

2019-06-12 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862287#comment-16862287 ] Li Jin commented on SPARK-27463: I think one way to design this API to mimic the existin

[jira] [Commented] (SPARK-27463) Support Dataframe Cogroup via Pandas UDFs

2019-06-12 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862225#comment-16862225 ] Li Jin commented on SPARK-27463: For cogroup, I don't think there is an analogous API in pa

[jira] [Commented] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-11 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861559#comment-16861559 ] Li Jin commented on SPARK-28006: cc [~hyukjin.kwon] [~LI,Xiao] [~ueshin] [~bryanc] I th

[jira] [Updated] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-11 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28006: --- Description: Currently, pandas_udf supports "grouped aggregate" type that can be used with unbounded and un

[jira] [Created] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-11 Thread Li Jin (JIRA)
Li Jin created SPARK-28006: -- Summary: User-defined grouped transform pandas_udf for window operations Key: SPARK-28006 URL: https://issues.apache.org/jira/browse/SPARK-28006 Project: Spark Issue Ty

[jira] [Updated] (SPARK-28003) spark.createDataFrame with Arrow doesn't work with pandas.NaT

2019-06-11 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28003: --- Affects Version/s: (was: 2.3.2) 2.3.3 > spark.createDataFrame with Arrow doesn't

[jira] [Created] (SPARK-28003) spark.createDataFrame with Arrow doesn't work with pandas.NaT

2019-06-11 Thread Li Jin (JIRA)
Li Jin created SPARK-28003: -- Summary: spark.createDataFrame with Arrow doesn't work with pandas.NaT Key: SPARK-28003 URL: https://issues.apache.org/jira/browse/SPARK-28003 Project: Spark Issue Typ

[jira] [Updated] (SPARK-28003) spark.createDataFrame with Arrow doesn't work with pandas.NaT

2019-06-11 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28003: --- Affects Version/s: (was: 2.4.0) 2.3.2 2.4.3 > spark.create

[jira] [Commented] (SPARK-27538) sparksql could not start in jdk11, exception org.datanucleus.exceptions.NucleusException: The java type java.lang.Long (jdbc-type='', sql-type="") cant be mapped for t

2019-05-29 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851185#comment-16851185 ] Li Jin commented on SPARK-27538: [~hyukjin.kwon] I saw you closed this. I wonder if this

[jira] [Commented] (SPARK-26410) Support per Pandas UDF configuration

2018-12-21 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726824#comment-16726824 ] Li Jin commented on SPARK-26410: One thing we want to think about is whether or not to mix

[jira] [Commented] (SPARK-26410) Support per Pandas UDF configuration

2018-12-21 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726820#comment-16726820 ] Li Jin commented on SPARK-26410: Thanks for the explanation. I think it makes sense to h

[jira] [Commented] (SPARK-26412) Allow Pandas UDF to take an iterator of pd.DataFrames for the entire partition

2018-12-20 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725957#comment-16725957 ] Li Jin commented on SPARK-26412: So this is similar to the mapPartitions API in Scala bu

[jira] [Commented] (SPARK-26410) Support per Pandas UDF configuration

2018-12-20 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725948#comment-16725948 ] Li Jin commented on SPARK-26410: I am curious why a user would want to configure maxRecord

[jira] [Created] (SPARK-26364) Clean up import statements in pandas udf tests

2018-12-13 Thread Li Jin (JIRA)
Li Jin created SPARK-26364: -- Summary: Clean up import statements in pandas udf tests Key: SPARK-26364 URL: https://issues.apache.org/jira/browse/SPARK-26364 Project: Spark Issue Type: Improvement

[jira] [Resolved] (SPARK-26328) Use GenerateOrdering for group key comparison in WindowExec

2018-12-10 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin resolved SPARK-26328. Resolution: Not A Problem > Use GenerateOrdering for group key comparison in WindowExec >

[jira] [Created] (SPARK-26328) Use GenerateOrdering for group key comparison in WindowExec

2018-12-10 Thread Li Jin (JIRA)
Li Jin created SPARK-26328: -- Summary: Use GenerateOrdering for group key comparison in WindowExec Key: SPARK-26328 URL: https://issues.apache.org/jira/browse/SPARK-26328 Project: Spark Issue Type:

[jira] [Updated] (SPARK-25640) Clarify/Improve EvalType for grouped aggregate and window aggregate

2018-10-04 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-25640: --- Description: Currently, grouped aggregate and window aggregate use different EvalType, however, they map t

[jira] [Created] (SPARK-25640) Clarify/Improve EvalType for grouped aggregate and window aggregate

2018-10-04 Thread Li Jin (JIRA)
Li Jin created SPARK-25640: -- Summary: Clarify/Improve EvalType for grouped aggregate and window aggregate Key: SPARK-25640 URL: https://issues.apache.org/jira/browse/SPARK-25640 Project: Spark Issu

[jira] [Commented] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-28 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594973#comment-16594973 ] Li Jin commented on SPARK-25213: This is resolved by https://github.com/apache/spark/pul

[jira] [Updated] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-25216: --- Description: The current error message is often confusing to a new Spark user that a column containing "."

[jira] [Created] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Thread Li Jin (JIRA)
Li Jin created SPARK-25216: -- Summary: Provide better error message when a column contains dot and needs backticks quote Key: SPARK-25216 URL: https://issues.apache.org/jira/browse/SPARK-25216 Project: Spark

[jira] [Created] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-23 Thread Li Jin (JIRA)
Li Jin created SPARK-25213: -- Summary: DataSourceV2 doesn't seem to produce unsafe rows Key: SPARK-25213 URL: https://issues.apache.org/jira/browse/SPARK-25213 Project: Spark Issue Type: Task

[jira] [Commented] (SPARK-24561) User-defined window functions with pandas udf (bounded window)

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579965#comment-16579965 ] Li Jin commented on SPARK-24561: I am looking into this. Early investigation:  https://d

[jira] [Updated] (SPARK-24721) Failed to use PythonUDF with literal inputs in filter with data sources

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24721: --- Component/s: SQL > Failed to use PythonUDF with literal inputs in filter with data sources > ---

[jira] [Updated] (SPARK-24721) Failed to use PythonUDF with literal inputs in filter with data sources

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24721: --- Issue Type: Bug (was: Sub-task) Parent: (was: SPARK-22216) > Failed to use PythonUDF with liter

[jira] [Comment Edited] (SPARK-24721) Failed to use PythonUDF with literal inputs in filter with data sources

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579956#comment-16579956 ] Li Jin edited comment on SPARK-24721 at 8/14/18 3:26 PM: - Update

[jira] [Commented] (SPARK-24721) Failed to use PythonUDF with literal inputs in filter with data sources

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579956#comment-16579956 ] Li Jin commented on SPARK-24721: Updates Jira title to reflect the actual issue > Faile

[jira] [Updated] (SPARK-24721) Failed to use PythonUDF with literal inputs in filter with data sources

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24721: --- Summary: Failed to use PythonUDF with literal inputs in filter with data sources (was: Failed to call Pytho

[jira] [Comment Edited] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-27 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560337#comment-16560337 ] Li Jin edited comment on SPARK-24721 at 7/27/18 9:18 PM: - I thin

[jira] [Commented] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-27 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560337#comment-16560337 ] Li Jin commented on SPARK-24721: I think the issue is the UDF is being pushed down to th

[jira] [Commented] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-27 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560283#comment-16560283 ] Li Jin commented on SPARK-24721: {code:java} from pyspark.sql.functions import udf, lit,

[jira] [Updated] (SPARK-24624) Can not mix vectorized and non-vectorized UDFs

2018-07-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24624: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-22216 > Can not mix vectorized and non-vector

[jira] [Updated] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24721: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-22216 > Failed to call PythonUDF whose input

[jira] [Commented] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16544656#comment-16544656 ] Li Jin commented on SPARK-24721: I am currently traveling but will try to take a look wh

[jira] [Updated] (SPARK-24796) Support GROUPED_AGG_PANDAS_UDF in Pivot

2018-07-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24796: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-22216 > Support GROUPED_AGG_PANDAS_UDF in Piv

[jira] [Commented] (SPARK-24796) Support GROUPED_AGG_PANDAS_UDF in Pivot

2018-07-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16544655#comment-16544655 ] Li Jin commented on SPARK-24796: Sorry I am traveling now but I will try to take a look

[jira] [Commented] (SPARK-24760) Pandas UDF does not handle NaN correctly

2018-07-09 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537662#comment-16537662 ] Li Jin commented on SPARK-24760: I think the issue here is that the output schema for th

[jira] [Commented] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-02 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530462#comment-16530462 ] Li Jin commented on SPARK-24721: Yep I can take a look > Failed to call PythonUDF whose

[jira] [Commented] (SPARK-24624) Can not mix vectorized and non-vectorized UDFs

2018-06-21 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519590#comment-16519590 ] Li Jin commented on SPARK-24624: I can take a look at this one > Can not mix vectorized

[jira] [Comment Edited] (SPARK-24578) Reading remote cache block behavior changes and causes timeout issue

2018-06-18 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515879#comment-16515879 ] Li Jin edited comment on SPARK-24578 at 6/18/18 3:24 PM: - cc @ga

[jira] [Commented] (SPARK-24578) Reading remote cache block behavior changes and causes timeout issue

2018-06-18 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515879#comment-16515879 ] Li Jin commented on SPARK-24578: cc @gatorsmile We found this when switching from 2.2.1

[jira] [Updated] (SPARK-24578) Reading remote cache block behavior changes and causes timeout issue

2018-06-18 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24578: --- Component/s: (was: Input/Output) Spark Core > Reading remote cache block behavior chang

[jira] [Commented] (SPARK-24563) Allow running PySpark shell without Hive

2018-06-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512844#comment-16512844 ] Li Jin commented on SPARK-24563: Will submit a PR soon > Allow running PySpark shell wi

[jira] [Updated] (SPARK-24563) Allow running PySpark shell without Hive

2018-06-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24563: --- Description: A previous commit:  [https://github.com/apache/spark/commit/b3417b731d4e323398a0d7ec6e86405f44

[jira] [Created] (SPARK-24563) Allow running PySpark shell without Hive

2018-06-14 Thread Li Jin (JIRA)
Li Jin created SPARK-24563: -- Summary: Allow running PySpark shell without Hive Key: SPARK-24563 URL: https://issues.apache.org/jira/browse/SPARK-24563 Project: Spark Issue Type: Bug Compon

[jira] [Updated] (SPARK-22239) User-defined window functions with pandas udf (unbounded window)

2018-06-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-22239: --- Description: Window function is another place we can benefit from vectorized udf and add another useful funct

[jira] [Created] (SPARK-24561) User-defined window functions with pandas udf (bounded window)

2018-06-14 Thread Li Jin (JIRA)
Li Jin created SPARK-24561: -- Summary: User-defined window functions with pandas udf (bounded window) Key: SPARK-24561 URL: https://issues.apache.org/jira/browse/SPARK-24561 Project: Spark Issue Typ

[jira] [Updated] (SPARK-22239) User-defined window functions with pandas udf (unbounded window)

2018-06-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-22239: --- Summary: User-defined window functions with pandas udf (unbounded window) (was: User-defined window functio

[jira] [Commented] (SPARK-22239) User-defined window functions with pandas udf

2018-06-13 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511495#comment-16511495 ] Li Jin commented on SPARK-22239: [~hyukjin.kwon] I actually don't think this Jira is don

[jira] [Created] (SPARK-24521) Fix ineffective test in CachedTableSuite

2018-06-11 Thread Li Jin (JIRA)
Li Jin created SPARK-24521: -- Summary: Fix ineffective test in CachedTableSuite Key: SPARK-24521 URL: https://issues.apache.org/jira/browse/SPARK-24521 Project: Spark Issue Type: Test Compo

[jira] [Commented] (SPARK-24258) SPIP: Improve PySpark support for ML Matrix and Vector types

2018-06-08 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506399#comment-16506399 ] Li Jin commented on SPARK-24258: I ran into [~mengxr] and chatted about this. Seems a go

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans

2018-05-30 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495695#comment-16495695 ] Li Jin commented on SPARK-24373: [~smilegator] Thank you for the suggestion. > "df.cach

[jira] [Comment Edited] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-05-29 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493815#comment-16493815 ] Li Jin edited comment on SPARK-22947 at 5/29/18 4:34 PM: - Hi [~T

[jira] [Commented] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-05-29 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493815#comment-16493815 ] Li Jin commented on SPARK-22947: Hi [~TomaszGaweda] thanks for your interest! Yes I am w

[jira] [Comment Edited] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-05-29 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449935#comment-16449935 ] Li Jin edited comment on SPARK-22947 at 5/29/18 4:33 PM: - I came

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491309#comment-16491309 ] Li Jin commented on SPARK-24373: [~smilegator] do you mean adding AnalysisBarrier to Re

[jira] [Commented] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491125#comment-16491125 ] Li Jin commented on SPARK-24324: Moved under SPARK-22216 for better ticket organization.

[jira] [Updated] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24324: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-22216 > Pandas Grouped Map UserDefinedFunction mixes c

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491086#comment-16491086 ] Li Jin commented on SPARK-24373: We use groupby() and pivot() > "df.cache() df.count()"

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489060#comment-16489060 ] Li Jin edited comment on SPARK-24373 at 5/24/18 9:00 PM: - This is

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489060#comment-16489060 ] Li Jin edited comment on SPARK-24373 at 5/24/18 8:51 PM: - This is

[jira] [Commented] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489759#comment-16489759 ] Li Jin commented on SPARK-24324: This is a dup of https://issues.apache.org/jira/browse/S

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489060#comment-16489060 ] Li Jin commented on SPARK-24373: This is a reproduction in a unit test: {code:java} test("cach

[jira] [Updated] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24373: --- Summary: "df.cache() df.count()" no longer eagerly caches data (was: Spark Dataset groupby.agg/count doesn't

[jira] [Comment Edited] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488109#comment-16488109 ] Li Jin edited comment on SPARK-24373 at 5/23/18 11:18 PM: -- We fo

[jira] [Updated] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24373: --- Component/s: (was: Input/Output) SQL > Spark Dataset groupby.agg/count doesn't respect c

[jira] [Commented] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488109#comment-16488109 ] Li Jin commented on SPARK-24373: I think this might be a regression from 2.2. Anyone use

[jira] [Commented] (SPARK-24334) Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator

2018-05-22 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484086#comment-16484086 ] Li Jin commented on SPARK-24334: [~pi3ni0] did it happen for you when your UDF throws exc

[jira] [Commented] (SPARK-24334) Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator

2018-05-21 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483004#comment-16483004 ] Li Jin commented on SPARK-24334: I have done some investigation and will submit a PR soon

[jira] [Created] (SPARK-24334) Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator

2018-05-21 Thread Li Jin (JIRA)
Li Jin created SPARK-24334: -- Summary: Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator Key: SPARK-24334 URL: https://issues.apache.org/jira/browse/SPARK-24334 Project: Sp

[jira] [Commented] (SPARK-22239) User-defined window functions with pandas udf

2018-04-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452977#comment-16452977 ] Li Jin commented on SPARK-22239: [~hvanhovell], I have done a bit further research of UDF

[jira] [Commented] (SPARK-23929) pandas_udf schema mapped by position and not by name

2018-04-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452355#comment-16452355 ] Li Jin commented on SPARK-23929: [~tr3w] does using OrderedDict help in your case? > pan

[jira] [Comment Edited] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-04-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449935#comment-16449935 ] Li Jin edited comment on SPARK-22947 at 4/24/18 2:16 PM: - I came

[jira] [Commented] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-04-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449935#comment-16449935 ] Li Jin commented on SPARK-22947: I came across this blog today: [https://databricks.com/

[jira] [Updated] (SPARK-24019) AnalysisException for Window function expression to compute derivative

2018-04-20 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24019: --- Component/s: (was: Spark Core) SQL > AnalysisException for Window function expression to

[jira] [Comment Edited] (SPARK-23929) pandas_udf schema mapped by position and not by name

2018-04-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438935#comment-16438935 ] Li Jin edited comment on SPARK-23929 at 4/16/18 2:40 AM: - I agree

[jira] [Commented] (SPARK-23929) pandas_udf schema mapped by position and not by name

2018-04-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438935#comment-16438935 ] Li Jin commented on SPARK-23929: I agree with [~hyukjin.kwon]. Seems like there is not a

[jira] [Commented] (SPARK-23030) Decrease memory consumption with toPandas() collection using Arrow

2018-04-13 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438001#comment-16438001 ] Li Jin commented on SPARK-23030: Hey [~bryanc], did you by any chance have some progress on
