[jira] [Commented] (SPARK-29239) Subquery should not cause NPE when eliminating subexpression

2019-09-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937495#comment-16937495 ] Liang-Chi Hsieh commented on SPARK-29239: - I added SPARK-29221 to the title of t

[jira] [Commented] (SPARK-29239) Subquery should not cause NPE when eliminating subexpression

2019-09-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937486#comment-16937486 ] Liang-Chi Hsieh commented on SPARK-29239: - Yes. > Subquery should not cause NPE

[jira] [Created] (SPARK-29239) Subquery should not cause NPE when eliminating subexpression

2019-09-25 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29239: --- Summary: Subquery should not cause NPE when eliminating subexpression Key: SPARK-29239 URL: https://issues.apache.org/jira/browse/SPARK-29239 Project: Spark

[jira] [Resolved] (SPARK-29181) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-29181. - Resolution: Duplicate > Cache preferred locations of checkpointed RDD >

[jira] [Assigned] (SPARK-29182) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-29182: --- Assignee: Liang-Chi Hsieh > Cache preferred locations of checkpointed RDD > ---

[jira] [Commented] (SPARK-29181) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933955#comment-16933955 ] Liang-Chi Hsieh commented on SPARK-29181: - [~dongjoon] Thanks. Not aware of crea

[jira] [Created] (SPARK-29182) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29182: --- Summary: Cache preferred locations of checkpointed RDD Key: SPARK-29182 URL: https://issues.apache.org/jira/browse/SPARK-29182 Project: Spark Issue Typ

[jira] [Created] (SPARK-29181) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29181: --- Summary: Cache preferred locations of checkpointed RDD Key: SPARK-29181 URL: https://issues.apache.org/jira/browse/SPARK-29181 Project: Spark Issue Typ

[jira] [Commented] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932784#comment-16932784 ] Liang-Chi Hsieh commented on SPARK-29042: - [~hyukjin.kwon] Am I setting the fix

[jira] [Updated] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-29042: Fix Version/s: 2.4.5 > Sampling-based RDD with unordered input should be INDETERMINATE > -

[jira] [Resolved] (SPARK-22796) Add multiple column support to PySpark QuantileDiscretizer

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-22796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-22796. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25812 [http

[jira] [Assigned] (SPARK-22796) Add multiple column support to PySpark QuantileDiscretizer

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-22796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-22796: --- Assignee: Huaxin Gao > Add multiple column support to PySpark QuantileDiscretizer >

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930827#comment-16930827 ] Liang-Chi Hsieh commented on SPARK-28927: - Regarding the AUC unstable issue, the

[jira] [Commented] (SPARK-26205) Optimize InSet expression for bytes, shorts, ints, dates

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930787#comment-16930787 ] Liang-Chi Hsieh commented on SPARK-26205: - [~cloud_fan]. I see now. Created SPAR

[jira] [Assigned] (SPARK-29100) Codegen with switch in InSet expression causes compilation error

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-29100: --- Assignee: Liang-Chi Hsieh > Codegen with switch in InSet expression causes compilat

[jira] [Updated] (SPARK-29100) Codegen with switch in InSet expression causes compilation error

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-29100: Description: SPARK-26205 adds an optimization to InSet that generates Java switch conditio

[jira] [Created] (SPARK-29100) Codegen with switch in InSet expression causes compilation error

2019-09-16 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29100: --- Summary: Codegen with switch in InSet expression causes compilation error Key: SPARK-29100 URL: https://issues.apache.org/jira/browse/SPARK-29100 Project: Spark

[jira] [Comment Edited] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930658#comment-16930658 ] Liang-Chi Hsieh edited comment on SPARK-28927 at 9/16/19 3:36 PM:

[jira] [Comment Edited] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930658#comment-16930658 ] Liang-Chi Hsieh edited comment on SPARK-28927 at 9/16/19 3:35 PM:

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930658#comment-16930658 ] Liang-Chi Hsieh commented on SPARK-28927: - Because you are using 2.2.1, spark.sq

[jira] [Assigned] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-14 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-28927: --- Assignee: Liang-Chi Hsieh > ArrayIndexOutOfBoundsException and Not-stable AUC metri

[jira] [Assigned] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-13 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-29042: --- Assignee: Liang-Chi Hsieh > Sampling-based RDD with unordered input should be INDET

[jira] [Resolved] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-13 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-29042. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25751 [http

[jira] [Commented] (SPARK-26205) Optimize InSet expression for bytes, shorts, ints, dates

2019-09-13 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929334#comment-16929334 ] Liang-Chi Hsieh commented on SPARK-26205: - [~cloud_fan] I ran a simple test, see

[jira] [Commented] (SPARK-26205) Optimize InSet expression for bytes, shorts, ints, dates

2019-09-12 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928649#comment-16928649 ] Liang-Chi Hsieh commented on SPARK-26205: - Yeah, I will look at it. > Optimize

[jira] [Created] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-10 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29042: --- Summary: Sampling-based RDD with unordered input should be INDETERMINATE Key: SPARK-29042 URL: https://issues.apache.org/jira/browse/SPARK-29042 Project: Spark

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-10 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926814#comment-16926814 ] Liang-Chi Hsieh commented on SPARK-28927: - Hi [~JerryHouse], do you use any non-

[jira] [Assigned] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2019-09-09 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-23265: --- Assignee: Huaxin Gao > Update multi-column error handling logic in QuantileDiscreti

[jira] [Resolved] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2019-09-09 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-23265. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 20442 [http

[jira] [Created] (SPARK-29013) Structurally equivalent subexpression elimination

2019-09-06 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29013: --- Summary: Structurally equivalent subexpression elimination Key: SPARK-29013 URL: https://issues.apache.org/jira/browse/SPARK-29013 Project: Spark Issue

[jira] [Updated] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28933: Fix Version/s: 3.0.0 > Reduce unnecessary shuffle in ALS when initializing factors > -

[jira] [Commented] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920584#comment-16920584 ] Liang-Chi Hsieh commented on SPARK-28933: - This issue was resolved by [https://g

[jira] [Resolved] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-28933. - Resolution: Resolved > Reduce unnecessary shuffle in ALS when initializing factors > ---

[jira] [Commented] (SPARK-28935) Document SQL metrics for Details for Query Plan

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920550#comment-16920550 ] Liang-Chi Hsieh commented on SPARK-28935: - Thanks! [~smilegator] It should be h

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920480#comment-16920480 ] Liang-Chi Hsieh commented on SPARK-28927: - Does this only happen on 2.2.1? How a

[jira] [Updated] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23519: Component/s: (was: Spark Core) > Create View Commands Fails with The view output (col

[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920302#comment-16920302 ] Liang-Chi Hsieh commented on SPARK-23519: - This was closed and then reopened and

[jira] [Updated] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23519: Labels: (was: bulk-closed) > Create View Commands Fails with The view output (col1,col1

[jira] [Commented] (SPARK-28935) Document SQL metrics for Details for Query Plan

2019-08-30 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919975#comment-16919975 ] Liang-Chi Hsieh commented on SPARK-28935: - Thanks for pinging me! I will look in

[jira] [Resolved] (SPARK-28926) CLONE - ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-08-30 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-28926. - Resolution: Duplicate I think this is a duplicate of SPARK-28927. > CLONE - ArrayIndexOut

[jira] [Assigned] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-08-30 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-28933: --- Assignee: Liang-Chi Hsieh > Reduce unnecessary shuffle in ALS when initializing fac

[jira] [Created] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-08-30 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-28933: --- Summary: Reduce unnecessary shuffle in ALS when initializing factors Key: SPARK-28933 URL: https://issues.apache.org/jira/browse/SPARK-28933 Project: Spark

[jira] [Created] (SPARK-28920) Set up java version for github workflow

2019-08-29 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-28920: --- Summary: Set up java version for github workflow Key: SPARK-28920 URL: https://issues.apache.org/jira/browse/SPARK-28920 Project: Spark Issue Type: Imp

[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-26 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915809#comment-16915809 ] Liang-Chi Hsieh commented on SPARK-23519: - I tested with Hive 2.1. It doesn't supp

[jira] [Resolved] (SPARK-25549) High level API to collect RDD statistics

2019-08-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-25549. - Resolution: Won't Fix > High level API to collect RDD statistics > -

[jira] [Commented] (SPARK-25549) High level API to collect RDD statistics

2019-08-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915362#comment-16915362 ] Liang-Chi Hsieh commented on SPARK-25549: - Close this as it is not needed now.

[jira] [Created] (SPARK-28866) Persist item factors RDD when checkpointing in ALS

2019-08-25 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-28866: --- Summary: Persist item factors RDD when checkpointing in ALS Key: SPARK-28866 URL: https://issues.apache.org/jira/browse/SPARK-28866 Project: Spark Issu

[jira] [Commented] (SPARK-24666) Word2Vec generate infinity vectors when numIterations are large

2019-08-24 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914950#comment-16914950 ] Liang-Chi Hsieh commented on SPARK-24666: - I tried to run word2vec with Quora Qu

[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-22 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913915#comment-16913915 ] Liang-Chi Hsieh commented on SPARK-23519: - Thanks for pinging me. I am going on

[jira] [Commented] (SPARK-28672) [UDF] Duplicate function creation should not allow

2019-08-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911007#comment-16911007 ] Liang-Chi Hsieh commented on SPARK-28672: - Is there any rule in Hive regarding t

[jira] [Commented] (SPARK-28761) spark.driver.maxResultSize only applies to compressed data

2019-08-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909420#comment-16909420 ] Liang-Chi Hsieh commented on SPARK-28761: - If you do it at SparkPlan.scala#L344,

[jira] [Commented] (SPARK-28732) org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java' when storing

2019-08-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909409#comment-16909409 ] Liang-Chi Hsieh commented on SPARK-28732: - As {{count}} return type is LongType,

[jira] [Comment Edited] (SPARK-28732) org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java' when st

2019-08-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909409#comment-16909409 ] Liang-Chi Hsieh edited comment on SPARK-28732 at 8/16/19 9:19 PM:

[jira] [Created] (SPARK-28722) Change sequential label sorting in StringIndexer fit to parallel

2019-08-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28722: --- Summary: Change sequential label sorting in StringIndexer fit to parallel Key: SPARK-28722 URL: https://issues.apache.org/jira/browse/SPARK-28722 Project: Spark

[jira] [Updated] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28652: Priority: Minor (was: Major) > spark.kubernetes.pyspark.pythonVersion is never passed to

[jira] [Updated] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28652: Issue Type: Test (was: Bug) > spark.kubernetes.pyspark.pythonVersion is never passed to e

[jira] [Commented] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904743#comment-16904743 ] Liang-Chi Hsieh commented on SPARK-28652: - As existing tests don't explicitly ch

[jira] [Commented] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904729#comment-16904729 ] Liang-Chi Hsieh commented on SPARK-28652: - This looks interesting to me. I tried

[jira] [Commented] (SPARK-28422) GROUPED_AGG pandas_udf doesn't with spark.sql() without group by clause

2019-08-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899788#comment-16899788 ] Liang-Chi Hsieh commented on SPARK-28422: - Thanks [~dongjoon]! > GROUPED_AGG p

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889742#comment-16889742 ] Liang-Chi Hsieh commented on SPARK-24152: - Ok. I think it was fixed. > SparkR C

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889671#comment-16889671 ] Liang-Chi Hsieh commented on SPARK-24152: - This CRAN issue is happening now, aga

[jira] [Updated] (SPARK-28441) PythonUDF used in correlated scalar subquery causes

2019-07-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28441: Summary: PythonUDF used in correlated scalar subquery causes (was: udf(max(udf(column)))

[jira] [Updated] (SPARK-28441) PythonUDF used in correlated scalar subquery causes UnsupportedOperationException

2019-07-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28441: Summary: PythonUDF used in correlated scalar subquery causes UnsupportedOperationException

[jira] [Updated] (SPARK-28441) PythonUDF used in correlated scalar subquery causes UnsupportedOperationException

2019-07-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28441: Priority: Major (was: Minor) > PythonUDF used in correlated scalar subquery causes > Uns

[jira] [Commented] (SPARK-28288) Convert and port 'window.sql' into UDF test base

2019-07-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887707#comment-16887707 ] Liang-Chi Hsieh commented on SPARK-28288: - Those errors can be found in original

[jira] [Updated] (SPARK-28365) Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Summary: Fallback locale to en_US in StopWordsRemover if system default locale isn't in av

[jira] [Updated] (SPARK-28365) Set default locale param for StopWordsRemover to en_US if system default locale isn't in available locales in JVM

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Summary: Set default locale param for StopWordsRemover to en_US if system default locale i

[jira] [Updated] (SPARK-28365) Set default locale param for StopWordsRemover to en_US if system default locale isn't in available locales in JVM

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Priority: Major (was: Minor) > Set default locale param for StopWordsRemover to en_US if

[jira] [Updated] (SPARK-28365) Set default locale param for StopWordsRemover to en_US if system default locale isn't in available locales in JVM

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Component/s: (was: PySpark) > Set default locale param for StopWordsRemover to en_US i

[jira] [Updated] (SPARK-28365) Set default locale param for StopWordsRemover to en_US if system default locale isn't in available locales in JVM

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Issue Type: Bug (was: Test) > Set default locale param for StopWordsRemover to en_US if s

[jira] [Updated] (SPARK-28365) Set default locale for StopWordsRemover tests to prevent invalid locale error during test

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Component/s: (was: Tests) ML > Set default locale for StopWordsRemove

[jira] [Updated] (SPARK-28365) Set default locale for StopWordsRemover tests to prevent invalid locale error during test

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Description: Because the local default locale isn't in available locales at {{Locale}}, wh

[jira] [Created] (SPARK-28381) Upgraded version of Pyrolite to 4.30

2019-07-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28381: --- Summary: Upgraded version of Pyrolite to 4.30 Key: SPARK-28381 URL: https://issues.apache.org/jira/browse/SPARK-28381 Project: Spark Issue Type: Improv

[jira] [Created] (SPARK-28378) Remove usage of cgi.escape

2019-07-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28378: --- Summary: Remove usage of cgi.escape Key: SPARK-28378 URL: https://issues.apache.org/jira/browse/SPARK-28378 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-28365) Set default locale for StopWordsRemover tests to prevent invalid locale error during test

2019-07-12 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28365: --- Summary: Set default locale for StopWordsRemover tests to prevent invalid locale error during test Key: SPARK-28365 URL: https://issues.apache.org/jira/browse/SPARK-28365

[jira] [Commented] (SPARK-28345) PythonUDF predicate should be able to pushdown to join

2019-07-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882712#comment-16882712 ] Liang-Chi Hsieh commented on SPARK-28345: - I found this when doing SPARK-28276.

[jira] [Created] (SPARK-28345) PythonUDF predicate should be able to pushdown to join

2019-07-10 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28345: --- Summary: PythonUDF predicate should be able to pushdown to join Key: SPARK-28345 URL: https://issues.apache.org/jira/browse/SPARK-28345 Project: Spark

[jira] [Commented] (SPARK-28323) PythonUDF should be able to use in join condition

2019-07-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881717#comment-16881717 ] Liang-Chi Hsieh commented on SPARK-28323: - I found this bug when doing SPARK-282

[jira] [Created] (SPARK-28323) PythonUDF should be able to use in join condition

2019-07-09 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28323: --- Summary: PythonUDF should be able to use in join condition Key: SPARK-28323 URL: https://issues.apache.org/jira/browse/SPARK-28323 Project: Spark Issue

[jira] [Commented] (SPARK-28276) Convert and port 'cross-join.sql' into UDF test base

2019-07-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881033#comment-16881033 ] Liang-Chi Hsieh commented on SPARK-28276: - Will look into this. > Convert and p

[jira] [Comment Edited] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877497#comment-16877497 ] Liang-Chi Hsieh edited comment on SPARK-24152 at 7/3/19 5:19 AM: -

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877497#comment-16877497 ] Liang-Chi Hsieh commented on SPARK-24152: - Received a reply that it is cleaned up. >

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877401#comment-16877401 ] Liang-Chi Hsieh commented on SPARK-24152: - I noticed that this issue happens now

[jira] [Created] (SPARK-28215) as_tibble was removed from Arrow R API

2019-06-29 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28215: --- Summary: as_tibble was removed from Arrow R API Key: SPARK-28215 URL: https://issues.apache.org/jira/browse/SPARK-28215 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-22340) pyspark setJobGroup doesn't match java threads

2019-06-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872852#comment-16872852 ] Liang-Chi Hsieh commented on SPARK-22340: - [~hyukjin.kwon] Should we reopen this

[jira] [Commented] (SPARK-28079) CSV fails to detect corrupt record unless "columnNameOfCorruptRecord" is manually added to the schema

2019-06-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868577#comment-16868577 ] Liang-Chi Hsieh commented on SPARK-28079: - {{columnNameOfCorruptRecord}} current

[jira] [Commented] (SPARK-27946) Hive DDL to Spark DDL conversion USING "show create table"

2019-06-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867771#comment-16867771 ] Liang-Chi Hsieh commented on SPARK-27946: - [~smilegator] Thanks for pinging me.

[jira] [Commented] (SPARK-28079) CSV fails to detect corrupt record unless "columnNameOfCorruptRecord" is manually added to the schema

2019-06-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866777#comment-16866777 ] Liang-Chi Hsieh commented on SPARK-28079: - Isn't it the expected behavior as doc

[jira] [Comment Edited] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865714#comment-16865714 ] Liang-Chi Hsieh edited comment on SPARK-28058 at 6/17/19 3:59 PM:

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865714#comment-16865714 ] Liang-Chi Hsieh commented on SPARK-28058: - [~hyukjin.kwon] Do you mean this is s

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865695#comment-16865695 ] Liang-Chi Hsieh commented on SPARK-28058: - [~stwhit] Thanks for letting us know

[jira] [Created] (SPARK-28082) Add a note to DROPMALFORMED mode of CSV for column pruning

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28082: --- Summary: Add a note to DROPMALFORMED mode of CSV for column pruning Key: SPARK-28082 URL: https://issues.apache.org/jira/browse/SPARK-28082 Project: Spark

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865664#comment-16865664 ] Liang-Chi Hsieh commented on SPARK-28058: - Although this isn't a bug, I think it

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865650#comment-16865650 ] Liang-Chi Hsieh commented on SPARK-28058: - This is due to CSV parser column prun

[jira] [Commented] (SPARK-28054) Unable to insert partitioned table dynamically when partition name is upper case

2019-06-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865006#comment-16865006 ] Liang-Chi Hsieh commented on SPARK-28054: - I tested on Hive, the query works. Bt

[jira] [Commented] (SPARK-28054) Unable to insert partitioned table dynamically when partition name is upper case

2019-06-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864189#comment-16864189 ] Liang-Chi Hsieh commented on SPARK-28054: - Is this query working on Hive? > Una

[jira] [Commented] (SPARK-28043) Reading json with duplicate columns drops the first column value

2019-06-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864107#comment-16864107 ] Liang-Chi Hsieh commented on SPARK-28043: - To make duplicate JSON keys work, I t

[jira] [Commented] (SPARK-28043) Reading json with duplicate columns drops the first column value

2019-06-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863680#comment-16863680 ] Liang-Chi Hsieh commented on SPARK-28043: - I tried to look around that, like ht

[jira] [Commented] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863180#comment-16863180 ] Liang-Chi Hsieh commented on SPARK-28006: - I'm curious about two questions: Can

[jira] [Commented] (SPARK-27966) input_file_name empty when listing files in parallel

2019-06-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863030#comment-16863030 ] Liang-Chi Hsieh commented on SPARK-27966: - I can't see where input_file_name is,
