[jira] [Commented] (SPARK-25732) Allow specifying a keytab/principal for proxy user for token renewal

2018-10-15 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649917#comment-16649917 ] Marco Gaido commented on SPARK-25732: - cc [~vanzin] [~tgraves] [~jerryshao] [~mridul]. Sorry for

[jira] [Commented] (SPARK-25728) SPIP: Structured Intermediate Representation (Tungsten IR) for generating Java code

2018-10-16 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651480#comment-16651480 ] Marco Gaido commented on SPARK-25728: - Thanks [~kiszk]. I will check it ASAP, thanks for your work

[jira] [Commented] (SPARK-25767) Error reported in Spark logs when using the org.apache.spark:spark-sql_2.11:2.3.2 Java library

2018-10-18 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655366#comment-16655366 ] Marco Gaido commented on SPARK-25767: - I tried on current master branch but I wasn't able to

[jira] [Commented] (SPARK-25767) Error reported in Spark logs when using the org.apache.spark:spark-sql_2.11:2.3.2 Java library

2018-10-18 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655440#comment-16655440 ] Marco Gaido commented on SPARK-25767: - So I tracked down the issue. The problem is that you are

[jira] [Commented] (SPARK-25767) Error reported in Spark logs when using the org.apache.spark:spark-sql_2.11:2.3.2 Java library

2018-10-18 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655530#comment-16655530 ] Marco Gaido commented on SPARK-25767: - Your conversion of a Java array in a Scala Seq creates a

[jira] [Commented] (SPARK-25767) Error reported in Spark logs when using the org.apache.spark:spark-sql_2.11:2.3.2 Java library

2018-10-18 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655418#comment-16655418 ] Marco Gaido commented on SPARK-25767: - It is interesting, I can reproduce with the Java API but not

[jira] [Commented] (SPARK-25767) Error reported in Spark logs when using the org.apache.spark:spark-sql_2.11:2.3.2 Java library

2018-10-19 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656861#comment-16656861 ] Marco Gaido commented on SPARK-25767: - I think it is a bug (thanks for reporting this): indeed I

[jira] [Created] (SPARK-25764) Avoid usage of deprecated methods in examples for BisectingKMeans

2018-10-18 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-25764: --- Summary: Avoid usage of deprecated methods in examples for BisectingKMeans Key: SPARK-25764 URL: https://issues.apache.org/jira/browse/SPARK-25764 Project: Spark

[jira] [Created] (SPARK-25765) Add trainingCost to BisectingKMeans summary

2018-10-18 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-25765: --- Summary: Add trainingCost to BisectingKMeans summary Key: SPARK-25765 URL: https://issues.apache.org/jira/browse/SPARK-25765 Project: Spark Issue Type:

[jira] [Commented] (SPARK-25691) Analyzer rule "AliasViewChild" does not stabilize

2018-10-13 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649035#comment-16649035 ] Marco Gaido commented on SPARK-25691: - I think this is actually an instance of a bigger problem,

[jira] [Commented] (SPARK-25732) Allow specifying a keytab/principal for proxy user for token renewal

2018-10-16 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651860#comment-16651860 ] Marco Gaido commented on SPARK-25732: - [~tgraves] yes, exactly it is what I am referring as

[jira] [Commented] (SPARK-25732) Allow specifying a keytab/principal for proxy user for token renewal

2018-10-16 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651800#comment-16651800 ] Marco Gaido commented on SPARK-25732: - [~tgraves] I think they can be reused, the point is that it

[jira] [Created] (SPARK-25758) Deprecate BisectingKMeans compute cost

2018-10-17 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-25758: --- Summary: Deprecate BisectingKMeans compute cost Key: SPARK-25758 URL: https://issues.apache.org/jira/browse/SPARK-25758 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-25758) Deprecate BisectingKMeans compute cost

2018-10-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653641#comment-16653641 ] Marco Gaido commented on SPARK-25758: - cc [~cloud_fan] [~srowen] [~holdenkarau]. This is a minor

[jira] [Created] (SPARK-25867) Remove KMeans computeCost

2018-10-29 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-25867: --- Summary: Remove KMeans computeCost Key: SPARK-25867 URL: https://issues.apache.org/jira/browse/SPARK-25867 Project: Spark Issue Type: Task

[jira] [Updated] (SPARK-25866) Update KMeans formatVersion

2018-10-29 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-25866: Priority: Minor (was: Major) > Update KMeans formatVersion > --- > >

[jira] [Updated] (SPARK-25866) Update KMeans formatVersion

2018-10-29 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-25866: Issue Type: Bug (was: Task) > Update KMeans formatVersion > --- > >

[jira] [Created] (SPARK-25866) Update KMeans formatVersion

2018-10-29 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-25866: --- Summary: Update KMeans formatVersion Key: SPARK-25866 URL: https://issues.apache.org/jira/browse/SPARK-25866 Project: Spark Issue Type: Task

[jira] [Commented] (SPARK-25863) java.lang.UnsupportedOperationException: empty.max at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala

2018-10-29 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1876#comment-1876 ] Marco Gaido commented on SPARK-25863: - [~Tagar] thanks for reporting this. Could you please provide a

[jira] [Commented] (SPARK-25441) calculate term frequency in CountVectorizer()

2018-10-30 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668654#comment-16668654 ] Marco Gaido commented on SPARK-25441: - TF has an appropriate transformer. I think this can be closed

[jira] [Commented] (SPARK-25870) RandomSplit with seed gives different results depending on column order

2018-10-30 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669037#comment-16669037 ] Marco Gaido commented on SPARK-25870: - Thanks [~deacuna]. > RandomSplit with seed gives different

[jira] [Commented] (SPARK-25863) java.lang.UnsupportedOperationException: empty.max at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala

2018-10-30 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668438#comment-16668438 ] Marco Gaido commented on SPARK-25863: - [~Tagar] thanks. ??not sure yet as it might depend on data I

[jira] [Commented] (SPARK-25870) RandomSplit with seed gives different results depending on column order

2018-10-30 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668313#comment-16668313 ] Marco Gaido commented on SPARK-25870: - If you do some transformations (simple or complex doesn't

[jira] [Created] (SPARK-25838) Remove formatVersion from Saveable

2018-10-25 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-25838: --- Summary: Remove formatVersion from Saveable Key: SPARK-25838 URL: https://issues.apache.org/jira/browse/SPARK-25838 Project: Spark Issue Type: Task

[jira] [Commented] (SPARK-25829) Duplicated map keys are not handled consistently

2018-10-25 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663553#comment-16663553 ] Marco Gaido commented on SPARK-25829: - I think the main issue is that since this is not a SQL

[jira] [Commented] (SPARK-24437) Memory leak in UnsafeHashedRelation

2018-11-05 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674827#comment-16674827 ] Marco Gaido commented on SPARK-24437: - Hi [~dvogelbacher], thanks for your comment and your analysis.

[jira] [Issue Comment Deleted] (SPARK-25650) Make analyzer rules used in once-policy idempotent

2018-11-05 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-25650: Comment: was deleted (was: [~maryannxue] since all the subtasks are completed, shall we close

[jira] [Commented] (SPARK-25650) Make analyzer rules used in once-policy idempotent

2018-11-05 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674955#comment-16674955 ] Marco Gaido commented on SPARK-25650: - [~maryannxue] since all the subtasks are completed, shall we

[jira] [Commented] (SPARK-25650) Make analyzer rules used in once-policy idempotent

2018-11-05 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674954#comment-16674954 ] Marco Gaido commented on SPARK-25650: - [~maryannxue] since all the subtasks are completed, shall we

[jira] [Commented] (SPARK-25870) RandomSplit with seed gives different results depending on column order

2018-10-29 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667346#comment-16667346 ] Marco Gaido commented on SPARK-25870: - Why do you consider this a bug? They are 2 different

[jira] [Commented] (SPARK-24437) Memory leak in UnsafeHashedRelation

2018-11-05 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675353#comment-16675353 ] Marco Gaido commented on SPARK-24437: - [~eyalfa] yes, that is the point, if there is a node failure

[jira] [Resolved] (SPARK-25996) Aggregations do not return the correct values for rows with equal timestamps

2018-11-10 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-25996. - Resolution: Not A Problem [~igomezraggio] check the timestamp of the first row. It is {{00:00:01}}, so

[jira] [Commented] (SPARK-25332) Instead of broadcast hash join ,Sort merge join has selected when restart spark-shell/spark-JDBC for hive provider

2018-11-10 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682329#comment-16682329 ] Marco Gaido commented on SPARK-25332: - [~Bjangir] please don't use "Critical" and "Blocker": they

[jira] [Updated] (SPARK-25332) Instead of broadcast hash join ,Sort merge join has selected when restart spark-shell/spark-JDBC for hive provider

2018-11-10 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-25332: Priority: Major (was: Critical) > Instead of broadcast hash join ,Sort merge join has selected

[jira] [Commented] (SPARK-24437) Memory leak in UnsafeHashedRelation

2018-11-08 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679745#comment-16679745 ] Marco Gaido commented on SPARK-24437: - [~dvogelbacher] the point is: a broadcast is never

[jira] [Created] (SPARK-26003) Improve performance in SQLAppStatusListener

2018-11-10 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-26003: --- Summary: Improve performance in SQLAppStatusListener Key: SPARK-26003 URL: https://issues.apache.org/jira/browse/SPARK-26003 Project: Spark Issue Type:

[jira] [Commented] (SPARK-26024) Dataset API: repartitionByRange(...) has inconsistent behaviour

2018-11-13 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685192#comment-16685192 ] Marco Gaido commented on SPARK-26024: - I am not sure about that [~JulienPeloton]. In general we

[jira] [Commented] (SPARK-26024) Dataset API: repartitionByRange(...) has inconsistent behaviour

2018-11-13 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685131#comment-16685131 ] Marco Gaido commented on SPARK-26024: - I think this is the expected behavior, as Spark samples

[jira] [Commented] (SPARK-26045) Error in the spark 2.4 release package with the spark-avro_2.11 depdency

2018-11-16 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689217#comment-16689217 ] Marco Gaido commented on SPARK-26045: - [~o.garcia] can you please create a PR for this? > Error in

[jira] [Commented] (SPARK-26078) WHERE .. IN fails to filter rows when used in combination with UNION

2018-11-16 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689262#comment-16689262 ] Marco Gaido commented on SPARK-26078: - I'll investigate this immediately, thanks [~cloud_fan]. >

[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.

2018-11-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686293#comment-16686293 ] Marco Gaido commented on SPARK-26054: - I cannot reproduce this: {code} val df = Seq(AA("0101",

[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.

2018-11-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686333#comment-16686333 ] Marco Gaido commented on SPARK-26054: - {code} val data = Seq(AA("0101", "2500.98".toDouble),

[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.

2018-11-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686395#comment-16686395 ] Marco Gaido commented on SPARK-26054: - Then the affected version is 2.2.0, not 2.4.0. I am updating

[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.

2018-11-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686378#comment-16686378 ] Marco Gaido commented on SPARK-26054: - Yes, sorry, I forgot to copy its definition. It is: {code}

[jira] [Updated] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.

2018-11-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-26054: Component/s: (was: Spark Core) SQL > Creating a computed column applying the

[jira] [Resolved] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.

2018-11-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-26054. - Resolution: Cannot Reproduce > Creating a computed column applying the spark sql rounding on a

[jira] [Updated] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.

2018-11-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-26054: Affects Version/s: (was: 2.4.0) 2.2.0 > Creating a computed column

[jira] [Resolved] (SPARK-26018) Support Scalar subqueries in predicate push down to datasources

2018-11-15 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-26018. - Resolution: Won't Fix This may be very hard to do as we now add filters to datasources from

[jira] [Commented] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute

2018-11-15 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687951#comment-16687951 ] Marco Gaido commented on SPARK-26041: - [~Tagar] I don't have your table definitions so I cannot run

[jira] [Commented] (SPARK-26063) CatalystDataToAvro gives "UnresolvedException: Invalid call to dataType on unresolved object" when requested for numberedTreeString

2018-11-15 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687971#comment-16687971 ] Marco Gaido commented on SPARK-26063: - I think this was fixed in SPARK-25883. But we may want to

[jira] [Commented] (SPARK-26018) Support Scalar subqueries in predicate push down to datasources

2018-11-12 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684073#comment-16684073 ] Marco Gaido commented on SPARK-26018: - I'll submit a PR for this once

[jira] [Created] (SPARK-26018) Support Scalar subqueries in predicate push down to datasources

2018-11-12 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-26018: --- Summary: Support Scalar subqueries in predicate push down to datasources Key: SPARK-26018 URL: https://issues.apache.org/jira/browse/SPARK-26018 Project: Spark

[jira] [Commented] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute

2018-11-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686828#comment-16686828 ] Marco Gaido commented on SPARK-26041: - I think this may be a duplicate of SPARK-26057 (or the other

[jira] [Commented] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute

2018-11-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686857#comment-16686857 ] Marco Gaido commented on SPARK-26041: - Then it'd help if you could provide a reproducer for this...

[jira] [Commented] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute

2018-11-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686870#comment-16686870 ] Marco Gaido commented on SPARK-26041: - No, it is not, for 2.3 we would need a dedicated fix. >

[jira] [Commented] (SPARK-25648) Spark 2.3.1 reads orc format files with native and hive, and return different results

2018-10-05 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639723#comment-16639723 ] Marco Gaido commented on SPARK-25648: - cc [~dongjoon] > Spark 2.3.1 reads orc format files with

[jira] [Commented] (SPARK-25597) SQL query with limit iterates the whole iterator when WholeStage code generation is enabled

2018-10-02 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635430#comment-16635430 ] Marco Gaido commented on SPARK-25597: - I think this is a duplicate of SPARK-25497. [~hkroger] may

[jira] [Resolved] (SPARK-25686) date_trunc Spark SQL function silently returns null if parameters are swapped

2018-10-11 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-25686. - Resolution: Duplicate > date_trunc Spark SQL function silently returns null if parameters are

[jira] [Commented] (SPARK-25686) date_trunc Spark SQL function silently returns null if parameters are swapped

2018-10-11 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646075#comment-16646075 ] Marco Gaido commented on SPARK-25686: - I am closing this as it is a duplicate of SPARK-24378. Thanks

[jira] [Commented] (SPARK-25686) date_trunc Spark SQL function silently returns null if parameters are swapped

2018-10-11 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646288#comment-16646288 ] Marco Gaido commented on SPARK-25686: - I don't think this is going to happen. Most of the functions

[jira] [Commented] (SPARK-25582) Error in Spark logs when using the org.apache.spark:spark-sql_2.11:2.2.0 Java library

2018-10-01 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634389#comment-16634389 ] Marco Gaido commented on SPARK-25582: - Sorry, I linked the wrong JIRA in the PR. Please disregard

[jira] [Commented] (SPARK-25538) incorrect row counts after distinct()

2018-10-01 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634106#comment-16634106 ] Marco Gaido commented on SPARK-25538: - I was able to reproduce also using limit instead of sort:

[jira] [Commented] (SPARK-25582) Error in Spark logs when using the org.apache.spark:spark-sql_2.11:2.2.0 Java library

2018-10-03 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636616#comment-16636616 ] Marco Gaido commented on SPARK-25582: - Hi [~onyssius]. Sorry for the trouble, it shows as "In

[jira] [Updated] (SPARK-25457) IntegralDivide (div) should not always return long

2018-09-20 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-25457: Description: The operation {{div}} always returns long. This came from Hive's behavior, which is

[jira] [Commented] (SPARK-24440) When use constant as column we may get wrong answer versus impala

2018-09-26 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628608#comment-16628608 ] Marco Gaido commented on SPARK-24440: - Can you provide a sample repro which can be run in order to

[jira] [Updated] (SPARK-25538) incorrect row counts after distinct()

2018-09-26 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-25538: Priority: Major (was: Blocker) > incorrect row counts after distinct() >

[jira] [Commented] (SPARK-25538) incorrect row counts after distinct()

2018-09-26 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628369#comment-16628369 ] Marco Gaido commented on SPARK-25538: - Please do not use Blocker and Critical when reporting issues

[jira] [Updated] (SPARK-25538) incorrect row counts after distinct()

2018-09-26 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-25538: Labels: correctness (was: ) > incorrect row counts after distinct() >

[jira] [Created] (SPARK-25551) Remove unused InSubquery expression

2018-09-27 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-25551: --- Summary: Remove unused InSubquery expression Key: SPARK-25551 URL: https://issues.apache.org/jira/browse/SPARK-25551 Project: Spark Issue Type: Task

[jira] [Commented] (SPARK-26450) Map of schema is built too frequently in some wide queries

2018-12-27 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729675#comment-16729675 ] Marco Gaido commented on SPARK-26450: - Great, thanks! > Map of schema is built too frequently in

[jira] [Resolved] (SPARK-26515) SQL date_format function for 2018-12-30 returns 2019 date

2019-01-01 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-26515. - Resolution: Not A Bug The problem is in your logic, namely in the format. You need {{}},

[jira] [Commented] (SPARK-26491) Use ConfigEntry for hardcoded configs for test categories.

2018-12-30 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730956#comment-16730956 ] Marco Gaido commented on SPARK-26491: - I am working on this, thanks. > Use ConfigEntry for

[jira] [Commented] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

2019-01-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744799#comment-16744799 ] Marco Gaido commented on SPARK-26639: - I see, then let me investigate this further. I think I already

[jira] [Commented] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

2019-01-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744783#comment-16744783 ] Marco Gaido commented on SPARK-26639: - This may be a duplicate of SPARK-25482. Could you please try on

[jira] [Commented] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

2019-01-20 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747501#comment-16747501 ] Marco Gaido commented on SPARK-26639: - [~Jk_Self] I checked and there is only one subquery node in

[jira] [Commented] (SPARK-26569) Fixed point for batch Operator Optimizations never reached when optimize logicalPlan

2019-01-14 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742305#comment-16742305 ] Marco Gaido commented on SPARK-26569: - [~chenfan] could you please try a more recent version of Spark?

[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)

2019-01-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745146#comment-16745146 ] Marco Gaido commented on SPARK-26645: - The error is on the Python side, I will submit a PR shortly,

[jira] [Commented] (SPARK-20162) Reading data from MySQL - Cannot up cast from decimal(30,6) to decimal(38,18)

2019-01-23 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749920#comment-16749920 ] Marco Gaido commented on SPARK-20162: - [~bonazzaf] what you just reported is an invalid use case and

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-12-12 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719247#comment-16719247 ] Marco Gaido commented on SPARK-24152: - [~viirya] [~hyukjin.kwon] I am seeing this again constantly:

[jira] [Commented] (SPARK-26280) Spark will read entire CSV file even when limit is used

2018-12-13 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1672#comment-1672 ] Marco Gaido commented on SPARK-26280: - I'd say this is most likely a duplicate of

[jira] [Commented] (SPARK-26336) left_anti join with Na Values

2018-12-18 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724052#comment-16724052 ] Marco Gaido commented on SPARK-26336: - That's correct because NULLs do not match. The usual

[jira] [Commented] (SPARK-26336) left_anti join with Na Values

2018-12-18 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724098#comment-16724098 ] Marco Gaido commented on SPARK-26336: - [~csevilla] the point is always the same, ie. the presence of

[jira] [Commented] (SPARK-26437) Decimal data becomes bigint to query, unable to query

2018-12-27 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729526#comment-16729526 ] Marco Gaido commented on SPARK-26437: - cc [~dongjoon] > Decimal data becomes bigint to query,

[jira] [Commented] (SPARK-26450) Map of schema is built too frequently in some wide queries

2018-12-27 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729621#comment-16729621 ] Marco Gaido commented on SPARK-26450: - Thanks for this JIRA [~bersprockets]. This makes sense to me.

[jira] [Commented] (SPARK-26339) Behavior of reading files that start with underscore is confusing

2018-12-11 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717168#comment-16717168 ] Marco Gaido commented on SPARK-26339: - The point is: files starting with underscores are hidden

[jira] [Commented] (SPARK-26214) Add "broadcast" method to DataFrame

2018-11-30 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704958#comment-16704958 ] Marco Gaido commented on SPARK-26214: - I don't think it is really the same. I don't think it is a

[jira] [Commented] (SPARK-24498) Add JDK compiler for runtime codegen

2018-11-29 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702889#comment-16702889 ] Marco Gaido commented on SPARK-24498: - +1 for closing this. > Add JDK compiler for runtime codegen

[jira] [Resolved] (SPARK-26231) Dataframes inner join on double datatype columns resulting in Cartesian product

2018-12-04 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-26231. - Resolution: Duplicate [~yumwang] you're right, but this is a duplicate and there is already a

[jira] [Commented] (SPARK-26179) `map_concat` should replace the value in the left side

2018-12-04 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708541#comment-16708541 ] Marco Gaido commented on SPARK-26179: - I think this was resolved by SPARK-25829. [~dbtsai] Shall we

[jira] [Commented] (SPARK-26270) Having clause does not work with explode anymore

2018-12-05 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709953#comment-16709953 ] Marco Gaido commented on SPARK-26270: - This is caused by SPARK-25708. You can find more details on

[jira] [Resolved] (SPARK-26270) Having clause does not work with explode anymore

2018-12-05 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-26270. - Resolution: Invalid > Having clause does not work with explode anymore >

[jira] [Resolved] (SPARK-26242) Leading slash breaks proxying

2018-12-02 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-26242. - Resolution: Not A Problem > Leading slash breaks proxying > - > >

[jira] [Commented] (SPARK-26242) Leading slash breaks proxying

2018-12-02 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706157#comment-16706157 ] Marco Gaido commented on SPARK-26242: - Let me close this. Please reopen only if you find issues. In

[jira] [Commented] (SPARK-26233) Incorrect decimal value with java beans and first/last/max... functions

2018-12-03 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707009#comment-16707009 ] Marco Gaido commented on SPARK-26233: - I think this is related to SPARK-24957. The point is that in

[jira] [Commented] (SPARK-26233) Incorrect decimal value with java beans and first/last/max... functions

2018-12-03 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707729#comment-16707729 ] Marco Gaido commented on SPARK-26233: - [~dongjoon] I think so. SPARK-24957 was a long standing issue

[jira] [Commented] (SPARK-18147) Broken Spark SQL Codegen

2018-12-06 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711277#comment-16711277 ] Marco Gaido commented on SPARK-18147: - [~chamcyl] could you try 2.3.2 please? If it fails on 2.3.2,

[jira] [Commented] (SPARK-26308) Large BigDecimal value is converted to null when passed into a UDF

2018-12-10 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714483#comment-16714483 ] Marco Gaido commented on SPARK-26308: - Thanks for pinging me [~dongjoon], I'll take a look at this

[jira] [Commented] (SPARK-26308) Large BigDecimal value is converted to null when passed into a UDF

2018-12-10 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714557#comment-16714557 ] Marco Gaido commented on SPARK-26308: - So the problem here is that the type inferred for decimal

[jira] [Commented] (SPARK-26308) Large BigDecimal value is converted to null when passed into a UDF

2018-12-10 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714597#comment-16714597 ] Marco Gaido commented on SPARK-26308: - [~cloud_fan] what do you mean by a general {{DecimalType}}? A

[jira] [Commented] (SPARK-26308) Large BigDecimal value is converted to null when passed into a UDF

2018-12-10 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714702#comment-16714702 ] Marco Gaido commented on SPARK-26308: - Yes, but it is an {{AbstractDataType}}, not a {{DataType}}.
