[jira] [Commented] (SPARK-24927) The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files

2018-07-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556975#comment-16556975 ] Xiao Li commented on SPARK-24927: - cc [~jerryshao] > The hadoop-provided profile doesn

[jira] [Created] (SPARK-24927) The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files

2018-07-25 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-24927: -- Summary: The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files Key: SPARK-24927 URL: https://issues.apache.org/jira/browse/SPARK-24927 Projec

[jira] [Commented] (SPARK-24926) Ensure numCores is used consistently in all netty configuration (driver and executors)

2018-07-25 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556965#comment-16556965 ] Imran Rashid commented on SPARK-24926: -- I was talking to [~nsheth] about this, he's

[jira] [Created] (SPARK-24926) Ensure numCores is used consistently in all netty configuration (driver and executors)

2018-07-25 Thread Imran Rashid (JIRA)
Imran Rashid created SPARK-24926: Summary: Ensure numCores is used consistently in all netty configuration (driver and executors) Key: SPARK-24926 URL: https://issues.apache.org/jira/browse/SPARK-24926

[jira] [Commented] (SPARK-24918) Executor Plugin API

2018-07-25 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556964#comment-16556964 ] Imran Rashid commented on SPARK-24918: -- I have some changes with an initial draft o

[jira] [Commented] (SPARK-23128) A new approach to do adaptive execution in Spark SQL

2018-07-25 Thread Carson Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556961#comment-16556961 ] Carson Wang commented on SPARK-23128: - Thanks [~tgraves] very much. I'll follow this

[jira] [Commented] (SPARK-24882) separate responsibilities of the data source v2 read API

2018-07-25 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556896#comment-16556896 ] Ryan Blue commented on SPARK-24882: --- Sounds fine, but it's getting close and I wouldn'

[jira] [Commented] (SPARK-24882) separate responsibilities of the data source v2 read API

2018-07-25 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556885#comment-16556885 ] Wenchen Fan commented on SPARK-24882: - We don't need to rush for 2.4, but would be g

[jira] [Updated] (SPARK-24374) SPIP: Support Barrier Execution Mode in Apache Spark

2018-07-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24374: Affects Version/s: (was: 3.0.0) 2.4.0 > SPIP: Support Barrier Execution Mode in

[jira] [Comment Edited] (SPARK-24630) SPIP: Support SQLStreaming in Spark

2018-07-25 Thread Genmao Yu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556837#comment-16556837 ] Genmao Yu edited comment on SPARK-24630 at 7/26/18 3:24 AM:

[jira] [Commented] (SPARK-24630) SPIP: Support SQLStreaming in Spark

2018-07-25 Thread Genmao Yu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556837#comment-16556837 ] Genmao Yu commented on SPARK-24630: --- [~zsxwing] Is there plan to better support SQL on

[jira] [Commented] (SPARK-24921) SparkStreaming steadily increasing job generation delay due to apparent URLClassLoader contention

2018-07-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556826#comment-16556826 ] Hyukjin Kwon commented on SPARK-24921: -- [~tommyshiou], is this rather a question? I

[jira] [Commented] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556824#comment-16556824 ] Hyukjin Kwon commented on SPARK-24914: -- cc [~ZenWzh] > totalSize is not a good est

[jira] [Commented] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556820#comment-16556820 ] yucai commented on SPARK-24925: --- [~cloud_fan], [~xiaoli], [~kiszk], any comments? > inp

[jira] [Assigned] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24925: Assignee: Apache Spark > input bytesRead metrics fluctuate from time to time > --

[jira] [Commented] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556819#comment-16556819 ] Apache Spark commented on SPARK-24925: -- User 'yucai' has created a pull request for

[jira] [Assigned] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24925: Assignee: (was: Apache Spark) > input bytesRead metrics fluctuate from time to time >

[jira] [Comment Edited] (SPARK-24288) Enable preventing predicate pushdown

2018-07-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556801#comment-16556801 ] Hyukjin Kwon edited comment on SPARK-24288 at 7/26/18 2:56 AM: ---

[jira] [Commented] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556818#comment-16556818 ] yucai commented on SPARK-24925: --- I think there could be two issues. In FileScanRDD 1. Col

[jira] [Updated] (SPARK-24905) Spark 2.3 Internal URL env variable

2018-07-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-24905: - Priority: Major (was: Critical) > Spark 2.3 Internal URL env variable > ---

[jira] [Commented] (SPARK-24905) Spark 2.3 Internal URL env variable

2018-07-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556817#comment-16556817 ] Hyukjin Kwon commented on SPARK-24905: -- (please avoid to set Critical+ which is usu

[jira] [Commented] (SPARK-24897) DAGScheduler should not unregisterMapOutput and increaseEpoch repeatedly for stage fetchFailed

2018-07-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556806#comment-16556806 ] Hyukjin Kwon commented on SPARK-24897: -- I couldn't follow it too. > DAGScheduler s

[jira] [Commented] (SPARK-24288) Enable preventing predicate pushdown

2018-07-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556801#comment-16556801 ] Hyukjin Kwon commented on SPARK-24288: -- [~smilegator] should we resolve this {{Won'

[jira] [Updated] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24925: -- Attachment: bytesRead.gif > input bytesRead metrics fluctuate from time to time >

[jira] [Updated] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24925: -- Description: input bytesRead metrics fluctuate from time to time, it is worse when pushdown enabled. Query {

[jira] [Created] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
yucai created SPARK-24925: - Summary: input bytesRead metrics fluctuate from time to time Key: SPARK-24925 URL: https://issues.apache.org/jira/browse/SPARK-24925 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-24832) Improve inputMetrics's bytesRead update for ColumnarBatch

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24832: -- Summary: Improve inputMetrics's bytesRead update for ColumnarBatch (was: When pushdown enabled, input bytesRe

[jira] [Updated] (SPARK-24832) When pushdown enabled, input bytesRead metrics is easy to fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24832: -- Summary: When pushdown enabled, input bytesRead metrics is easy to fluctuate from time to time (was: Improve

[jira] [Commented] (SPARK-24867) Add AnalysisBarrier to DataFrameWriter

2018-07-25 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556760#comment-16556760 ] Saisai Shao commented on SPARK-24867: - I see, thanks! Please let me know when the JI

[jira] [Commented] (SPARK-12911) Cacheing a dataframe causes array comparisons to fail (in filter / where) after 1.6

2018-07-25 Thread David Vogelbacher (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556470#comment-16556470 ] David Vogelbacher commented on SPARK-12911: --- Hey [~hyukjin.kwon] [~sdicocco][~

[jira] [Resolved] (SPARK-24916) Fix type coercion for IN expression with subquery

2018-07-25 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-24916. - Resolution: Duplicate > Fix type coercion for IN expression with subquery >

[jira] [Commented] (SPARK-24867) Add AnalysisBarrier to DataFrameWriter

2018-07-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556447#comment-16556447 ] Xiao Li commented on SPARK-24867: - [~jerryshao] This ticket was just resolved. [~lian ch

[jira] [Resolved] (SPARK-24867) Add AnalysisBarrier to DataFrameWriter

2018-07-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24867. - Resolution: Fixed Fix Version/s: 2.3.2 > Add AnalysisBarrier to DataFrameWriter > --

[jira] [Assigned] (SPARK-24924) Add mapping for built-in Avro data source

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24924: Assignee: Apache Spark > Add mapping for built-in Avro data source >

[jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556427#comment-16556427 ] Apache Spark commented on SPARK-24924: -- User 'dongjoon-hyun' has created a pull req

[jira] [Assigned] (SPARK-24924) Add mapping for built-in Avro data source

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24924: Assignee: (was: Apache Spark) > Add mapping for built-in Avro data source > -

[jira] [Created] (SPARK-24924) Add mapping for built-in Avro data source

2018-07-25 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-24924: - Summary: Add mapping for built-in Avro data source Key: SPARK-24924 URL: https://issues.apache.org/jira/browse/SPARK-24924 Project: Spark Issue Type: Sub-t

[jira] [Commented] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-25 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556413#comment-16556413 ] Jason Guo commented on SPARK-24906: --- [~maropu] [~viirya] What do you think about thi

[jira] [Commented] (SPARK-24923) DataSourceV2: Add CTAS and RTAS logical operations

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556383#comment-16556383 ] Apache Spark commented on SPARK-24923: -- User 'rdblue' has created a pull request fo

[jira] [Assigned] (SPARK-24923) DataSourceV2: Add CTAS and RTAS logical operations

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24923: Assignee: (was: Apache Spark) > DataSourceV2: Add CTAS and RTAS logical operations >

[jira] [Assigned] (SPARK-24923) DataSourceV2: Add CTAS and RTAS logical operations

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24923: Assignee: Apache Spark > DataSourceV2: Add CTAS and RTAS logical operations > ---

[jira] [Updated] (SPARK-24921) SparkStreaming steadily increasing job generation delay due to apparent URLClassLoader contention

2018-07-25 Thread Tommy S (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommy S updated SPARK-24921: Component/s: Web UI > SparkStreaming steadily increasing job generation delay due to apparent > URLClassL

[jira] [Created] (SPARK-24923) DataSourceV2: Add CTAS and RTAS logical operations

2018-07-25 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-24923: - Summary: DataSourceV2: Add CTAS and RTAS logical operations Key: SPARK-24923 URL: https://issues.apache.org/jira/browse/SPARK-24923 Project: Spark Issue Type: Sub-

[jira] [Commented] (SPARK-24802) Optimization Rule Exclusion

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556310#comment-16556310 ] Apache Spark commented on SPARK-24802: -- User 'maryannxue' has created a pull reques

[jira] [Commented] (SPARK-1137) ZK Persistence Engine crashes if stored data has wrong serialVersionUID

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556251#comment-16556251 ] Apache Spark commented on SPARK-1137: - User 'aarondav' has created a pull request for

[jira] [Commented] (SPARK-24874) Allow hybrid of both barrier tasks and regular tasks in a stage

2018-07-25 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556241#comment-16556241 ] Reynold Xin commented on SPARK-24874: - Do we really need this? Seems like an uncommo

[jira] [Resolved] (SPARK-24860) Expose dynamic partition overwrite per write operation

2018-07-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24860. - Resolution: Fixed Assignee: Koert Kuipers Fix Version/s: 2.4.0 > Expose dynamic partitio

[jira] [Resolved] (SPARK-23146) Support client mode for Kubernetes cluster backend

2018-07-25 Thread Matt Cheah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Cheah resolved SPARK-23146. Resolution: Fixed Fix Version/s: 2.4.0 > Support client mode for Kubernetes cluster backend

[jira] [Commented] (SPARK-24915) Calling SparkSession.createDataFrame with schema can throw exception

2018-07-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556126#comment-16556126 ] Bryan Cutler commented on SPARK-24915: -- Hi [~stspencer], I've been trying fix simil

[jira] [Commented] (SPARK-24288) Enable preventing predicate pushdown

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556119#comment-16556119 ] Apache Spark commented on SPARK-24288: -- User 'maryannxue' has created a pull reques

[jira] [Commented] (SPARK-23146) Support client mode for Kubernetes cluster backend

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556098#comment-16556098 ] Apache Spark commented on SPARK-23146: -- User 'mccheah' has created a pull request f

[jira] [Resolved] (SPARK-24849) Convert StructType to DDL string

2018-07-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24849. - Resolution: Fixed Assignee: Maxim Gekk Fix Version/s: 2.4.0 > Convert StructType to DDL

[jira] [Resolved] (SPARK-24911) SHOW CREATE TABLE drops escaping of nested column names

2018-07-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24911. - Resolution: Fixed Fix Version/s: 2.4.0 > SHOW CREATE TABLE drops escaping of nested column names

[jira] [Assigned] (SPARK-24911) SHOW CREATE TABLE drops escaping of nested column names

2018-07-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-24911: --- Assignee: Maxim Gekk > SHOW CREATE TABLE drops escaping of nested column names > --

[jira] [Updated] (SPARK-24922) Iterative rdd union + reduceByKey operations on small dataset leads to "No space left on device" error on account of lot of shuffle spill.

2018-07-25 Thread Dinesh Dharme (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Dharme updated SPARK-24922: -- Description: I am trying to do few (union + reduceByKey) operations on a hiearchical dataset

[jira] [Created] (SPARK-24922) Iterative rdd union + reduceByKey operations on small dataset leads to "No space left on device" error on account of lot of shuffle spill.

2018-07-25 Thread Dinesh Dharme (JIRA)
Dinesh Dharme created SPARK-24922: - Summary: Iterative rdd union + reduceByKey operations on small dataset leads to "No space left on device" error on account of lot of shuffle spill. Key: SPARK-24922 URL: https:

[jira] [Created] (SPARK-24921) SparkStreaming steadily increasing job generation delay due to apparent URLClassLoader contention

2018-07-25 Thread Tommy S (JIRA)
Tommy S created SPARK-24921: --- Summary: SparkStreaming steadily increasing job generation delay due to apparent URLClassLoader contention Key: SPARK-24921 URL: https://issues.apache.org/jira/browse/SPARK-24921

[jira] [Commented] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-25 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555998#comment-16555998 ] Bruce Robbins commented on SPARK-24914: --- [~irashid] {quote} given HIVE-20079, can

[jira] [Updated] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-25 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24914: -- Description: When determining whether to do a broadcast join, Spark estimates the size of the

[jira] [Created] (SPARK-24920) Spark should share netty's memory pools across all uses

2018-07-25 Thread Imran Rashid (JIRA)
Imran Rashid created SPARK-24920: Summary: Spark should share netty's memory pools across all uses Key: SPARK-24920 URL: https://issues.apache.org/jira/browse/SPARK-24920 Project: Spark Issue

[jira] [Created] (SPARK-24919) Scala linter rule for sparkContext.hadoopConfiguration

2018-07-25 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-24919: -- Summary: Scala linter rule for sparkContext.hadoopConfiguration Key: SPARK-24919 URL: https://issues.apache.org/jira/browse/SPARK-24919 Project: Spark Is

[jira] [Assigned] (SPARK-24919) Scala linter rule for sparkContext.hadoopConfiguration

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24919: Assignee: (was: Apache Spark) > Scala linter rule for sparkContext.hadoopConfiguratio

[jira] [Commented] (SPARK-24918) Executor Plugin API

2018-07-25 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555939#comment-16555939 ] Imran Rashid commented on SPARK-24918: -- [~jerryshao] [~tgraves] you might be intere

[jira] [Created] (SPARK-24918) Executor Plugin API

2018-07-25 Thread Imran Rashid (JIRA)
Imran Rashid created SPARK-24918: Summary: Executor Plugin API Key: SPARK-24918 URL: https://issues.apache.org/jira/browse/SPARK-24918 Project: Spark Issue Type: New Feature Compone

[jira] [Commented] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-25 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555877#comment-16555877 ] Imran Rashid commented on SPARK-24914: -- given HIVE-20079, can we also have a conf t

[jira] [Updated] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-07-25 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-24920: - Summary: Spark should allow sharing netty's memory pools across all uses (was: Spark should sha

[jira] [Assigned] (SPARK-24919) Scala linter rule for sparkContext.hadoopConfiguration

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24919: Assignee: Apache Spark > Scala linter rule for sparkContext.hadoopConfiguration > ---

[jira] [Commented] (SPARK-24919) Scala linter rule for sparkContext.hadoopConfiguration

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555972#comment-16555972 ] Apache Spark commented on SPARK-24919: -- User 'gengliangwang' has created a pull req

[jira] [Comment Edited] (SPARK-24904) Join with broadcasted dataframe causes shuffle of redundant data

2018-07-25 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555652#comment-16555652 ] Marco Gaido edited comment on SPARK-24904 at 7/25/18 1:28 PM:

[jira] [Commented] (SPARK-24904) Join with broadcasted dataframe causes shuffle of redundant data

2018-07-25 Thread Shay Elbaz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1678#comment-1678 ] Shay Elbaz commented on SPARK-24904: [~mgaido] Technically you *can* that, you just

[jira] [Updated] (SPARK-24904) Join with broadcasted dataframe causes shuffle of redundant data

2018-07-25 Thread Shay Elbaz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Elbaz updated SPARK-24904: --- Issue Type: Improvement (was: Question) > Join with broadcasted dataframe causes shuffle of redunda

[jira] [Updated] (SPARK-19018) spark csv writer charset support

2018-07-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-19018: Issue Type: Improvement (was: Bug) > spark csv writer charset support >

[jira] [Commented] (SPARK-24904) Join with broadcasted dataframe causes shuffle of redundant data

2018-07-25 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555842#comment-16555842 ] Marco Gaido commented on SPARK-24904: - [~shay_elbaz] In the case I mentioned before

[jira] [Updated] (SPARK-24917) Sending a partition over netty results in 2x memory usage

2018-07-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-24917: Description: Hello while investigating some OOM errors in Spark 2.2 [(here's my call stack)|https://imag

[jira] [Updated] (SPARK-24917) Sending a partition over netty results in 2x memory usage

2018-07-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-24917: Description: Hello while investigating some OOM errors in Spark 2.2 [(here's my call stack)|https://imag

[jira] [Created] (SPARK-24917) Sending a partition over netty results in 2x memory usage

2018-07-25 Thread Vincent (JIRA)
Vincent created SPARK-24917: --- Summary: Sending a partition over netty results in 2x memory usage Key: SPARK-24917 URL: https://issues.apache.org/jira/browse/SPARK-24917 Project: Spark Issue Type: I

[jira] [Commented] (SPARK-24904) Join with broadcasted dataframe causes shuffle of redundant data

2018-07-25 Thread Shay Elbaz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555769#comment-16555769 ] Shay Elbaz commented on SPARK-24904: [~mgaido] indeed this assumption is not always

[jira] [Commented] (SPARK-24904) Join with broadcasted dataframe causes shuffle of redundant data

2018-07-25 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555652#comment-16555652 ] Marco Gaido commented on SPARK-24904: - I see now what you mean, but yes, It think th

[jira] [Updated] (SPARK-24904) Join with broadcasted dataframe causes shuffle of redundant data

2018-07-25 Thread Shay Elbaz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Elbaz updated SPARK-24904: --- Description: When joining a "large" dataframe with broadcasted small one, and join-type is on the s

[jira] [Commented] (SPARK-24916) Fix type coercion for IN expression with subquery

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555435#comment-16555435 ] Apache Spark commented on SPARK-24916: -- User 'wangyum' has created a pull request f

[jira] [Assigned] (SPARK-24916) Fix type coercion for IN expression with subquery

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24916: Assignee: (was: Apache Spark) > Fix type coercion for IN expression with subquery > -

[jira] [Assigned] (SPARK-24916) Fix type coercion for IN expression with subquery

2018-07-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24916: Assignee: Apache Spark > Fix type coercion for IN expression with subquery >

[jira] [Created] (SPARK-24916) Fix type coercion for IN expression with subquery

2018-07-25 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-24916: --- Summary: Fix type coercion for IN expression with subquery Key: SPARK-24916 URL: https://issues.apache.org/jira/browse/SPARK-24916 Project: Spark Issue Type: B

[jira] [Commented] (SPARK-21063) Spark return an empty result from remote hadoop cluster

2018-07-25 Thread nick (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555403#comment-16555403 ] nick commented on SPARK-21063: -- [~paulstaab] It does work when both registering the dialec

[jira] [Commented] (SPARK-24904) Join with broadcasted dataframe causes shuffle of redundant data

2018-07-25 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555477#comment-16555477 ] Marco Gaido commented on SPARK-24904: - You cannot do a broadcast join when it is on

[jira] [Assigned] (SPARK-19018) spark csv writer charset support

2018-07-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-19018: Assignee: Carlos Peña > spark csv writer charset support > --

[jira] [Created] (SPARK-24915) Calling SparkSession.createDataFrame with schema can throw exception

2018-07-25 Thread Stephen Spencer (JIRA)
Stephen Spencer created SPARK-24915: --- Summary: Calling SparkSession.createDataFrame with schema can throw exception Key: SPARK-24915 URL: https://issues.apache.org/jira/browse/SPARK-24915 Project: S