[jira] [Commented] (SPARK-21610) Corrupt records are not handled properly when creating a dataframe from a file

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162575#comment-16162575 ] Apache Spark commented on SPARK-21610: -- User 'jmchung' has created a pull request fo

[jira] [Commented] (SPARK-21926) Some transformers in spark.ml.feature fail when trying to transform streaming dataframes

2017-09-11 Thread Matthew Slipper (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162568#comment-16162568 ] Matthew Slipper commented on SPARK-21926: - I'm happy to take a stab at number 1 (

[jira] [Created] (SPARK-21978) schemaInference option not to convert strings with leading zeros to int/long

2017-09-11 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created SPARK-21978: - Summary: schemaInference option not to convert strings with leading zeros to int/long Key: SPARK-21978 URL: https://issues.apache.org/jira/browse/SPARK-21978

[jira] [Commented] (SPARK-14927) DataFrame.saveAsTable creates RDD partitions but not Hive partitions

2017-09-11 Thread Rajesh Chandramohan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162522#comment-16162522 ] Rajesh Chandramohan commented on SPARK-14927: - [~ctang.ma] , I was talking he

[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-09-11 Thread Drew Robb (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162519#comment-16162519 ] Drew Robb commented on SPARK-21133: --- My mistake, you are absolutely correct. I had some

[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162508#comment-16162508 ] Wenchen Fan commented on SPARK-21133: - This patch was merged in June, and Spark 2.2.0

[jira] [Commented] (SPARK-18608) Spark ML algorithms that check RDD cache level for internal caching double-cache data

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162496#comment-16162496 ] Apache Spark commented on SPARK-18608: -- User 'zhengruifeng' has created a pull reque

[jira] [Assigned] (SPARK-21977) SinglePartition optimizations break certain Streaming Stateful Aggregation requirements

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21977: Assignee: Burak Yavuz (was: Apache Spark) > SinglePartition optimizations break certain S

[jira] [Assigned] (SPARK-21977) SinglePartition optimizations break certain Streaming Stateful Aggregation requirements

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21977: Assignee: Apache Spark (was: Burak Yavuz) > SinglePartition optimizations break certain S

[jira] [Commented] (SPARK-21977) SinglePartition optimizations break certain Streaming Stateful Aggregation requirements

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162482#comment-16162482 ] Apache Spark commented on SPARK-21977: -- User 'brkyvz' has created a pull request for

[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-09-11 Thread Drew Robb (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162469#comment-16162469 ] Drew Robb commented on SPARK-21133: --- Thanks for the fix on this, but I don't think the

[jira] [Commented] (SPARK-17602) PySpark - Performance Optimization Large Size of Broadcast Variable

2017-09-11 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162457#comment-16162457 ] holdenk commented on SPARK-17602: - [~liujunf] how about you go ahead and make a pull requ

[jira] [Updated] (SPARK-21977) SinglePartition optimizations break certain Streaming Stateful Aggregation requirements

2017-09-11 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz updated SPARK-21977: Description: This is a bit hard to explain as there are several issues here, I'll try my best. Her

[jira] [Updated] (SPARK-21977) SinglePartition optimizations break certain Streaming Stateful Aggregation requirements

2017-09-11 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz updated SPARK-21977: Summary: SinglePartition optimizations break certain Streaming Stateful Aggregation requirements (

[jira] [Created] (SPARK-21977) SinglePartition optimizations break certain StateStore requirements

2017-09-11 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-21977: --- Summary: SinglePartition optimizations break certain StateStore requirements Key: SPARK-21977 URL: https://issues.apache.org/jira/browse/SPARK-21977 Project: Spark

[jira] [Assigned] (SPARK-21977) SinglePartition optimizations break certain StateStore requirements

2017-09-11 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz reassigned SPARK-21977: --- Assignee: Burak Yavuz > SinglePartition optimizations break certain StateStore requirements

[jira] [Updated] (SPARK-21977) SinglePartition optimizations break certain StateStore requirements

2017-09-11 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz updated SPARK-21977: Description: This is a bit hard to explain as there are several issues here > SinglePartition optim

[jira] [Commented] (SPARK-10365) Support Parquet logical type TIMESTAMP_MICROS

2017-09-11 Thread Ryan Munro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162295#comment-16162295 ] Ryan Munro commented on SPARK-10365: It looks like {{TIMESTAMP_MICROS}} is supported

[jira] [Updated] (SPARK-19357) Parallel Model Evaluation for ML Tuning: Scala

2017-09-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-19357: - Attachment: parallelism-verification-test.pdf Adding a document to show verification testing for

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162205#comment-16162205 ] Bryan Cutler commented on SPARK-21190: -- Thanks [~icexelloss]. I definitely think co

[jira] [Commented] (SPARK-19422) Cache input data in algorithms

2017-09-11 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162008#comment-16162008 ] Joseph K. Bradley commented on SPARK-19422: --- Linking [SPARK-21972], which may i

[jira] [Commented] (SPARK-21972) Allow users to control input data persistence in ML Estimators via a handlePersistence ml.Param

2017-09-11 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162007#comment-16162007 ] Joseph K. Bradley commented on SPARK-21972: --- The issue (a) does not really conf

[jira] [Commented] (SPARK-18608) Spark ML algorithms that check RDD cache level for internal caching double-cache data

2017-09-11 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162004#comment-16162004 ] Joseph K. Bradley commented on SPARK-18608: --- Hi all, it looks like there has be

[jira] [Closed] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-09-11 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley closed SPARK-21799. - Resolution: Duplicate > KMeans performance regression (5-6x slowdown) in Spark 2.2 >

[jira] [Commented] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-09-11 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161992#comment-16161992 ] Joseph K. Bradley commented on SPARK-21799: --- Now that I've caught up on these,

[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-09-11 Thread Anthony Dotterer (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161639#comment-16161639 ] Anthony Dotterer commented on SPARK-20958: -- For those not well versed in sbt sha

[jira] [Commented] (SPARK-21972) Allow users to control input data persistence in ML Estimators via a handlePersistence ml.Param

2017-09-11 Thread Siddharth Murching (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161634#comment-16161634 ] Siddharth Murching commented on SPARK-21972: Link to old PR containing work o

[jira] [Issue Comment Deleted] (SPARK-21972) Allow users to control input data persistence in ML Estimators via a handlePersistence ml.Param

2017-09-11 Thread Siddharth Murching (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Murching updated SPARK-21972: --- Comment: was deleted (was: This issue was originally being worked on in this PR: [ht

[jira] [Comment Edited] (SPARK-21972) Allow users to control input data persistence in ML Estimators via a handlePersistence ml.Param

2017-09-11 Thread Siddharth Murching (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160624#comment-16160624 ] Siddharth Murching edited comment on SPARK-21972 at 9/11/17 5:22 PM: --

[jira] [Commented] (SPARK-20589) Allow limiting task concurrency per stage

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161561#comment-16161561 ] Apache Spark commented on SPARK-20589: -- User 'dhruve' has created a pull request for

[jira] [Commented] (SPARK-21896) Stack Overflow when window function nested inside aggregate function

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161549#comment-16161549 ] Apache Spark commented on SPARK-21896: -- User 'aokolnychyi' has created a pull reques

[jira] [Assigned] (SPARK-21896) Stack Overflow when window function nested inside aggregate function

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21896: Assignee: (was: Apache Spark) > Stack Overflow when window function nested inside aggr

[jira] [Assigned] (SPARK-21896) Stack Overflow when window function nested inside aggregate function

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21896: Assignee: Apache Spark > Stack Overflow when window function nested inside aggregate funct

[jira] [Assigned] (SPARK-21958) Attempting to save large Word2Vec model hangs driver in constant GC.

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21958: Assignee: Apache Spark > Attempting to save large Word2Vec model hangs driver in constant

[jira] [Commented] (SPARK-21958) Attempting to save large Word2Vec model hangs driver in constant GC.

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161471#comment-16161471 ] Apache Spark commented on SPARK-21958: -- User 'travishegner' has created a pull reque

[jira] [Assigned] (SPARK-21958) Attempting to save large Word2Vec model hangs driver in constant GC.

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21958: Assignee: (was: Apache Spark) > Attempting to save large Word2Vec model hangs driver i

[jira] [Comment Edited] (SPARK-21958) Attempting to save large Word2Vec model hangs driver in constant GC.

2017-09-11 Thread Travis Hegner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161451#comment-16161451 ] Travis Hegner edited comment on SPARK-21958 at 9/11/17 3:38 PM: ---

[jira] [Commented] (SPARK-21958) Attempting to save large Word2Vec model hangs driver in constant GC.

2017-09-11 Thread Travis Hegner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161451#comment-16161451 ] Travis Hegner commented on SPARK-21958: --- Running the patch applied to tag {{v2.2.0}

[jira] [Commented] (SPARK-21765) Ensure all leaf nodes that are derived from streaming sources have isStreaming=true

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161394#comment-16161394 ] Apache Spark commented on SPARK-21765: -- User 'joseph-torres' has created a pull requ

[jira] [Commented] (SPARK-21976) Fix wrong doc about Mean Absolute Error

2017-09-11 Thread Favio Vázquez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161277#comment-16161277 ] Favio Vázquez commented on SPARK-21976: --- I meant in the doc for the webpage, not th

[jira] [Commented] (SPARK-21976) Fix wrong doc about Mean Absolute Error

2017-09-11 Thread Favio Vázquez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161276#comment-16161276 ] Favio Vázquez commented on SPARK-21976: --- I stated that in the documentation the div

[jira] [Assigned] (SPARK-21976) Fix wrong doc about Mean Absolute Error

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21976: Assignee: (was: Apache Spark) > Fix wrong doc about Mean Absolute Error >

[jira] [Commented] (SPARK-21976) Fix wrong doc about Mean Absolute Error

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161271#comment-16161271 ] Apache Spark commented on SPARK-21976: -- User 'FavioVazquez' has created a pull reque

[jira] [Updated] (SPARK-21976) Fix wrong doc about Mean Absolute Error

2017-09-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-21976: -- Priority: Trivial (was: Minor) You haven't said what you think is wrong. The docs look correct: {code

[jira] [Assigned] (SPARK-21976) Fix wrong doc about Mean Absolute Error

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21976: Assignee: Apache Spark > Fix wrong doc about Mean Absolute Error > ---

[jira] [Created] (SPARK-21976) Fix wrong doc about Mean Absolute Error

2017-09-11 Thread Favio Vázquez (JIRA)
Favio Vázquez created SPARK-21976: - Summary: Fix wrong doc about Mean Absolute Error Key: SPARK-21976 URL: https://issues.apache.org/jira/browse/SPARK-21976 Project: Spark Issue Type: Documen

[jira] [Updated] (SPARK-21976) Fix wrong doc about Mean Absolute Error

2017-09-11 Thread Favio Vázquez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Favio Vázquez updated SPARK-21976: -- Description: Fix wrong doc for MAE in webpage. Even though the code is correct for the MAE:

[jira] [Commented] (SPARK-21974) SVD computation results in failure to load NativeSystemARPACK and NativeRefARPACK

2017-09-11 Thread Aleksandr Ovcharenko (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161217#comment-16161217 ] Aleksandr Ovcharenko commented on SPARK-21974: -- Ok, thanks for your reply. I

[jira] [Updated] (SPARK-17642) Support DESC FORMATTED TABLE COLUMN command to show column-level statistics

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17642: - Description: Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command to show column-level sta

[jira] [Updated] (SPARK-17642) Support DESC FORMATTED TABLE COLUMN command to show column-level statistics

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17642: - Description: Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command to show column informati

[jira] [Updated] (SPARK-16026) Cost-based Optimizer Framework

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-16026: - Summary: Cost-based Optimizer Framework (was: Cost-based Optimizer framework) > Cost-based Opti

[jira] [Updated] (SPARK-17074) generate equi-height histogram for column

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17074: - Description: Equi-height histogram is effective in handling skewed data distribution. For equi-h

[jira] [Updated] (SPARK-17074) generate equi-height histogram for column

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17074: - Description: Equi-height histogram is effective in handling skewed data distribution. For equi-h

[jira] [Updated] (SPARK-17074) generate equi-height histogram for column

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17074: - Description: Equi-height histogram is effective in handling skewed data distribution. For equi-h

[jira] [Updated] (SPARK-17074) generate equi-height histogram for column

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17074: - Description: Equi-height histogram is effective in handling skewed data distribution. For equi-h

[jira] [Updated] (SPARK-17074) generate equi-height histogram for column

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17074: - Summary: generate equi-height histogram for column (was: generate histogram information for colu

[jira] [Commented] (SPARK-18149) build side decision based on cbo

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161139#comment-16161139 ] Zhenhua Wang commented on SPARK-18149: -- This is inherently done in SparkStrategies.

[jira] [Closed] (SPARK-18149) build side decision based on cbo

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang closed SPARK-18149. Resolution: Implemented > build side decision based on cbo > > >

[jira] [Closed] (SPARK-17079) broadcast decision based on cbo

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang closed SPARK-17079. Resolution: Implemented > broadcast decision based on cbo > --- > >

[jira] [Commented] (SPARK-17079) broadcast decision based on cbo

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161136#comment-16161136 ] Zhenhua Wang commented on SPARK-17079: -- This is inherently done in SparkStrategies.

[jira] [Updated] (SPARK-21322) support histogram in filter cardinality estimation

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21322: - Affects Version/s: (was: 2.1.0) 2.3.0 > support histogram in filter ca

[jira] [Updated] (SPARK-21322) support histogram in filter cardinality estimation

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21322: - Issue Type: Sub-task (was: Improvement) Parent: SPARK-21975 > support histogram in filte

[jira] [Updated] (SPARK-21322) support histogram in filter cardinality estimation

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21322: - Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-16026) > support histogr

[jira] [Updated] (SPARK-17074) generate histogram information for column

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17074: - Issue Type: Sub-task (was: Improvement) Parent: SPARK-21975 > generate histogram informa

[jira] [Created] (SPARK-21975) Histogram support in cost-based optimizer

2017-09-11 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-21975: Summary: Histogram support in cost-based optimizer Key: SPARK-21975 URL: https://issues.apache.org/jira/browse/SPARK-21975 Project: Spark Issue Type: Improve

[jira] [Updated] (SPARK-17074) generate histogram information for column

2017-09-11 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17074: - Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-16026) > generate histog

[jira] [Resolved] (SPARK-21974) SVD computation results in failure to load NativeSystemARPACK and NativeRefARPACK

2017-09-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21974. --- Resolution: Not A Problem This isn't an error and isn't a Spark issue. It has to do with netlib-java

[jira] [Created] (SPARK-21974) SVD computation results in failure to load NativeSystemARPACK and NativeRefARPACK

2017-09-11 Thread Aleksandr Ovcharenko (JIRA)
Aleksandr Ovcharenko created SPARK-21974: Summary: SVD computation results in failure to load NativeSystemARPACK and NativeRefARPACK Key: SPARK-21974 URL: https://issues.apache.org/jira/browse/SPARK-21974

[jira] [Comment Edited] (SPARK-21943) When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view some of the jobs that are running jobs, the returned json information is missing the “de

2017-09-11 Thread xianquan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160968#comment-16160968 ] xianquan edited comment on SPARK-21943 at 9/11/17 10:19 AM: !

[jira] [Commented] (SPARK-21943) When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view some of the jobs that are running jobs, the returned json information is missing the “descrip

2017-09-11 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161008#comment-16161008 ] Saisai Shao commented on SPARK-21943: - If you're trying to report bugs, I think you s

[jira] [Commented] (SPARK-21943) When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view some of the jobs that are running jobs, the returned json information is missing the “descrip

2017-09-11 Thread xianquan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160992#comment-16160992 ] xianquan commented on SPARK-21943: -- I need to get the description information while runn

[jira] [Commented] (SPARK-21943) When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view some of the jobs that are running jobs, the returned json information is missing the “descrip

2017-09-11 Thread xianquan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160968#comment-16160968 ] xianquan commented on SPARK-21943: -- !webUI.png! Thank you for your reply. In the spark t

[jira] [Updated] (SPARK-21963) create temp file should be delete after use

2017-09-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-21963: -- Priority: Trivial (was: Major) Issue Type: Improvement (was: Bug) Not a bug, for sure, please f

[jira] [Updated] (SPARK-21943) When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view some of the jobs that are running jobs, the returned json information is missing the “descripti

2017-09-11 Thread xianquan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xianquan updated SPARK-21943: - Attachment: webUI.png > When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view > some

[jira] [Assigned] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21786: Assignee: (was: Apache Spark) > The 'spark.sql.parquet.compression.codec' configuratio

[jira] [Commented] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160937#comment-16160937 ] Apache Spark commented on SPARK-21786: -- User 'fjh100456' has created a pull request

[jira] [Assigned] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21786: Assignee: Apache Spark > The 'spark.sql.parquet.compression.codec' configuration doesn't t

[jira] [Commented] (SPARK-21940) Support timezone for timestamps in SparkR

2017-09-11 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160936#comment-16160936 ] Yanbo Liang commented on SPARK-21940: - [~falaki] AFAIK, Spark SQL timestamps are norm

[jira] [Closed] (SPARK-21966) ResolveMissingReference rule should not ignore the Union operator

2017-09-11 Thread Feng Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Zhu closed SPARK-21966. Resolution: Later > ResolveMissingReference rule should not ignore the Union operator > ---

[jira] [Commented] (SPARK-21966) ResolveMissingReference rule should not ignore the Union operator

2017-09-11 Thread Feng Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160877#comment-16160877 ] Feng Zhu commented on SPARK-21966: -- The rule ResolveMissingReference does not plan to su

[jira] [Updated] (SPARK-21923) Avoid calling reserveUnrollMemoryForThisTask for every record

2017-09-11 Thread Xianyang Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyang Liu updated SPARK-21923: - Summary: Avoid calling reserveUnrollMemoryForThisTask for every record (was: Avoid call reserveU

[jira] [Resolved] (SPARK-21943) When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view some of the jobs that are running jobs, the returned json information is missing the “descript

2017-09-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21943. --- Resolution: Not A Problem > When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view

[jira] [Commented] (SPARK-21958) Attempting to save large Word2Vec model hangs driver in constant GC.

2017-09-11 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160862#comment-16160862 ] Nick Pentreath commented on SPARK-21958: Seems like your proposal could improve t

[jira] [Assigned] (SPARK-21973) Add a new option to filter queries to run in TPCDSQueryBenchmark

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21973: Assignee: (was: Apache Spark) > Add a new option to filter queries to run in TPCDSQuer

[jira] [Assigned] (SPARK-21973) Add a new option to filter queries to run in TPCDSQueryBenchmark

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21973: Assignee: Apache Spark > Add a new option to filter queries to run in TPCDSQueryBenchmark

[jira] [Commented] (SPARK-21973) Add a new option to filter queries to run in TPCDSQueryBenchmark

2017-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160851#comment-16160851 ] Apache Spark commented on SPARK-21973: -- User 'maropu' has created a pull request for

[jira] [Created] (SPARK-21973) Add a new option to filter queries to run in TPCDSQueryBenchmark

2017-09-11 Thread Takeshi Yamamuro (JIRA)
Takeshi Yamamuro created SPARK-21973: Summary: Add a new option to filter queries to run in TPCDSQueryBenchmark Key: SPARK-21973 URL: https://issues.apache.org/jira/browse/SPARK-21973 Project: Spa