[jira] [Created] (SPARK-20456) Document major aggregation functions for pyspark

2017-04-24 Thread Michael Patterson (JIRA)
Michael Patterson created SPARK-20456: - Summary: Document major aggregation functions for pyspark Key: SPARK-20456 URL: https://issues.apache.org/jira/browse/SPARK-20456 Project: Spark Is

[jira] [Assigned] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20455: Assignee: (was: Apache Spark) > Missing Test Target in Documentation for "Running Dock

[jira] [Assigned] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20455: Assignee: Apache Spark > Missing Test Target in Documentation for "Running Docker-based In

[jira] [Commented] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982420#comment-15982420 ] Apache Spark commented on SPARK-20455: -- User 'original-brownbear' has created a pull

[jira] [Created] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-24 Thread Armin Braun (JIRA)
Armin Braun created SPARK-20455: --- Summary: Missing Test Target in Documentation for "Running Docker-based Integration Test Suites" Key: SPARK-20455 URL: https://issues.apache.org/jira/browse/SPARK-20455

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-24 Thread Guozhang Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982377#comment-15982377 ] Guozhang Wang commented on SPARK-18057: --- Just adding the related KIP for the recent

[jira] [Resolved] (SPARK-20451) Filter out nested mapType datatypes from sort order in randomSplit

2017-04-24 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20451. - Resolution: Fixed Assignee: Sameer Agarwal Fix Version/s: 2.3.0

[jira] [Resolved] (SPARK-20453) Bump master branch version to 2.3.0-SNAPSHOT

2017-04-24 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-20453. - Resolution: Fixed Fix Version/s: 2.3.0 > Bump master branch version to 2.3.0-SNAPSHOT > --

[jira] [Commented] (SPARK-20239) Improve HistoryServer ACL mechanism

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982299#comment-15982299 ] Apache Spark commented on SPARK-20239: -- User 'jerryshao' has created a pull request

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982251#comment-15982251 ] Peng Meng commented on SPARK-20446: --- Yes, I compared with ML ALSModel.recommendAll. The

[jira] [Resolved] (SPARK-20239) Improve HistoryServer ACL mechanism

2017-04-24 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-20239. Resolution: Fixed Assignee: Saisai Shao Fix Version/s: 2.2.0 > Improve Hist

[jira] [Commented] (SPARK-20336) spark.read.csv() with wholeFile=True option fails to read non ASCII unicode characters

2017-04-24 Thread HanCheol Cho (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982218#comment-15982218 ] HanCheol Cho commented on SPARK-20336: -- Hi, [~original-brownbear] Thank you for you

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-24 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982180#comment-15982180 ] Shixiong Zhu commented on SPARK-18057: -- [~ijuma] it's not a regression. In Kafka 0.1

[jira] [Comment Edited] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-24 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982180#comment-15982180 ] Shixiong Zhu edited comment on SPARK-18057 at 4/25/17 12:34 AM: ---

[jira] [Updated] (SPARK-20454) Improvement of ShortestPaths in Spark GraphX

2017-04-24 Thread Ji Dai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Dai updated SPARK-20454: --- Summary: Improvement of ShortestPaths in Spark GraphX (was: improvement of ShortestPaths in Spark GraphX) >

[jira] [Updated] (SPARK-20454) Improvement of ShortestPaths in Spark GraphX

2017-04-24 Thread Ji Dai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Dai updated SPARK-20454: --- Description: The output of ShortestPaths is not enough. ShortestPaths in Graph/lib is currently in a simple

[jira] [Commented] (SPARK-18901) Require in LR LogisticAggregator is redundant

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982160#comment-15982160 ] Apache Spark commented on SPARK-18901: -- User 'wangmiao1981' has created a pull reque

[jira] [Updated] (SPARK-20454) improvement of ShortestPaths in Spark GraphX

2017-04-24 Thread Ji Dai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Dai updated SPARK-20454: --- Summary: improvement of ShortestPaths in Spark GraphX (was: Concern about improvement of ShortestPaths in Sp

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-24 Thread Ismael Juma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982153#comment-15982153 ] Ismael Juma commented on SPARK-18057: - [~helena], about KAFKA-4879, are you suggestin

[jira] [Assigned] (SPARK-9103) Tracking spark's memory usage

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9103: --- Assignee: (was: Apache Spark) > Tracking spark's memory usage > -

[jira] [Assigned] (SPARK-9103) Tracking spark's memory usage

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9103: --- Assignee: Apache Spark > Tracking spark's memory usage > - > >

[jira] [Reopened] (SPARK-9103) Tracking spark's memory usage

2017-04-24 Thread Jose Soltren (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Soltren reopened SPARK-9103: - Work in progress. > Tracking spark's memory usage > - > >

[jira] [Updated] (SPARK-20454) Concern about improvement of ShortestPaths in Spark GraphX

2017-04-24 Thread Ji Dai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Dai updated SPARK-20454: --- Description: The output of ShortestPaths is not enough. ShortestPaths in Graph/lib is currently in a simple

[jira] [Created] (SPARK-20454) Concern about improvement of ShortestPaths in Spark GraphX

2017-04-24 Thread Ji Dai (JIRA)
Ji Dai created SPARK-20454: -- Summary: Concern about improvement of ShortestPaths in Spark GraphX Key: SPARK-20454 URL: https://issues.apache.org/jira/browse/SPARK-20454 Project: Spark Issue Type: Im

[jira] [Resolved] (SPARK-20450) Unexpected first-query schema inference cost with 2.1.1 RC

2017-04-24 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-20450. --- Resolution: Fixed Assignee: Eric Liang Fix Version/s: 2.1.1 > Unexpec

[jira] [Comment Edited] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-24 Thread Ismael Juma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982077#comment-15982077 ] Ismael Juma edited comment on SPARK-18057 at 4/24/17 11:07 PM:

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-24 Thread Ismael Juma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982077#comment-15982077 ] Ismael Juma commented on SPARK-18057: - Hi. A few clarifications below. "Based on pre

[jira] [Assigned] (SPARK-20453) Bump master branch version to 2.3.0-SNAPSHOT

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20453: Assignee: Apache Spark (was: Josh Rosen) > Bump master branch version to 2.3.0-SNAPSHOT >

[jira] [Assigned] (SPARK-20453) Bump master branch version to 2.3.0-SNAPSHOT

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20453: Assignee: Josh Rosen (was: Apache Spark) > Bump master branch version to 2.3.0-SNAPSHOT >

[jira] [Commented] (SPARK-20453) Bump master branch version to 2.3.0-SNAPSHOT

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982071#comment-15982071 ] Apache Spark commented on SPARK-20453: -- User 'JoshRosen' has created a pull request

[jira] [Created] (SPARK-20453) Bump master branch version to 2.3.0-SNAPSHOT

2017-04-24 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20453: -- Summary: Bump master branch version to 2.3.0-SNAPSHOT Key: SPARK-20453 URL: https://issues.apache.org/jira/browse/SPARK-20453 Project: Spark Issue Type: Bug

[jira] [Assigned] (SPARK-20452) Cancel a batch Kafka query and rerun the same DataFrame may cause ConcurrentModificationException

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20452: Assignee: Shixiong Zhu (was: Apache Spark) > Cancel a batch Kafka query and rerun the sam

[jira] [Commented] (SPARK-20452) Cancel a batch Kafka query and rerun the same DataFrame may cause ConcurrentModificationException

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982033#comment-15982033 ] Apache Spark commented on SPARK-20452: -- User 'zsxwing' has created a pull request fo

[jira] [Assigned] (SPARK-20452) Cancel a batch Kafka query and rerun the same DataFrame may cause ConcurrentModificationException

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20452: Assignee: Apache Spark (was: Shixiong Zhu) > Cancel a batch Kafka query and rerun the sam

[jira] [Created] (SPARK-20452) Cancel a batch Kafka query and rerun the same DataFrame may cause ConcurrentModificationException

2017-04-24 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-20452: Summary: Cancel a batch Kafka query and rerun the same DataFrame may cause ConcurrentModificationException Key: SPARK-20452 URL: https://issues.apache.org/jira/browse/SPARK-20452

[jira] [Commented] (SPARK-20435) More thorough redaction of sensitive information from logs/UI, more unit tests

2017-04-24 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981947#comment-15981947 ] Marcelo Vanzin commented on SPARK-20435: bq. The user copies over the entire conf

[jira] [Assigned] (SPARK-20451) Filter out nested mapType datatypes from sort order in randomSplit

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20451: Assignee: (was: Apache Spark) > Filter out nested mapType datatypes from sort order in

[jira] [Assigned] (SPARK-20451) Filter out nested mapType datatypes from sort order in randomSplit

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20451: Assignee: Apache Spark > Filter out nested mapType datatypes from sort order in randomSpli

[jira] [Commented] (SPARK-20451) Filter out nested mapType datatypes from sort order in randomSplit

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981926#comment-15981926 ] Apache Spark commented on SPARK-20451: -- User 'sameeragarwal' has created a pull requ

[jira] [Created] (SPARK-20451) Filter out nested mapType datatypes from sort order in randomSplit

2017-04-24 Thread Sameer Agarwal (JIRA)
Sameer Agarwal created SPARK-20451: -- Summary: Filter out nested mapType datatypes from sort order in randomSplit Key: SPARK-20451 URL: https://issues.apache.org/jira/browse/SPARK-20451 Project: Spark

[jira] [Assigned] (SPARK-4899) Support Mesos features: roles and checkpoints

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-4899: --- Assignee: Apache Spark > Support Mesos features: roles and checkpoints >

[jira] [Assigned] (SPARK-4899) Support Mesos features: roles and checkpoints

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-4899: --- Assignee: (was: Apache Spark) > Support Mesos features: roles and checkpoints > -

[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981839#comment-15981839 ] Apache Spark commented on SPARK-4899: - User 'gkc2104' has created a pull request for t

[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints

2017-04-24 Thread Kamal Gurala (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981837#comment-15981837 ] Kamal Gurala commented on SPARK-4899: - https://github.com/apache/spark/pull/17750 > S

[jira] [Commented] (SPARK-1359) SGD implementation is not efficient

2017-04-24 Thread yu peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981794#comment-15981794 ] yu peng commented on SPARK-1359: i think by randomly shuffle partitions and do gradient De

[jira] [Assigned] (SPARK-20449) Upgrade breeze version to 0.13.1

2017-04-24 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-20449: --- Assignee: Yanbo Liang > Upgrade breeze version to 0.13.1 > > >

[jira] [Commented] (SPARK-20450) Unexpected first-query schema inference cost with 2.1.1 RC

2017-04-24 Thread Eric Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981753#comment-15981753 ] Eric Liang commented on SPARK-20450: I'm not sure what you mean by new issue, but it'

[jira] [Comment Edited] (SPARK-20450) Unexpected first-query schema inference cost with 2.1.1 RC

2017-04-24 Thread Eric Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981753#comment-15981753 ] Eric Liang edited comment on SPARK-20450 at 4/24/17 7:40 PM: -

[jira] [Assigned] (SPARK-20450) Unexpected first-query schema inference cost with 2.1.1 RC

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20450: Assignee: (was: Apache Spark) > Unexpected first-query schema inference cost with 2.1.

[jira] [Assigned] (SPARK-20450) Unexpected first-query schema inference cost with 2.1.1 RC

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20450: Assignee: Apache Spark > Unexpected first-query schema inference cost with 2.1.1 RC >

[jira] [Commented] (SPARK-20450) Unexpected first-query schema inference cost with 2.1.1 RC

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981750#comment-15981750 ] Apache Spark commented on SPARK-20450: -- User 'ericl' has created a pull request for

[jira] [Updated] (SPARK-20450) Unexpected first-query schema inference cost with 2.1.1 RC

2017-04-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-20450: -- Priority: Major (was: Blocker) [~ekhliang]: don't set Blocker. Is this actually a new issue? or a ques

[jira] [Closed] (SPARK-20440) Allow SparkR session and context to have delayed binding

2017-04-24 Thread Vinayak Joshi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayak Joshi closed SPARK-20440. - Resolution: Workaround > Allow SparkR session and context to have delayed binding > -

[jira] [Commented] (SPARK-20440) Allow SparkR session and context to have delayed binding

2017-04-24 Thread Vinayak Joshi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981744#comment-15981744 ] Vinayak Joshi commented on SPARK-20440: --- The PR comments include a workaround for

[jira] [Created] (SPARK-20450) Unexpected first-query schema inference cost with 2.1.1 RC

2017-04-24 Thread Eric Liang (JIRA)
Eric Liang created SPARK-20450: -- Summary: Unexpected first-query schema inference cost with 2.1.1 RC Key: SPARK-20450 URL: https://issues.apache.org/jira/browse/SPARK-20450 Project: Spark Issue

[jira] [Commented] (SPARK-20435) More thorough redaction of sensitive information from logs/UI, more unit tests

2017-04-24 Thread Mark Grover (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981730#comment-15981730 ] Mark Grover commented on SPARK-20435: - bq. I'm not saying redacting from logs is usel

[jira] [Commented] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-24 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981728#comment-15981728 ] Kazuaki Ishizaki commented on SPARK-20392: -- Thank you. I confirmed that blockbus

[jira] [Commented] (SPARK-20312) query optimizer calls udf with null values when it doesn't expect them

2017-04-24 Thread Albert Meltzer (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981694#comment-15981694 ] Albert Meltzer commented on SPARK-20312: [~maropu]: making the query a bit simple

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981648#comment-15981648 ] Nick Pentreath commented on SPARK-20446: By "compare to DataFrame implementation"

[jira] [Commented] (SPARK-7481) Add spark-hadoop-cloud module to pull in object store support

2017-04-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981602#comment-15981602 ] Steve Loughran commented on SPARK-7481: --- One thing I want to emphasise here is: I ha

[jira] [Assigned] (SPARK-19812) YARN shuffle service fails to relocate recovery DB across NFS directories

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19812: Assignee: Thomas Graves (was: Apache Spark) > YARN shuffle service fails to relocate reco

[jira] [Assigned] (SPARK-19812) YARN shuffle service fails to relocate recovery DB across NFS directories

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19812: Assignee: Apache Spark (was: Thomas Graves) > YARN shuffle service fails to relocate reco

[jira] [Updated] (SPARK-19812) YARN shuffle service fails to relocate recovery DB across NFS directories

2017-04-24 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-19812: -- Summary: YARN shuffle service fails to relocate recovery DB across NFS directories (was: YARN

[jira] [Commented] (SPARK-19812) YARN shuffle service fails to relocate recovery DB across NFS directories

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981592#comment-15981592 ] Apache Spark commented on SPARK-19812: -- User 'tgravescs' has created a pull request

[jira] [Resolved] (SPARK-20208) Document R fpGrowth support in vignettes, programming guide and code example

2017-04-24 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-20208. -- Resolution: Fixed Fix Version/s: 2.2.0 > Document R fpGrowth support in vignettes, progr

[jira] [Commented] (SPARK-20115) Fix DAGScheduler to recompute all the lost shuffle blocks when external shuffle service is unavailable

2017-04-24 Thread Juan Rodríguez Hortalá (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981585#comment-15981585 ] Juan Rodríguez Hortalá commented on SPARK-20115: SPARK-20178 is a discuss

[jira] [Resolved] (SPARK-20438) R wrappers for split and repeat

2017-04-24 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-20438. -- Resolution: Fixed Assignee: Maciej Szymkiewicz Fix Version/s: 2.3.0

[jira] [Comment Edited] (SPARK-20107) Add spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version option to configuration.md

2017-04-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981552#comment-15981552 ] Steve Loughran edited comment on SPARK-20107 at 4/24/17 5:30 PM: --

[jira] [Commented] (SPARK-20107) Add spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version option to configuration.md

2017-04-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981552#comment-15981552 ] Steve Loughran commented on SPARK-20107: This does not solve the problem you thin

[jira] [Closed] (SPARK-20379) Allow setting SSL-related passwords through env variables

2017-04-24 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin closed SPARK-20379. -- Resolution: Not A Problem Just remembered that in 2.x (2.1 at least) users can reference env va

[jira] [Comment Edited] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981148#comment-15981148 ] Peng Meng edited comment on SPARK-20446 at 4/24/17 4:18 PM: T

[jira] [Commented] (SPARK-11373) Add metrics to the History Server and providers

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981410#comment-15981410 ] Apache Spark commented on SPARK-11373: -- User 'steveloughran' has created a pull requ

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20392: - Attachment: model_9756.zip blockbuster_fewCols.csv attaching blockbuster_fewCols.

[jira] [Resolved] (SPARK-18901) Require in LR LogisticAggregator is redundant

2017-04-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang resolved SPARK-18901. - Resolution: Fixed Assignee: Miao Wang Fix Version/s: 2.2.0 > Require in LR Logist

[jira] [Commented] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981386#comment-15981386 ] Barry Becker commented on SPARK-20392: -- [~viirya] that is correct. If I reduce the d

[jira] [Commented] (SPARK-18791) Stream-Stream Joins

2017-04-24 Thread Saul Shanabrook (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981383#comment-15981383 ] Saul Shanabrook commented on SPARK-18791: - I am using Spark to process the result

[jira] [Commented] (SPARK-20449) Upgrade breeze version to 0.13.1

2017-04-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981321#comment-15981321 ] Sean Owen commented on SPARK-20449: --- What if anything are the compatibility issues? tha

[jira] [Assigned] (SPARK-20449) Upgrade breeze version to 0.13.1

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20449: Assignee: (was: Apache Spark) > Upgrade breeze version to 0.13.1 > ---

[jira] [Commented] (SPARK-20449) Upgrade breeze version to 0.13.1

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981270#comment-15981270 ] Apache Spark commented on SPARK-20449: -- User 'yanboliang' has created a pull request

[jira] [Assigned] (SPARK-20449) Upgrade breeze version to 0.13.1

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20449: Assignee: Apache Spark > Upgrade breeze version to 0.13.1 > --

[jira] [Updated] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote

2017-04-24 Thread Rick Moritz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rick Moritz updated SPARK-20155: Description: According to : https://tools.ietf.org/html/rfc4180#section-2 7. If double-quotes are

[jira] [Updated] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote

2017-04-24 Thread Rick Moritz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rick Moritz updated SPARK-20155: Description: According to : https://tools.ietf.org/html/rfc4180#section-2 7. If double-quotes are

[jira] [Commented] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote

2017-04-24 Thread Rick Moritz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981185#comment-15981185 ] Rick Moritz commented on SPARK-20155: - Good info, thanks. I've added a link. > CSV-f

[jira] [Created] (SPARK-20449) Upgrade breeze version to 0.13.1

2017-04-24 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-20449: --- Summary: Upgrade breeze version to 0.13.1 Key: SPARK-20449 URL: https://issues.apache.org/jira/browse/SPARK-20449 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-20436) NullPointerException when restart from checkpoint file

2017-04-24 Thread Armin Braun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Armin Braun updated SPARK-20436: Description: I have written a Spark Streaming application which have two DStreams. Code is : {code}

[jira] [Commented] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote

2017-04-24 Thread Armin Braun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981166#comment-15981166 ] Armin Braun commented on SPARK-20155: - [~RPCMoritz] take a look at what I just found:

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981148#comment-15981148 ] Peng Meng commented on SPARK-20446: --- Thanks [~mlnick], I also compared DataFrame Versio

[jira] [Commented] (SPARK-17159) Improve FileInputDStream.findNewFiles list performance

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981143#comment-15981143 ] Apache Spark commented on SPARK-17159: -- User 'steveloughran' has created a pull requ

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981134#comment-15981134 ] Nick Pentreath commented on SPARK-20446: Also would be good to compare to the new

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981129#comment-15981129 ] Nick Pentreath commented on SPARK-20446: Anyway I'd like to compare the approache

[jira] [Commented] (SPARK-20426) OneForOneStreamManager occupies too much memory.

2017-04-24 Thread jin xing (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981124#comment-15981124 ] jin xing commented on SPARK-20426: -- [~jerryshao] I think lazy initialization can resolv

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981122#comment-15981122 ] Nick Pentreath commented on SPARK-20446: The GC would come from the temp result a

[jira] [Assigned] (SPARK-20426) OneForOneStreamManager occupies too much memory.

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20426: Assignee: Apache Spark > OneForOneStreamManager occupies too much memory. > --

[jira] [Assigned] (SPARK-20426) OneForOneStreamManager occupies too much memory.

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20426: Assignee: (was: Apache Spark) > OneForOneStreamManager occupies too much memory. > ---

[jira] [Commented] (SPARK-20426) OneForOneStreamManager occupies too much memory.

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981117#comment-15981117 ] Apache Spark commented on SPARK-20426: -- User 'jinxing64' has created a pull request

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981115#comment-15981115 ] Peng Meng commented on SPARK-20446: --- I think you said: https://github.com/apache/spark/

[jira] [Assigned] (SPARK-20448) Document how FileInputDStream works with object storage

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20448: Assignee: (was: Apache Spark) > Document how FileInputDStream works with object storag

[jira] [Assigned] (SPARK-20448) Document how FileInputDStream works with object storage

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20448: Assignee: Apache Spark > Document how FileInputDStream works with object storage > ---

[jira] [Commented] (SPARK-20448) Document how FileInputDStream works with object storage

2017-04-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981094#comment-15981094 ] Apache Spark commented on SPARK-20448: -- User 'steveloughran' has created a pull requ

[jira] [Commented] (SPARK-17159) Improve FileInputDStream.findNewFiles list performance

2017-04-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981081#comment-15981081 ] Steve Loughran commented on SPARK-17159: pulled out documentation into separate J
