[jira] [Commented] (SPARK-3179) Add task OutputMetrics

2014-09-01 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117825#comment-14117825 ] Sandy Ryza commented on SPARK-3179: --- Hi Michael, Happy to help review your code or answ

[jira] [Updated] (SPARK-1239) Don't fetch all map output statuses at each reducer during shuffles

2014-09-01 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-1239: -- Summary: Don't fetch all map output statuses at each reducer during shuffles (was: Don't fetch all map

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118286#comment-14118286 ] Sandy Ryza commented on SPARK-2978: --- IIUC, that would require using ShuffledRDD directly

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119080#comment-14119080 ] Sandy Ryza commented on SPARK-2978: --- What's the thinking behind adding sortWithinPartiti

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119091#comment-14119091 ] Sandy Ryza commented on SPARK-2978: --- Ah ok, sounds good. > Provide an MR-style shuffle

[jira] [Created] (SPARK-3360) Add RowMatrix.multiply(Vector)

2014-09-02 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3360: - Summary: Add RowMatrix.multiply(Vector) Key: SPARK-3360 URL: https://issues.apache.org/jira/browse/SPARK-3360 Project: Spark Issue Type: Improvement Comp

[jira] [Commented] (SPARK-3174) Under YARN, add and remove executors based on load

2014-09-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123238#comment-14123238 ] Sandy Ryza commented on SPARK-3174: --- I've been putting a little bit of thought into this

[jira] [Created] (SPARK-3419) Scheduler shouldn't delay running a task when executors don't reside at any of its preferred locations

2014-09-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3419: - Summary: Scheduler shouldn't delay running a task when executors don't reside at any of its preferred locations Key: SPARK-3419 URL: https://issues.apache.org/jira/browse/SPARK-3419

[jira] [Updated] (SPARK-3174) Under YARN, add and remove executors based on load

2014-09-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3174: -- Attachment: SPARK-3174design.pdf > Under YARN, add and remove executors based on load >

[jira] [Commented] (SPARK-3174) Under YARN, add and remove executors based on load

2014-09-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123504#comment-14123504 ] Sandy Ryza commented on SPARK-3174: --- Posted a high-level design doc. > Under YARN, add

[jira] [Created] (SPARK-3422) JavaAPISuite.getHadoopInputSplits isn't used anywhere

2014-09-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3422: - Summary: JavaAPISuite.getHadoopInputSplits isn't used anywhere Key: SPARK-3422 URL: https://issues.apache.org/jira/browse/SPARK-3422 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-2099) Report TaskMetrics for running tasks

2014-09-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123587#comment-14123587 ] Sandy Ryza commented on SPARK-2099: --- Yeah, unfortunately I haven't had the chance to add

[jira] [Resolved] (SPARK-3082) yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist

2014-09-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-3082. --- Resolution: Fixed Fix Version/s: 1.1.0 > yarn.Client.logClusterResourceDetails throws NPE if re

[jira] [Commented] (SPARK-3174) Under YARN, add and remove executors based on load

2014-09-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125235#comment-14125235 ] Sandy Ryza commented on SPARK-3174: --- To be clear, by YARN shuffle you mean the MR2 appro

[jira] [Comment Edited] (SPARK-3441) Explain in docs that repartitionAndSortWithinPartitions enacts Hadoop style shuffle

2014-09-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125936#comment-14125936 ] Sandy Ryza edited comment on SPARK-3441 at 9/8/14 7:09 PM: --- Beca

[jira] [Commented] (SPARK-3441) Explain in docs that repartitionAndSortWithinPartitions enacts Hadoop style shuffle

2014-09-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125936#comment-14125936 ] Sandy Ryza commented on SPARK-3441: --- I'll add mention that this can be used to get Hadoo

[jira] [Commented] (SPARK-3441) Explain in docs that repartitionAndSortWithinPartitions enacts Hadoop style shuffle

2014-09-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126192#comment-14126192 ] Sandy Ryza commented on SPARK-3441: --- bq. One case where you may not care about giving a

[jira] [Commented] (SPARK-3441) Explain in docs that repartitionAndSortWithinPartitions enacts Hadoop style shuffle

2014-09-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126454#comment-14126454 ] Sandy Ryza commented on SPARK-3441: --- Right. It's not much work, but there are some ques

[jira] [Commented] (SPARK-3174) Under YARN, add and remove executors based on load

2014-09-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127415#comment-14127415 ] Sandy Ryza commented on SPARK-3174: --- bq. Since you mention the graceful decommission as

[jira] [Created] (SPARK-3464) Graceful decommission of executors

2014-09-09 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3464: - Summary: Graceful decommission of executors Key: SPARK-3464 URL: https://issues.apache.org/jira/browse/SPARK-3464 Project: Spark Issue Type: Sub-task R

[jira] [Updated] (SPARK-3460) Discard executors

2014-09-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3460: -- Summary: Discard executors (was: Graceful decommission of idle YARN sessions) > Discard executors > --

[jira] [Updated] (SPARK-3460) Under YARN, discard executors that have been idle

2014-09-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3460: -- Summary: Under YARN, discard executors that have been idle (was: Discard executors) > Under YARN, dis

[jira] [Updated] (SPARK-3464) Graceful decommission of executors

2014-09-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3464: -- Description: In most cases, even when an application is utilizing only a small fraction of its available

[jira] [Commented] (SPARK-3172) Distinguish between shuffle spill on the map and reduce side

2014-09-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130475#comment-14130475 ] Sandy Ryza commented on SPARK-3172: --- I mean in the web UI (which will require distinguis

[jira] [Created] (SPARK-3497) Report serialized size of task binary

2014-09-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3497: - Summary: Report serialized size of task binary Key: SPARK-3497 URL: https://issues.apache.org/jira/browse/SPARK-3497 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-3560) In yarn-cluster mode, jars are distributed through multiple mechanisms.

2014-09-16 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3560: - Summary: In yarn-cluster mode, jars are distributed through multiple mechanisms. Key: SPARK-3560 URL: https://issues.apache.org/jira/browse/SPARK-3560 Project: Spark

[jira] [Updated] (SPARK-3560) In yarn-cluster mode, jars are distributed through multiple mechanisms.

2014-09-16 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3560: -- Component/s: YARN > In yarn-cluster mode, jars are distributed through multiple mechanisms. > --

[jira] [Commented] (SPARK-3560) In yarn-cluster mode, jars are distributed through multiple mechanisms.

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136882#comment-14136882 ] Sandy Ryza commented on SPARK-3560: --- Right. I believe Min from LinkedIn who discovered

[jira] [Commented] (SPARK-3574) Shuffle finish time always reported as -1

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138246#comment-14138246 ] Sandy Ryza commented on SPARK-3574: --- On it > Shuffle finish time always reported as -1

[jira] [Commented] (SPARK-3577) Shuffle write time incorrect for sort-based shuffle

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138245#comment-14138245 ] Sandy Ryza commented on SPARK-3577: --- On it > Shuffle write time incorrect for sort-base

[jira] [Commented] (SPARK-3530) Pipeline and Parameters

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138319#comment-14138319 ] Sandy Ryza commented on SPARK-3530: --- bq. Isn't the "fit multiple models at once" part a

[jira] [Commented] (SPARK-3577) Shuffle write time incorrect for sort-based shuffle

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138334#comment-14138334 ] Sandy Ryza commented on SPARK-3577: --- Have you noticed the incorrect metrics reported or

[jira] [Commented] (SPARK-3577) Shuffle write time incorrect for sort-based shuffle

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138354#comment-14138354 ] Sandy Ryza commented on SPARK-3577: --- In the old code, the ShuffleWriteMetrics didn't get

[jira] [Updated] (SPARK-3560) In yarn-cluster mode, the same jars are distributed through multiple mechanisms.

2014-09-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3560: -- Summary: In yarn-cluster mode, the same jars are distributed through multiple mechanisms. (was: In yarn

[jira] [Commented] (SPARK-3573) Dataset

2014-09-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139958#comment-14139958 ] Sandy Ryza commented on SPARK-3573: --- Currently SchemaRDD lives inside SQL. Would we mov

[jira] [Created] (SPARK-3605) Typo in SchemaRDD JavaDoc

2014-09-19 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3605: - Summary: Typo in SchemaRDD JavaDoc Key: SPARK-3605 URL: https://issues.apache.org/jira/browse/SPARK-3605 Project: Spark Issue Type: Bug Components: SQL

[jira] [Commented] (SPARK-3573) Dataset

2014-09-19 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141063#comment-14141063 ] Sandy Ryza commented on SPARK-3573: --- Currently SchemaRDD does depend on Catalyst. Are y

[jira] [Commented] (SPARK-3612) Executor shouldn't quit if heartbeat message fails to reach the driver

2014-09-20 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142006#comment-14142006 ] Sandy Ryza commented on SPARK-3612: --- Yeah, we should catch this. Will post a patch. >

[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2014-09-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142524#comment-14142524 ] Sandy Ryza commented on SPARK-3577: --- No problem. Yeah, I agree that a spill time metric

[jira] [Created] (SPARK-3642) Better document the nuances of shared variables

2014-09-22 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3642: - Summary: Better document the nuances of shared variables Key: SPARK-3642 URL: https://issues.apache.org/jira/browse/SPARK-3642 Project: Spark Issue Type: Improveme

[jira] [Commented] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2014-09-22 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143908#comment-14143908 ] Sandy Ryza commented on SPARK-3622: --- Is this a duplicate of SPARK-2688? > Provide a cus

[jira] [Resolved] (SPARK-2142) Give better indicator of how GC cuts into task time

2014-09-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-2142. --- Resolution: Not a Problem I ran some tests that indicated that only stop-the-world GC time gets includ

[jira] [Commented] (SPARK-3468) WebUI Timeline-View feature

2014-09-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145829#comment-14145829 ] Sandy Ryza commented on SPARK-3468: --- This looks like a really cool addition. > WebUI Ti

[jira] [Created] (SPARK-3682) Add helpful warnings to the UI

2014-09-24 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3682: - Summary: Add helpful warnings to the UI Key: SPARK-3682 URL: https://issues.apache.org/jira/browse/SPARK-3682 Project: Spark Issue Type: New Feature Comp

[jira] [Resolved] (SPARK-2131) Collect per-task filesystem-bytes-read/written metrics

2014-09-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-2131. --- Resolution: Duplicate > Collect per-task filesystem-bytes-read/written metrics > -

[jira] [Updated] (SPARK-3682) Add helpful warnings to the UI

2014-09-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3682: -- Target Version/s: 1.3.0 (was: 1.2.0) > Add helpful warnings to the UI > --

[jira] [Resolved] (SPARK-3422) JavaAPISuite.getHadoopInputSplits isn't used anywhere

2014-09-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-3422. --- Resolution: Fixed > JavaAPISuite.getHadoopInputSplits isn't used anywhere > --

[jira] [Commented] (SPARK-3693) Cached Hadoop RDD always return rows with the same value

2014-09-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148303#comment-14148303 ] Sandy Ryza commented on SPARK-3693: --- Spark's documentation actually makes a note of this

[jira] [Updated] (SPARK-3682) Add helpful warnings to the UI

2014-09-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3682: -- Description: Spark has a zillion configuration options and a zillion different things that can go wrong

[jira] [Commented] (SPARK-3682) Add helpful warnings to the UI

2014-09-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148379#comment-14148379 ] Sandy Ryza commented on SPARK-3682: --- Oops, that should have read "increased". When a ta

[jira] [Commented] (SPARK-3561) Native Hadoop/YARN integration for batch/ETL workloads

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158571#comment-14158571 ] Sandy Ryza commented on SPARK-3561: --- I think there may be somewhat of a misunderstanding

[jira] [Updated] (SPARK-3561) Decouple Spark's API from its execution engine

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3561: -- Summary: Decouple Spark's API from its execution engine (was: Native Hadoop/YARN integration for batch/

[jira] [Updated] (SPARK-3561) Decouple Spark's API from its execution engine

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3561: -- Description: Currently Spark's API is tightly coupled with its backend execution engine. It could be

[jira] [Updated] (SPARK-3561) Decouple Spark's API from its execution engine

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3561: -- Description: Currently Spark's API is tightly coupled with its backend execution engine. It could be

[jira] [Updated] (SPARK-3561) Decouple Spark's API from its execution engine

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3561: -- Description: Currently Spark's user-facing API is tightly coupled with its backend execution engine.

[jira] [Comment Edited] (SPARK-3561) Decouple Spark's API from its execution engine

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158571#comment-14158571 ] Sandy Ryza edited comment on SPARK-3561 at 10/3/14 11:00 PM: -

[jira] [Commented] (SPARK-3464) Graceful decommission of executors

2014-10-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159252#comment-14159252 ] Sandy Ryza commented on SPARK-3464: --- Did you mean to resolve this as "Fixed"? > Gracefu

[jira] [Comment Edited] (SPARK-3464) Graceful decommission of executors

2014-10-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159252#comment-14159252 ] Sandy Ryza edited comment on SPARK-3464 at 10/4/14 7:27 PM: [~

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160937#comment-14160937 ] Sandy Ryza commented on SPARK-3174: --- Thanks for posting the detailed design, Andrew. A

[jira] [Updated] (SPARK-3797) Enable running shuffle service in separate process from executor

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Summary: Enable running shuffle service in separate process from executor (was: Integrate shuffle servi

[jira] [Updated] (SPARK-3797) Enable running shuffle service in separate process from executor

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Description: This could either mean * Running the shuffle service inside the YARN NodeManager as an Auxi

[jira] [Updated] (SPARK-3797) Enable running shuffle service in separate process from executor

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Description: This could either mean * Running the shuffle service inside the YARN NodeManager as an auxi

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160960#comment-14160960 ] Sandy Ryza commented on SPARK-3174: --- bq. for instance, lets say I do some ETL stuff wher

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161048#comment-14161048 ] Sandy Ryza commented on SPARK-3174: --- Ah, misread. My opinion is that, for a first cut w

[jira] [Updated] (SPARK-3797) Run the shuffle service inside the YARN NodeManager as an AuxiliaryService

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Summary: Run the shuffle service inside the YARN NodeManager as an AuxiliaryService (was: Enable runnin

[jira] [Updated] (SPARK-3797) Run the shuffle service inside the YARN NodeManager as an AuxiliaryService

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Description: It's also worth considering running the shuffle service in a YARN container beside the exec

[jira] [Commented] (SPARK-3797) Run the shuffle service inside the YARN NodeManager as an AuxiliaryService

2014-10-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161636#comment-14161636 ] Sandy Ryza commented on SPARK-3797: --- Not necessarily opposed to this, but wanted to brin

[jira] [Created] (SPARK-3837) Warn when YARN is killing containers for exceeding memory limits

2014-10-07 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3837: - Summary: Warn when YARN is killing containers for exceeding memory limits Key: SPARK-3837 URL: https://issues.apache.org/jira/browse/SPARK-3837 Project: Spark Iss

[jira] [Updated] (SPARK-3682) Add helpful warnings to the UI

2014-10-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3682: -- Attachment: SPARK-3682Design.pdf Posting an initial design > Add helpful warnings to the UI > -

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162685#comment-14162685 ] Sandy Ryza commented on SPARK-3174: --- bq. Maybe it makes sense to just call it `spark.dy

[jira] [Created] (SPARK-3884) Don't set SPARK_SUBMIT_DRIVER_MEMORY if deploy mode is cluster

2014-10-09 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3884: - Summary: Don't set SPARK_SUBMIT_DRIVER_MEMORY if deploy mode is cluster Key: SPARK-3884 URL: https://issues.apache.org/jira/browse/SPARK-3884 Project: Spark Issue

[jira] [Updated] (SPARK-3884) If deploy mode is cluster, --driver-memory shouldn't apply to client JVM

2014-10-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3884: -- Summary: If deploy mode is cluster, --driver-memory shouldn't apply to client JVM (was: Don't set SPARK

[jira] [Commented] (SPARK-3884) If deploy mode is cluster, --driver-memory shouldn't apply to client JVM

2014-10-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165776#comment-14165776 ] Sandy Ryza commented on SPARK-3884: --- Accidentally assigned this to myself, but others sh

[jira] [Commented] (SPARK-1209) SparkHadoopUtil should not use package org.apache.hadoop

2014-10-13 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169846#comment-14169846 ] Sandy Ryza commented on SPARK-1209: --- Definitely worth changing, in my opinion. This has

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-13 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169931#comment-14169931 ] Sandy Ryza commented on SPARK-3174: --- bq. Slow-start is actually not slow at all if we lo

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171024#comment-14171024 ] Sandy Ryza commented on SPARK-3174: --- bq. If I understand correctly, your concern with re

[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171024#comment-14171024 ] Sandy Ryza edited comment on SPARK-3174 at 10/14/14 3:03 PM: -

[jira] [Commented] (SPARK-3360) Add RowMatrix.multiply(Vector)

2014-10-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171238#comment-14171238 ] Sandy Ryza commented on SPARK-3360: --- bq. You don't need Vector.multiply(RowMatrix) reall

[jira] [Commented] (SPARK-2926) Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle

2014-10-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179631#comment-14179631 ] Sandy Ryza commented on SPARK-2926: --- [~rxin] did you ever get a chance to try this out?

[jira] [Commented] (SPARK-3573) Dataset

2014-10-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183173#comment-14183173 ] Sandy Ryza commented on SPARK-3573: --- Is this still targeted for 1.2? > Dataset > --

[jira] [Commented] (SPARK-1856) Standardize MLlib interfaces

2014-10-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183174#comment-14183174 ] Sandy Ryza commented on SPARK-1856: --- Is this work still targeted for 1.2? > Standardize

[jira] [Commented] (SPARK-3461) Support external groupByKey using repartitionAndSortWithinPartitions

2014-10-28 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186562#comment-14186562 ] Sandy Ryza commented on SPARK-3461: --- SPARK-2926 could help with this as well. > Support

[jira] [Created] (SPARK-4136) Under dynamic allocation, cancel outstanding executor requests when pending task queue is empty

2014-10-29 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4136: - Summary: Under dynamic allocation, cancel outstanding executor requests when pending task queue is empty Key: SPARK-4136 URL: https://issues.apache.org/jira/browse/SPARK-4136

[jira] [Created] (SPARK-4175) Exception on stage page

2014-10-31 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4175: - Summary: Exception on stage page Key: SPARK-4175 URL: https://issues.apache.org/jira/browse/SPARK-4175 Project: Spark Issue Type: Bug Affects Versions: 1.2.0

[jira] [Commented] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI

2014-10-31 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192604#comment-14192604 ] Sandy Ryza commented on SPARK-4016: --- It looks like after this change, stage-level summar

[jira] [Commented] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI

2014-10-31 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192609#comment-14192609 ] Sandy Ryza commented on SPARK-4016: --- Also, it looks like this can cause an exception: SP

[jira] [Created] (SPARK-4178) Hadoop input metrics ignore bytes read in RecordReader instantiation

2014-10-31 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4178: - Summary: Hadoop input metrics ignore bytes read in RecordReader instantiation Key: SPARK-4178 URL: https://issues.apache.org/jira/browse/SPARK-4178 Project: Spark

[jira] [Commented] (SPARK-4178) Hadoop input metrics ignore bytes read in RecordReader instantiation

2014-10-31 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192773#comment-14192773 ] Sandy Ryza commented on SPARK-4178: --- Thanks [~kostas] for noticing this. > Hadoop input

[jira] [Commented] (SPARK-8623) Some queries in spark-sql lead to NullPointerException when using Yarn

2015-06-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601593#comment-14601593 ] Sandy Ryza commented on SPARK-8623: --- Looking into it > Some queries in spark-sql lead t

[jira] [Commented] (SPARK-8623) Some queries in spark-sql lead to NullPointerException when using Yarn

2015-06-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601644#comment-14601644 ] Sandy Ryza commented on SPARK-8623: --- I took a look at the line numbers and it seems like

[jira] [Commented] (SPARK-8623) Some queries in spark-sql lead to NullPointerException when using Yarn

2015-06-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603243#comment-14603243 ] Sandy Ryza commented on SPARK-8623: --- Am able to reproduce this locally. Looking into th

[jira] [Commented] (SPARK-8623) Some queries in spark-sql lead to NullPointerException when using Yarn

2015-06-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603694#comment-14603694 ] Sandy Ryza commented on SPARK-8623: --- Figured out the issue - my patch omitted registerin

[jira] [Updated] (SPARK-8623) Some queries in spark-sql lead to NullPointerException when using Yarn

2015-06-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-8623: -- Component/s: (was: SQL) Spark Core > Some queries in spark-sql lead to NullPointerE

[jira] [Updated] (SPARK-8623) Hadoop RDDs fail to properly serialize configuration

2015-06-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-8623: -- Summary: Hadoop RDDs fail to properly serialize configuration (was: Some queries in spark-sql lead to N

[jira] [Assigned] (SPARK-8623) Some queries in spark-sql lead to NullPointerException when using Yarn

2015-06-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned SPARK-8623: - Assignee: Sandy Ryza > Some queries in spark-sql lead to NullPointerException when using Yarn > -

[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2015-10-29 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980717#comment-14980717 ] Sandy Ryza commented on SPARK-2089: --- My opinion is that we should be moving towards dyna

[jira] [Commented] (SPARK-9999) Dataset API on top of Catalyst/DataFrame

2015-11-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022634#comment-15022634 ] Sandy Ryza commented on SPARK-9999: --- [~nchammas] it's not clear that it makes sense to a

[jira] [Created] (SPARK-2084) Mention SPARK_JAR in env var section on configuration page

2014-06-09 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2084: - Summary: Mention SPARK_JAR in env var section on configuration page Key: SPARK-2084 URL: https://issues.apache.org/jira/browse/SPARK-2084 Project: Spark Issue Type

[jira] [Created] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-06-09 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2089: - Summary: With YARN, preferredNodeLocalityData isn't honored Key: SPARK-2089 URL: https://issues.apache.org/jira/browse/SPARK-2089 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-2099) Report metrics for running tasks

2014-06-10 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2099: - Summary: Report metrics for running tasks Key: SPARK-2099 URL: https://issues.apache.org/jira/browse/SPARK-2099 Project: Spark Issue Type: Improvement Affects
