[jira] [Commented] (SPARK-3577) Shuffle write time incorrect for sort-based shuffle

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138354#comment-14138354 ] Sandy Ryza commented on SPARK-3577: --- In the old code, the ShuffleWriteMetrics didn't get

[jira] [Created] (SPARK-3560) In yarn-cluster mode, jars are distributed through multiple mechanisms.

2014-09-16 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3560: - Summary: In yarn-cluster mode, jars are distributed through multiple mechanisms. Key: SPARK-3560 URL: https://issues.apache.org/jira/browse/SPARK-3560 Project: Spark

[jira] [Updated] (SPARK-3560) In yarn-cluster mode, jars are distributed through multiple mechanisms.

2014-09-16 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3560: -- Component/s: YARN In yarn-cluster mode, jars are distributed through multiple mechanisms.

[jira] [Commented] (SPARK-3172) Distinguish between shuffle spill on the map and reduce side

2014-09-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130475#comment-14130475 ] Sandy Ryza commented on SPARK-3172: --- I mean in the web UI (which will require

[jira] [Created] (SPARK-3497) Report serialized size of task binary

2014-09-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3497: - Summary: Report serialized size of task binary Key: SPARK-3497 URL: https://issues.apache.org/jira/browse/SPARK-3497 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-3464) Graceful decommission of executors

2014-09-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3464: -- Description: In most cases, even when an application is utilizing only a small fraction of its

[jira] [Comment Edited] (SPARK-3441) Explain in docs that repartitionAndSortWithinPartitions enacts Hadoop style shuffle

2014-09-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125936#comment-14125936 ] Sandy Ryza edited comment on SPARK-3441 at 9/8/14 7:09 PM: ---

[jira] [Commented] (SPARK-3441) Explain in docs that repartitionAndSortWithinPartitions enacts Hadoop style shuffle

2014-09-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125936#comment-14125936 ] Sandy Ryza commented on SPARK-3441: --- I'll add mention that this can be used to get

[jira] [Commented] (SPARK-3441) Explain in docs that repartitionAndSortWithinPartitions enacts Hadoop style shuffle

2014-09-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126192#comment-14126192 ] Sandy Ryza commented on SPARK-3441: --- bq. One case where you may not care about giving a

[jira] [Commented] (SPARK-3441) Explain in docs that repartitionAndSortWithinPartitions enacts Hadoop style shuffle

2014-09-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126454#comment-14126454 ] Sandy Ryza commented on SPARK-3441: --- Right. It's not much work, but there are some

[jira] [Commented] (SPARK-3174) Under YARN, add and remove executors based on load

2014-09-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125235#comment-14125235 ] Sandy Ryza commented on SPARK-3174: --- To be clear, by YARN shuffle you mean the MR2

[jira] [Commented] (SPARK-3174) Under YARN, add and remove executors based on load

2014-09-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123238#comment-14123238 ] Sandy Ryza commented on SPARK-3174: --- I've been putting a little bit of thought into this

[jira] [Created] (SPARK-3419) Scheduler shouldn't delay running a task when executors don't reside at any of its preferred locations

2014-09-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3419: - Summary: Scheduler shouldn't delay running a task when executors don't reside at any of its preferred locations Key: SPARK-3419 URL: https://issues.apache.org/jira/browse/SPARK-3419

[jira] [Updated] (SPARK-3174) Under YARN, add and remove executors based on load

2014-09-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3174: -- Attachment: SPARK-3174design.pdf Under YARN, add and remove executors based on load

[jira] [Commented] (SPARK-3174) Under YARN, add and remove executors based on load

2014-09-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123504#comment-14123504 ] Sandy Ryza commented on SPARK-3174: --- Posted a high-level design doc. Under YARN, add

[jira] [Commented] (SPARK-2099) Report TaskMetrics for running tasks

2014-09-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123587#comment-14123587 ] Sandy Ryza commented on SPARK-2099: --- Yeah, unfortunately I haven't had the chance to add

[jira] [Resolved] (SPARK-3082) yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist

2014-09-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-3082. --- Resolution: Fixed Fix Version/s: 1.1.0 yarn.Client.logClusterResourceDetails throws NPE if

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118286#comment-14118286 ] Sandy Ryza commented on SPARK-2978: --- IIUC, that would require using ShuffledRDD

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119080#comment-14119080 ] Sandy Ryza commented on SPARK-2978: --- What's the thinking behind adding

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119091#comment-14119091 ] Sandy Ryza commented on SPARK-2978: --- Ah ok, sounds good. Provide an MR-style shuffle

[jira] [Created] (SPARK-3360) Add RowMatrix.multiply(Vector)

2014-09-02 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3360: - Summary: Add RowMatrix.multiply(Vector) Key: SPARK-3360 URL: https://issues.apache.org/jira/browse/SPARK-3360 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-3179) Add task OutputMetrics

2014-09-01 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117825#comment-14117825 ] Sandy Ryza commented on SPARK-3179: --- Hi Michael, Happy to help review your code or

[jira] [Updated] (SPARK-1239) Don't fetch all map output statuses at each reducer during shuffles

2014-09-01 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-1239: -- Summary: Don't fetch all map output statuses at each reducer during shuffles (was: Don't fetch all map

[jira] [Created] (SPARK-3183) Add option for requesting full YARN cluster

2014-08-22 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3183: - Summary: Add option for requesting full YARN cluster Key: SPARK-3183 URL: https://issues.apache.org/jira/browse/SPARK-3183 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-08-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105126#comment-14105126 ] Sandy Ryza commented on SPARK-2978: --- So I started looking into this a little more and

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-08-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105128#comment-14105128 ] Sandy Ryza commented on SPARK-2978: --- [~jerryshao], if I understand correctly, ShuffleRDD

[jira] [Created] (SPARK-3172) Distinguish between shuffle spill on the map and reduce side

2014-08-21 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3172: - Summary: Distinguish between shuffle spill on the map and reduce side Key: SPARK-3172 URL: https://issues.apache.org/jira/browse/SPARK-3172 Project: Spark Issue

[jira] [Created] (SPARK-3174) Under YARN, add and remove executors based on load

2014-08-21 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3174: - Summary: Under YARN, add and remove executors based on load Key: SPARK-3174 URL: https://issues.apache.org/jira/browse/SPARK-3174 Project: Spark Issue Type:

[jira] [Created] (SPARK-3179) Add task OutputMetrics

2014-08-21 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3179: - Summary: Add task OutputMetrics Key: SPARK-3179 URL: https://issues.apache.org/jira/browse/SPARK-3179 Project: Spark Issue Type: Improvement Components:

[jira] [Commented] (SPARK-3019) Pluggable block transfer (data plane communication) interface

2014-08-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099888#comment-14099888 ] Sandy Ryza commented on SPARK-3019: --- I agree that it's not typically a problem, but I

[jira] [Commented] (SPARK-3019) Pluggable block transfer (data plane communication) interface

2014-08-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099983#comment-14099983 ] Sandy Ryza commented on SPARK-3019: --- Thanks for the info Mridul. A few extra

[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-08-16 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099533#comment-14099533 ] Sandy Ryza commented on SPARK-2089: --- These customizations should only come from Hadoop

[jira] [Commented] (SPARK-3019) Pluggable block transfer (data plane communication) interface

2014-08-16 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099537#comment-14099537 ] Sandy Ryza commented on SPARK-3019: --- Just scanned this, so apologies if the answer is

[jira] [Created] (SPARK-3082) yarn.Client.logClusterResourceDetails throws NPE if YARN's getQueueInfo returns null

2014-08-16 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3082: - Summary: yarn.Client.logClusterResourceDetails throws NPE if YARN's getQueueInfo returns null Key: SPARK-3082 URL: https://issues.apache.org/jira/browse/SPARK-3082

[jira] [Updated] (SPARK-3082) yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist

2014-08-16 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3082: -- Summary: yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist (was:

[jira] [Commented] (SPARK-3028) sparkEventToJson should support SparkListenerExecutorMetricsUpdate

2014-08-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097193#comment-14097193 ] Sandy Ryza commented on SPARK-3028: --- +1 to what Patrick said. I'll post a patch along

[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-08-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097826#comment-14097826 ] Sandy Ryza commented on SPARK-2089: --- H, it's true that my suggestion would require

[jira] [Comment Edited] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-08-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097826#comment-14097826 ] Sandy Ryza edited comment on SPARK-2089 at 8/14/14 10:41 PM: -

[jira] [Created] (SPARK-3052) Misleading and spurious FileSystem closed errors whenever a job fails while reading from Hadoop

2014-08-14 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3052: - Summary: Misleading and spurious FileSystem closed errors whenever a job fails while reading from Hadoop Key: SPARK-3052 URL: https://issues.apache.org/jira/browse/SPARK-3052

[jira] [Created] (SPARK-3053) Reconcile spark.files.userClassPathFirst with spark.yarn.user.classpath.first

2014-08-14 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3053: - Summary: Reconcile spark.files.userClassPathFirst with spark.yarn.user.classpath.first Key: SPARK-3053 URL: https://issues.apache.org/jira/browse/SPARK-3053 Project: Spark

[jira] [Created] (SPARK-3055) Stack trace logged in driver on job failure is usually uninformative

2014-08-14 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3055: - Summary: Stack trace logged in driver on job failure is usually uninformative Key: SPARK-3055 URL: https://issues.apache.org/jira/browse/SPARK-3055 Project: Spark

[jira] [Created] (SPARK-3014) Log a more informative message when yarn-cluster app fails because SparkContext wasn't initialized

2014-08-13 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3014: - Summary: Log a more informative message when yarn-cluster app fails because SparkContext wasn't initialized Key: SPARK-3014 URL: https://issues.apache.org/jira/browse/SPARK-3014

[jira] [Updated] (SPARK-3014) Log a more informative messages in a couple failure scenarios

2014-08-13 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3014: -- Summary: Log a more informative messages in a couple failure scenarios (was: Log a more informative

[jira] [Updated] (SPARK-3014) Log a more informative messages in a couple failure scenarios

2014-08-13 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3014: -- Description: This is what shows up currently when the user code fails to initialize a SparkContext

[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-08-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094872#comment-14094872 ] Sandy Ryza commented on SPARK-2089: --- My opinion is that we should have a narrower API

[jira] [Created] (SPARK-2978) Provide an MR-style shuffle transformation

2014-08-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2978: - Summary: Provide an MR-style shuffle transformation Key: SPARK-2978 URL: https://issues.apache.org/jira/browse/SPARK-2978 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-2978) Provide an MR-style shuffle transformation

2014-08-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2978: -- Description: For Hive on Spark joins in particular, and for running legacy MR code in general, I

[jira] [Updated] (SPARK-2978) Provide an MR-style shuffle transformation

2014-08-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2978: -- Description: For Hive on Spark joins in particular, and for running legacy MR code in general, I

[jira] [Updated] (SPARK-2978) Provide an MR-style shuffle transformation

2014-08-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2978: -- Description: For Hive on Spark joins in particular, and for running legacy MR code in general, I

[jira] [Commented] (SPARK-2945) Allow specifying num of executors in the context configuration

2014-08-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091817#comment-14091817 ] Sandy Ryza commented on SPARK-2945: --- spark.executor.instances apparently isn't used for

[jira] [Commented] (SPARK-2926) Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle

2014-08-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091489#comment-14091489 ] Sandy Ryza commented on SPARK-2926: --- Hi Saisai, This seems like a very useful addition.

[jira] [Resolved] (SPARK-1683) Display filesystem read statistics with each task

2014-08-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-1683. --- Resolution: Fixed Display filesystem read statistics with each task

[jira] [Commented] (SPARK-1683) Display filesystem read statistics with each task

2014-08-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088900#comment-14088900 ] Sandy Ryza commented on SPARK-1683: --- https://github.com/apache/spark/pull/962 Display

[jira] [Created] (SPARK-2900) inputBytes aren't aggregated for stages like other task metrics

2014-08-07 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2900: - Summary: inputBytes aren't aggregated for stages like other task metrics Key: SPARK-2900 URL: https://issues.apache.org/jira/browse/SPARK-2900 Project: Spark

[jira] [Resolved] (SPARK-2564) ShuffleReadMetrics.totalBlocksFetched is redundant

2014-08-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-2564. --- Resolution: Fixed Fix Version/s: 1.1.0 ShuffleReadMetrics.totalBlocksFetched is redundant

[jira] [Created] (SPARK-2894) spark-shell doesn't accept flags

2014-08-06 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2894: - Summary: spark-shell doesn't accept flags Key: SPARK-2894 URL: https://issues.apache.org/jira/browse/SPARK-2894 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-2819) Difficult to turn on intercept with linear models

2014-08-03 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2819: - Summary: Difficult to turn on intercept with linear models Key: SPARK-2819 URL: https://issues.apache.org/jira/browse/SPARK-2819 Project: Spark Issue Type:

[jira] [Created] (SPARK-2738) Remove redundant imports in BlockManagerSuite

2014-07-29 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2738: - Summary: Remove redundant imports in BlockManagerSuite Key: SPARK-2738 URL: https://issues.apache.org/jira/browse/SPARK-2738 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-2664) Deal with `--conf` options in spark-submit that relate to flags

2014-07-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072925#comment-14072925 ] Sandy Ryza commented on SPARK-2664: --- I think the right behavior here is worth a little

[jira] [Comment Edited] (SPARK-2664) Deal with `--conf` options in spark-submit that relate to flags

2014-07-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072925#comment-14072925 ] Sandy Ryza edited comment on SPARK-2664 at 7/24/14 7:18 AM: I

[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys

2014-07-22 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069880#comment-14069880 ] Sandy Ryza commented on SPARK-2421: --- It should be relatively straightforward to add a

[jira] [Created] (SPARK-2621) Update task InputMetrics incrementally

2014-07-22 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2621: - Summary: Update task InputMetrics incrementally Key: SPARK-2621 URL: https://issues.apache.org/jira/browse/SPARK-2621 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-2625) Fix ShuffleReadMetrics for NettyBlockFetcherIterator

2014-07-22 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2625: - Summary: Fix ShuffleReadMetrics for NettyBlockFetcherIterator Key: SPARK-2625 URL: https://issues.apache.org/jira/browse/SPARK-2625 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-2519) Eliminate pattern-matching on Tuple2 in performance-critical aggregation code

2014-07-20 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-2519. --- Resolution: Fixed Fix Version/s: 1.1.0 Eliminate pattern-matching on Tuple2 in

[jira] [Created] (SPARK-2574) Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

2014-07-18 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2574: - Summary: Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner Key: SPARK-2574 URL: https://issues.apache.org/jira/browse/SPARK-2574 Project: Spark

[jira] [Created] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2553: - Summary: CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key Key: SPARK-2553 URL: https://issues.apache.org/jira/browse/SPARK-2553 Project: Spark Issue

[jira] [Commented] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064709#comment-14064709 ] Sandy Ryza commented on SPARK-2553: --- https://github.com/apache/spark/pull/1461

[jira] [Created] (SPARK-2564) ShuffleReadMetrics.totalBlocksFetched is redundant

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2564: - Summary: ShuffleReadMetrics.totalBlocksFetched is redundant Key: SPARK-2564 URL: https://issues.apache.org/jira/browse/SPARK-2564 Project: Spark Issue Type:

[jira] [Created] (SPARK-2565) Update ShuffleReadMetrics as blocks are fetched

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2565: - Summary: Update ShuffleReadMetrics as blocks are fetched Key: SPARK-2565 URL: https://issues.apache.org/jira/browse/SPARK-2565 Project: Spark Issue Type:

[jira] [Created] (SPARK-2566) Update ShuffleWriteMetrics as data is written

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2566: - Summary: Update ShuffleWriteMetrics as data is written Key: SPARK-2566 URL: https://issues.apache.org/jira/browse/SPARK-2566 Project: Spark Issue Type:

[jira] [Commented] (SPARK-2564) ShuffleReadMetrics.totalBlocksFetched is redundant

2014-07-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065826#comment-14065826 ] Sandy Ryza commented on SPARK-2564: --- https://github.com/apache/spark/pull/1474

[jira] [Commented] (SPARK-2519) Eliminate pattern-matching on Tuple2 in performance-critical aggregation code

2014-07-16 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063251#comment-14063251 ] Sandy Ryza commented on SPARK-2519: --- https://github.com/apache/spark/pull/1435

[jira] [Commented] (SPARK-2519) Eliminate pattern-matching on Tuple2 in performance-critical aggregation code

2014-07-16 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064102#comment-14064102 ] Sandy Ryza commented on SPARK-2519: --- I looked in ShuffledRDD, ExternalAppendOnlyMap,

[jira] [Commented] (SPARK-2534) Avoid pulling in the entire RDD in groupByKey

2014-07-16 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064333#comment-14064333 ] Sandy Ryza commented on SPARK-2534: --- Yowza Avoid pulling in the entire RDD in

[jira] [Created] (SPARK-2461) Add a toString method to GeneralizedLinearModel

2014-07-12 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2461: - Summary: Add a toString method to GeneralizedLinearModel Key: SPARK-2461 URL: https://issues.apache.org/jira/browse/SPARK-2461 Project: Spark Issue Type:

[jira] [Created] (SPARK-2462) Make Vector.apply public

2014-07-12 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2462: - Summary: Make Vector.apply public Key: SPARK-2462 URL: https://issues.apache.org/jira/browse/SPARK-2462 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-2384) Add tooltips for shuffle write and scheduler delay in UI

2014-07-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053834#comment-14053834 ] Sandy Ryza commented on SPARK-2384: --- This is a great idea Add tooltips for shuffle

[jira] [Updated] (SPARK-2310) Support arbitrary options on the command line with spark-submit

2014-06-27 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2310: -- Summary: Support arbitrary options on the command line with spark-submit (was: Allow giving arbitrary

[jira] [Commented] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks

2014-06-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041503#comment-14041503 ] Sandy Ryza commented on SPARK-1767: --- It will be in the Hadoop 2.5 release Prefer

[jira] [Updated] (SPARK-1675) Make clear whether computePrincipalComponents requires centered data

2014-06-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-1675: -- Summary: Make clear whether computePrincipalComponents requires centered data (was: Make clear

[jira] [Commented] (SPARK-1675) Make clear whether computePrincipalComponents requires centered data

2014-06-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039952#comment-14039952 ] Sandy Ryza commented on SPARK-1675: --- I think it still wouldn't hurt to add a remark that

[jira] [Updated] (SPARK-1675) Make clear whether computePrincipalComponents requires centered data

2014-06-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-1675: -- Priority: Trivial (was: Major) Make clear whether computePrincipalComponents requires centered data

[jira] [Reopened] (SPARK-1209) SparkHadoopUtil should not use package org.apache.hadoop

2014-06-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reopened SPARK-1209: --- It doesn't look like this was actually fixed. SparkHadoopUtil should not use package org.apache.hadoop

[jira] [Created] (SPARK-2149) [MLLIB] Kernel density estimation

2014-06-15 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2149: - Summary: [MLLIB] Kernel density estimation Key: SPARK-2149 URL: https://issues.apache.org/jira/browse/SPARK-2149 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-2149) [MLLIB] Univariate kernel density estimation

2014-06-15 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2149: -- Summary: [MLLIB] Univariate kernel density estimation (was: [MLLIB] Kernel density estimation)

[jira] [Commented] (SPARK-2149) [MLLIB] Univariate kernel density estimation

2014-06-15 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032079#comment-14032079 ] Sandy Ryza commented on SPARK-2149: --- https://github.com/apache/spark/pull/1093 [MLLIB]

[jira] [Created] (SPARK-2146) Fix the takeOrdered doc

2014-06-14 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2146: - Summary: Fix the takeOrdered doc Key: SPARK-2146 URL: https://issues.apache.org/jira/browse/SPARK-2146 Project: Spark Issue Type: Bug Affects Versions: 1.0.0

[jira] [Created] (SPARK-2142) Give better indicator of how GC cuts into task time

2014-06-13 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2142: - Summary: Give better indicator of how GC cuts into task time Key: SPARK-2142 URL: https://issues.apache.org/jira/browse/SPARK-2142 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-1954) Make it easier to get Spark on YARN code to compile in IntelliJ

2014-06-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-1954. --- Resolution: Duplicate Make it easier to get Spark on YARN code to compile in IntelliJ

[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-06-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028937#comment-14028937 ] Sandy Ryza commented on SPARK-2089: --- I'll take this up. It seems like our options are:

[jira] [Created] (SPARK-2131) Collect per-task hdfs-bytes-written metrics

2014-06-12 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2131: - Summary: Collect per-task hdfs-bytes-written metrics Key: SPARK-2131 URL: https://issues.apache.org/jira/browse/SPARK-2131 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-2131) Collect per-task filesystem-bytes-read/written metrics

2014-06-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2131: -- Summary: Collect per-task filesystem-bytes-read/written metrics (was: Collect per-task

[jira] [Updated] (SPARK-2131) Collect per-task filesystem-bytes-written metrics

2014-06-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2131: -- Summary: Collect per-task filesystem-bytes-written metrics (was: Collect per-task hdfs-bytes-written

[jira] [Created] (SPARK-2114) Aggregations on raw data

2014-06-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2114: - Summary: Aggregations on raw data Key: SPARK-2114 URL: https://issues.apache.org/jira/browse/SPARK-2114 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-2114) Aggregations on raw data

2014-06-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2114: -- Description: For groupByKey and join transformations, Spark tasks on the reduce side deserialize

[jira] [Updated] (SPARK-2114) groupByKey and joins on raw data

2014-06-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2114: -- Summary: groupByKey and joins on raw data (was: Aggregations on raw data) groupByKey and joins on

[jira] [Commented] (SPARK-2099) Report metrics for running tasks

2014-06-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028681#comment-14028681 ] Sandy Ryza commented on SPARK-2099: --- https://github.com/apache/spark/pull/1056 Report

[jira] [Created] (SPARK-2099) Report metrics for running tasks

2014-06-10 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2099: - Summary: Report metrics for running tasks Key: SPARK-2099 URL: https://issues.apache.org/jira/browse/SPARK-2099 Project: Spark Issue Type: Improvement Affects

[jira] [Updated] (SPARK-2099) Report metrics for running tasks

2014-06-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2099: -- Description: Spark currently collects a set of helpful task metrics, like shuffle bytes written, GC

[jira] [Created] (SPARK-2084) Mention SPARK_JAR in env var section on configuration page

2014-06-09 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2084: - Summary: Mention SPARK_JAR in env var section on configuration page Key: SPARK-2084 URL: https://issues.apache.org/jira/browse/SPARK-2084 Project: Spark Issue
