[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236131#comment-14236131 ] Sandy Ryza commented on SPARK-3655: --- foldLeft only conceptually makes sense when applied

[jira] [Created] (SPARK-4770) spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN

2014-12-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4770: - Summary: spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN Key: SPARK-4770 URL: https://issues.apache.org/jira/browse/SPARK-4770 Project:

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235795#comment-14235795 ] Sandy Ryza commented on SPARK-3655: --- The repartitionAndSortWithinPartitions approach see

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235758#comment-14235758 ] Sandy Ryza commented on SPARK-3655: --- Hey [~koert], I think the transform that would most

[jira] [Commented] (SPARK-4687) SparkContext#addFile doesn't keep file folder information

2014-12-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233385#comment-14233385 ] Sandy Ryza commented on SPARK-4687: --- [~pwendell], do you think this is a reasonable API

[jira] [Created] (SPARK-4716) Avoid shuffle when all-to-all operation has single input and output partition

2014-12-03 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4716: - Summary: Avoid shuffle when all-to-all operation has single input and output partition Key: SPARK-4716 URL: https://issues.apache.org/jira/browse/SPARK-4716 Project: Spark

[jira] [Commented] (SPARK-4630) Dynamically determine optimal number of partitions

2014-11-30 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229371#comment-14229371 ] Sandy Ryza commented on SPARK-4630: --- Hey [~pwendell], Spark deals much better with large

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226969#comment-14226969 ] Sandy Ryza commented on SPARK-4452: --- Thinking about the current change a little more, an

[jira] [Updated] (SPARK-4630) Dynamically determine optimal number of partitions

2014-11-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4630: -- Assignee: Kostas Sakellis > Dynamically determine optimal number of partitions > ---

[jira] [Commented] (SPARK-4628) Put all external projects behind a build flag

2014-11-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226722#comment-14226722 ] Sandy Ryza commented on SPARK-4628: --- This looks like a duplicate of SPARK-4376. Resolvi

[jira] [Resolved] (SPARK-4376) Put external modules behind build profiles

2014-11-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-4376. --- Resolution: Duplicate > Put external modules behind build profiles > -

[jira] [Created] (SPARK-4617) Fix spark.yarn.applicationMaster.waitTries doc

2014-11-25 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4617: - Summary: Fix spark.yarn.applicationMaster.waitTries doc Key: SPARK-4617 URL: https://issues.apache.org/jira/browse/SPARK-4617 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4584: -- Assignee: Marcelo Vanzin (was: Sandy Ryza) > 2x Performance regression for Spark-on-YARN >

[jira] [Assigned] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned SPARK-4584: - Assignee: Sandy Ryza > 2x Performance regression for Spark-on-YARN >

[jira] [Commented] (SPARK-4585) Spark dynamic scaling executors use upper limit value as default.

2014-11-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224257#comment-14224257 ] Sandy Ryza commented on SPARK-4585: --- I was discussing this with [~brocknoland]. The iss

[jira] [Updated] (SPARK-4585) Spark dynamic scaling executors use upper limit value as default.

2014-11-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4585: -- Issue Type: Improvement (was: Bug) > Spark dynamic scaling executors use upper limit value as default.

[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224185#comment-14224185 ] Sandy Ryza commented on SPARK-4584: --- I took a look at the jobs Nishkam ran before and af

[jira] [Assigned] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned SPARK-4447: - Assignee: Sandy Ryza (was: Patrick Wendell) > Remove layers of abstraction in YARN code no longe

[jira] [Assigned] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned SPARK-4447: - Assignee: Patrick Wendell > Remove layers of abstraction in YARN code no longer needed after drop

[jira] [Updated] (SPARK-4352) Incorporate locality preferences in dynamic allocation requests

2014-11-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4352: -- Description: Currently, achieving data locality in Spark is difficult unless an application takes resou

[jira] [Updated] (SPARK-4352) Incorporate locality preferences in dynamic allocation requests

2014-11-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4352: -- Description: Currently, achieving data locality in Spark is difficult u preferredNodeLocalityData provi

[jira] [Created] (SPARK-4569) Rename "externalSorting" in Aggregator

2014-11-23 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4569: - Summary: Rename "externalSorting" in Aggregator Key: SPARK-4569 URL: https://issues.apache.org/jira/browse/SPARK-4569 Project: Spark Issue Type: Bug Comp

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222572#comment-14222572 ] Sandy Ryza commented on SPARK-4452: --- [~tianshuo], I took a look at the patch, and the ge

[jira] [Commented] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2014-11-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221710#comment-14221710 ] Sandy Ryza commented on SPARK-4550: --- We don't, though it would allow us to be much more

[jira] [Updated] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2014-11-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4550: -- Summary: In sort-based shuffle, store map outputs in serialized form (was: In sort-based shuffle, store

[jira] [Created] (SPARK-4550) In sort-based shuffle, store map outputs as serialized

2014-11-21 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4550: - Summary: In sort-based shuffle, store map outputs as serialized Key: SPARK-4550 URL: https://issues.apache.org/jira/browse/SPARK-4550 Project: Spark Issue Type: Im

[jira] [Commented] (SPARK-1956) Enable shuffle consolidation by default

2014-11-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221453#comment-14221453 ] Sandy Ryza commented on SPARK-1956: --- This is of smaller importance now that sort-based s

[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-11-20 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220620#comment-14220620 ] Sandy Ryza commented on SPARK-2089: --- Another possible solution here is SPARK-4352. Requ

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217340#comment-14217340 ] Sandy Ryza commented on SPARK-4452: --- [~matei] my point is not that forced spilling allow

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216933#comment-14216933 ] Sandy Ryza commented on SPARK-4452: --- One issue with a limits-by-object approach is that

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215448#comment-14215448 ] Sandy Ryza commented on SPARK-4452: --- Ah, true. > Shuffle data structures can starve oth

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215436#comment-14215436 ] Sandy Ryza commented on SPARK-4452: --- [~andrewor14], IIUC, (2) shouldn't happen in hash-b

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215269#comment-14215269 ] Sandy Ryza commented on SPARK-4452: --- Updated the title to reflect the specific problem.

[jira] [Updated] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4452: -- Summary: Shuffle data structures can starve others on the same thread for memory (was: Enhance Sort-ba

[jira] [Created] (SPARK-4457) Document how to build for Hadoop versions greater than 2.4

2014-11-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4457: - Summary: Document how to build for Hadoop versions greater than 2.4 Key: SPARK-4457 URL: https://issues.apache.org/jira/browse/SPARK-4457 Project: Spark Issue Type

[jira] [Created] (SPARK-4456) Document why spilling depends on both elements read and memory used

2014-11-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4456: - Summary: Document why spilling depends on both elements read and memory used Key: SPARK-4456 URL: https://issues.apache.org/jira/browse/SPARK-4456 Project: Spark

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215047#comment-14215047 ] Sandy Ryza commented on SPARK-4452: --- A third possible fix would be to have the shuffle m

[jira] [Updated] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4452: -- Affects Version/s: 1.1.0 > Enhance Sort-based Shuffle to avoid spilling small files > --

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215018#comment-14215018 ] Sandy Ryza commented on SPARK-4452: --- I haven't thought the implications out fully, but i

[jira] [Commented] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214411#comment-14214411 ] Sandy Ryza commented on SPARK-4447: --- Planning to work on this. > Remove layers of abstr

[jira] [Updated] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4447: -- Description: For example, YarnRMClient and YarnRMClientImpl can be merged YarnAllocator and YarnAllocati

[jira] [Created] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4447: - Summary: Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha Key: SPARK-4447 URL: https://issues.apache.org/jira/browse/SPARK-4447 Project:

[jira] [Commented] (SPARK-2819) Difficult to turn on intercept with linear models

2014-11-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212854#comment-14212854 ] Sandy Ryza commented on SPARK-2819: --- Yes, as far as I can tell there are still no public

[jira] [Commented] (SPARK-4375) Assembly built with Maven is missing most of repl classes

2014-11-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209388#comment-14209388 ] Sandy Ryza commented on SPARK-4375: --- This all makes sense to me. Will put up a patch.

[jira] [Commented] (SPARK-4375) Assembly built with Maven is missing most of repl classes

2014-11-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209217#comment-14209217 ] Sandy Ryza commented on SPARK-4375: --- The issue here is that the activeByDefault Maven op

[jira] [Updated] (SPARK-4375) Assembly built with Maven is missing most of repl classes

2014-11-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4375: -- Summary: Assembly built with Maven is missing most of repl classes (was: assembly built with Maven is m

[jira] [Updated] (SPARK-4375) assembly built with Maven is missing most of repl classes

2014-11-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4375: -- Description: In particular, the ones in the split scala-2.10/scala-2.11 directories aren't being added

[jira] [Created] (SPARK-4375) assembly built with Maven is missing most of repl classes

2014-11-12 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4375: - Summary: assembly built with Maven is missing most of repl classes Key: SPARK-4375 URL: https://issues.apache.org/jira/browse/SPARK-4375 Project: Spark Issue Type:

[jira] [Created] (SPARK-4352) Incorporate locality preferences in dynamic allocation requests

2014-11-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4352: - Summary: Incorporate locality preferences in dynamic allocation requests Key: SPARK-4352 URL: https://issues.apache.org/jira/browse/SPARK-4352 Project: Spark Issu

[jira] [Created] (SPARK-4338) Remove yarn-alpha support

2014-11-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4338: - Summary: Remove yarn-alpha support Key: SPARK-4338 URL: https://issues.apache.org/jira/browse/SPARK-4338 Project: Spark Issue Type: Sub-task Components:

[jira] [Commented] (SPARK-4338) Remove yarn-alpha support

2014-11-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206104#comment-14206104 ] Sandy Ryza commented on SPARK-4338: --- Planning to take a stab at this > Remove yarn-alph

[jira] [Created] (SPARK-4337) Add ability to cancel pending requests to YARN

2014-11-10 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4337: - Summary: Add ability to cancel pending requests to YARN Key: SPARK-4337 URL: https://issues.apache.org/jira/browse/SPARK-4337 Project: Spark Issue Type: Improvemen

[jira] [Commented] (SPARK-4290) Provide an equivalent functionality of distributed cache as MR does

2014-11-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205346#comment-14205346 ] Sandy Ryza commented on SPARK-4290: --- SparkFiles.get needs to be called, but it will only

[jira] [Commented] (SPARK-4280) In dynamic allocation, add option to never kill executors with cached blocks

2014-11-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202382#comment-14202382 ] Sandy Ryza commented on SPARK-4280: --- So it looks like the block IDs of broadcast variabl

[jira] [Commented] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later

2014-11-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202319#comment-14202319 ] Sandy Ryza commented on SPARK-4267: --- Strange. Checked in the code and it seems like thi

[jira] [Commented] (SPARK-4290) Provide an equivalent functionality of distributed cache as MR does

2014-11-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201572#comment-14201572 ] Sandy Ryza commented on SPARK-4290: --- If you call SparkContext#addFile, the file will be

[jira] [Commented] (SPARK-4280) In dynamic allocation, add option to never kill executors with cached blocks

2014-11-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200801#comment-14200801 ] Sandy Ryza commented on SPARK-4280: --- My thinking was that it would just be based on whet

[jira] [Created] (SPARK-4280) In dynamic allocation, add option to never kill executors with cached blocks

2014-11-06 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4280: - Summary: In dynamic allocation, add option to never kill executors with cached blocks Key: SPARK-4280 URL: https://issues.apache.org/jira/browse/SPARK-4280 Project: Spark

[jira] [Updated] (SPARK-4230) Doc for spark.default.parallelism is incorrect

2014-11-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4230: -- Description: The default default parallelism for shuffle transformations is actually the maximum number

[jira] [Created] (SPARK-4230) Doc for spark.default.parallelism is incorrect

2014-11-04 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4230: - Summary: Doc for spark.default.parallelism is incorrect Key: SPARK-4230 URL: https://issues.apache.org/jira/browse/SPARK-4230 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-4227) Document external shuffle service

2014-11-04 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4227: - Summary: Document external shuffle service Key: SPARK-4227 URL: https://issues.apache.org/jira/browse/SPARK-4227 Project: Spark Issue Type: Improvement C

[jira] [Comment Edited] (SPARK-4214) With dynamic allocation, avoid outstanding requests for more executors than pending tasks need

2014-11-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196464#comment-14196464 ] Sandy Ryza edited comment on SPARK-4214 at 11/4/14 6:00 PM: We

[jira] [Commented] (SPARK-4214) With dynamic allocation, avoid outstanding requests for more executors than pending tasks need

2014-11-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196464#comment-14196464 ] Sandy Ryza commented on SPARK-4214: --- We can implement this in either a "weak" way or a "

[jira] [Created] (SPARK-4214) With dynamic allocation, avoid outstanding requests for more executors than pending tasks need

2014-11-03 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4214: - Summary: With dynamic allocation, avoid outstanding requests for more executors than pending tasks need Key: SPARK-4214 URL: https://issues.apache.org/jira/browse/SPARK-4214

[jira] [Commented] (SPARK-4178) Hadoop input metrics ignore bytes read in RecordReader instantiation

2014-10-31 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192773#comment-14192773 ] Sandy Ryza commented on SPARK-4178: --- Thanks [~kostas] for noticing this. > Hadoop input

[jira] [Created] (SPARK-4178) Hadoop input metrics ignore bytes read in RecordReader instantiation

2014-10-31 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4178: - Summary: Hadoop input metrics ignore bytes read in RecordReader instantiation Key: SPARK-4178 URL: https://issues.apache.org/jira/browse/SPARK-4178 Project: Spark

[jira] [Commented] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI

2014-10-31 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192609#comment-14192609 ] Sandy Ryza commented on SPARK-4016: --- Also, it looks like this can cause an exception: SP

[jira] [Commented] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI

2014-10-31 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192604#comment-14192604 ] Sandy Ryza commented on SPARK-4016: --- It looks like after this change, stage-level summar

[jira] [Created] (SPARK-4175) Exception on stage page

2014-10-31 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4175: - Summary: Exception on stage page Key: SPARK-4175 URL: https://issues.apache.org/jira/browse/SPARK-4175 Project: Spark Issue Type: Bug Affects Versions: 1.2.0

[jira] [Created] (SPARK-4136) Under dynamic allocation, cancel outstanding executor requests when pending task queue is empty

2014-10-29 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4136: - Summary: Under dynamic allocation, cancel outstanding executor requests when pending task queue is empty Key: SPARK-4136 URL: https://issues.apache.org/jira/browse/SPARK-4136

[jira] [Commented] (SPARK-3461) Support external groupByKey using repartitionAndSortWithinPartitions

2014-10-28 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186562#comment-14186562 ] Sandy Ryza commented on SPARK-3461: --- SPARK-2926 could help with this as well. > Support

[jira] [Commented] (SPARK-1856) Standardize MLlib interfaces

2014-10-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183174#comment-14183174 ] Sandy Ryza commented on SPARK-1856: --- Is this work still targeted for 1.2? > Standardize

[jira] [Commented] (SPARK-3573) Dataset

2014-10-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183173#comment-14183173 ] Sandy Ryza commented on SPARK-3573: --- Is this still targeted for 1.2? > Dataset > --

[jira] [Commented] (SPARK-2926) Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle

2014-10-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179631#comment-14179631 ] Sandy Ryza commented on SPARK-2926: --- [~rxin] did you ever get a chance to try this out?

[jira] [Commented] (SPARK-3360) Add RowMatrix.multiply(Vector)

2014-10-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171238#comment-14171238 ] Sandy Ryza commented on SPARK-3360: --- bq. You don't need Vector.multiply(RowMatrix) reall

[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171024#comment-14171024 ] Sandy Ryza edited comment on SPARK-3174 at 10/14/14 3:03 PM: -

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171024#comment-14171024 ] Sandy Ryza commented on SPARK-3174: --- bq. If I understand correctly, your concern with re

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-13 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169931#comment-14169931 ] Sandy Ryza commented on SPARK-3174: --- bq. Slow-start is actually not slow at all if we lo

[jira] [Commented] (SPARK-1209) SparkHadoopUtil should not use package org.apache.hadoop

2014-10-13 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169846#comment-14169846 ] Sandy Ryza commented on SPARK-1209: --- Definitely worth changing, in my opinion. This has

[jira] [Commented] (SPARK-3884) If deploy mode is cluster, --driver-memory shouldn't apply to client JVM

2014-10-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165776#comment-14165776 ] Sandy Ryza commented on SPARK-3884: --- Accidentally assigned this to myself, but others sh

[jira] [Updated] (SPARK-3884) If deploy mode is cluster, --driver-memory shouldn't apply to client JVM

2014-10-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3884: -- Summary: If deploy mode is cluster, --driver-memory shouldn't apply to client JVM (was: Don't set SPARK

[jira] [Created] (SPARK-3884) Don't set SPARK_SUBMIT_DRIVER_MEMORY if deploy mode is cluster

2014-10-09 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3884: - Summary: Don't set SPARK_SUBMIT_DRIVER_MEMORY if deploy mode is cluster Key: SPARK-3884 URL: https://issues.apache.org/jira/browse/SPARK-3884 Project: Spark Issue

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162685#comment-14162685 ] Sandy Ryza commented on SPARK-3174: --- bq. Maybe it makes sense to just call it `spark.dy

[jira] [Updated] (SPARK-3682) Add helpful warnings to the UI

2014-10-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3682: -- Attachment: SPARK-3682Design.pdf Posting an initial design > Add helpful warnings to the UI > -

[jira] [Created] (SPARK-3837) Warn when YARN is killing containers for exceeding memory limits

2014-10-07 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3837: - Summary: Warn when YARN is killing containers for exceeding memory limits Key: SPARK-3837 URL: https://issues.apache.org/jira/browse/SPARK-3837 Project: Spark Iss

[jira] [Commented] (SPARK-3797) Run the shuffle service inside the YARN NodeManager as an AuxiliaryService

2014-10-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161636#comment-14161636 ] Sandy Ryza commented on SPARK-3797: --- Not necessarily opposed to this, but wanted to brin

[jira] [Updated] (SPARK-3797) Run the shuffle service inside the YARN NodeManager as an AuxiliaryService

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Summary: Run the shuffle service inside the YARN NodeManager as an AuxiliaryService (was: Enable runnin

[jira] [Updated] (SPARK-3797) Run the shuffle service inside the YARN NodeManager as an AuxiliaryService

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Description: It's also worth considering running the shuffle service in a YARN container beside the exec

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161048#comment-14161048 ] Sandy Ryza commented on SPARK-3174: --- Ah, misread. My opinion is that, for a first cut w

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160960#comment-14160960 ] Sandy Ryza commented on SPARK-3174: --- bq. for instance, lets say I do some ETL stuff wher

[jira] [Updated] (SPARK-3797) Enable running shuffle service in separate process from executor

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Description: This could either mean * Running the shuffle service inside the YARN NodeManager as an Auxi

[jira] [Updated] (SPARK-3797) Enable running shuffle service in separate process from executor

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Description: This could either mean * Running the shuffle service inside the YARN NodeManager as an auxi

[jira] [Updated] (SPARK-3797) Enable running shuffle service in separate process from executor

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Summary: Enable running shuffle service in separate process from executor (was: Integrate shuffle servi

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160937#comment-14160937 ] Sandy Ryza commented on SPARK-3174: --- Thanks for posting the detailed design, Andrew. A

[jira] [Comment Edited] (SPARK-3464) Graceful decommission of executors

2014-10-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159252#comment-14159252 ] Sandy Ryza edited comment on SPARK-3464 at 10/4/14 7:27 PM: [~

[jira] [Commented] (SPARK-3464) Graceful decommission of executors

2014-10-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159252#comment-14159252 ] Sandy Ryza commented on SPARK-3464: --- Did you mean to resolve this as "Fixed"? > Gracefu

[jira] [Comment Edited] (SPARK-3561) Decouple Spark's API from its execution engine

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158571#comment-14158571 ] Sandy Ryza edited comment on SPARK-3561 at 10/3/14 11:00 PM: -

[jira] [Updated] (SPARK-3561) Decouple Spark's API from its execution engine

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3561: -- Description: Currently Spark's user-facing API is tightly coupled with its backend execution engine.

[jira] [Updated] (SPARK-3561) Decouple Spark's API from its execution engine

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3561: -- Description: Currently Spark's API is tightly coupled with its backend execution engine. It could be
