[GitHub] spark pull request: [SPARK-2568] RangePartitioner should run only ...

2014-07-27 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1562#discussion_r15439847 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -105,24 +108,91 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-27 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1360#issuecomment-50281668 ping: Probably too late for a 1.0.2-rc, but this should go into 1.0.3 and 1.1.0. --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request: [Build] SPARK-2614: (2nd patch) Create a spark...

2014-07-27 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1611#discussion_r15442657 --- Diff: assembly/src/deb/control/examples/control --- @@ -0,0 +1,8 @@ +Package: [[deb.pkg.name]]-examples +Version: [[version]]-[[buildNumber

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-26 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1607#discussion_r15437121 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -110,42 +110,56 @@ class ExternalAppendOnlyMap[K, V, C

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-25 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50160338 After installing `hub` you can also do a bunch of new stuff on the command line, including `hub checkout https://github.com/apache/spark/pull/1499` https

[GitHub] spark pull request: [SPARK-2647] DAGScheduler plugs other JobSubmi...

2014-07-25 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1548#issuecomment-50174114 @YanTangZhai If you are searching for another solution and abandoning this PR, could you please close this PR and open a new one when you have something different

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-25 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1561#issuecomment-50201015 @JoshRosen Why a trait instead of an abstract class? We're not expecting to need to mixin Stage outside of the Stage class hierarchy, right?
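Here is a minimal Scala sketch of the abstract-class option raised in this comment. The names `StageLike`, `ShuffleStage`, `FinalStage`, and `StageDemo` are hypothetical. They are not Spark's scheduler classes. In Scala 2, which Spark used at the time, only a class can take constructor parameters; a trait cannot. That makes an abstract class the natural fit for a closed hierarchy whose members share fields such as an id and a partition count. A trait is only needed when the behavior must be mixed into otherwise unrelated classes.

```scala
// Hypothetical stand-ins for a scheduler's stage hierarchy (not Spark's real classes).
// The abstract class carries constructor parameters shared by every subclass.
abstract class StageLike(val id: Int, val numPartitions: Int) {
  // Each concrete stage says whether it produces shuffle output.
  def isShuffleMap: Boolean

  override def toString: String = s"Stage $id ($numPartitions partitions)"
}

// A stage that writes shuffle output for a downstream stage.
class ShuffleStage(id: Int, numPartitions: Int) extends StageLike(id, numPartitions) {
  val isShuffleMap = true
}

// A stage that computes the final result of a job.
class FinalStage(id: Int, numPartitions: Int) extends StageLike(id, numPartitions) {
  val isShuffleMap = false
}

object StageDemo {
  val stages: Seq[StageLike] = Seq(new ShuffleStage(0, 4), new FinalStage(1, 2))

  def main(args: Array[String]): Unit = stages.foreach(println)
}
```

Because the hierarchy is closed, nothing outside it needs to mix in `StageLike`, which is the situation the comment describes.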

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331775 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -355,14 +351,13 @@ class DAGScheduler( logDebug

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331819 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -22,6 +22,8 @@ import org.apache.spark.rdd.RDD import

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15332026 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -56,6 +58,16 @@ private[spark] class Stage( val numPartitions

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15334407 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15356652 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -341,8 +336,9 @@ class DAGScheduler

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15361883 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -341,8 +336,9 @@ class DAGScheduler

[GitHub] spark pull request: SPARK-1715: Ensure actor is self-contained in ...

2014-07-24 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/637#issuecomment-50057749 This rebases cleanly on top of https://github.com/apache/spark/pull/1561, so let's get that one in first.

[GitHub] spark pull request: [SPARK-2529] Clean closures in foreach and for...

2014-07-24 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1583#issuecomment-50105915 Does anyone recall why we lost the closure cleaning in https://github.com/apache/spark/commit/6b288b75d4c05f42ad3612813dc77ff824bb6203 ?

[GitHub] spark pull request: [SPARK-2635] Fix race condition at SchedulerBa...

2014-07-23 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1525#discussion_r15272634 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -268,14 +264,18 @@ class

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1528#issuecomment-49848091 This looks like a clean implementation, but you still need to open a JIRA issue to explain why you want this; then edit the description of this PR to reference

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1528#issuecomment-49848309 Sorry, looks like you already have SPARK-2618, so change the title of this PR to include that.

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1535#issuecomment-49874874 Jenkins, test this please

[GitHub] spark pull request: [SPARK-2647] DAGScheduler plugs other JobSubmi...

2014-07-23 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1548#discussion_r15289573 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1202,8 +1202,12 @@ private[scheduler] class

[GitHub] spark pull request: [SPARK-2567] Resubmitted stage sometimes remai...

2014-07-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1516#issuecomment-49911032 This appears to be a reversion of d58502a1562bbfb1bb4e517ebcc8239efd639297 while ignoring and misapplying the comment regarding ordering (which I'm not completely

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1528#issuecomment-49916553 Yeah, I'm wondering whether the actual problem is that creation and use of scheduler pools with different weights is unclear or too difficult; and that if we could

[GitHub] spark pull request: Removed some HashMaps from DAGScheduler by sto...

2014-07-23 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15327897 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -56,6 +58,16 @@ private[spark] class Stage( val numPartitions

[GitHub] spark pull request: Removed some HashMaps from DAGScheduler by sto...

2014-07-23 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15328234 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -315,13 +309,14 @@ class DAGScheduler( */ private def

[GitHub] spark pull request: Removed some HashMaps from DAGScheduler by sto...

2014-07-23 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15328329 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -315,13 +309,14 @@ class DAGScheduler( */ private def

[GitHub] spark pull request: Removed some HashMaps from DAGScheduler by sto...

2014-07-23 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15328468 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -341,8 +336,9 @@ class DAGScheduler

[GitHub] spark pull request: Removed some HashMaps from DAGScheduler by sto...

2014-07-23 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15328544 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -341,8 +336,9 @@ class DAGScheduler

[GitHub] spark pull request: Removed some HashMaps from DAGScheduler by sto...

2014-07-23 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15328773 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -992,13 +974,14 @@ class DAGScheduler( } private

[GitHub] spark pull request: Removed some HashMaps from DAGScheduler by sto...

2014-07-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1561#issuecomment-49967238 JIRA?

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

2014-07-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1566#issuecomment-49968268 Makes sense. LGTM

[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1525#discussion_r15240513 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -47,19 +47,19 @@ class

[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1525#discussion_r15242266 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala --- @@ -108,4 +108,8 @@ private[spark] class

[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1525#discussion_r15242315 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala --- @@ -108,4 +108,8 @@ private[spark] class

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-22 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-49773532 ok to test

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1528#discussion_r15246213 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -778,8 +778,10 @@ class DAGScheduler( logInfo(Submitting

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1528#discussion_r15248491 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SchedulingAlgorithm.scala --- @@ -32,11 +32,21 @@ private[spark] class FIFOSchedulingAlgorithm

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1535#discussion_r15258957 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1294,7 +1307,11 @@ abstract class RDD[T: ClassTag]( val partitionStr

[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1525#discussion_r15268935 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -47,19 +47,19 @@ class

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1528#discussion_r15270158 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SchedulingAlgorithm.scala --- @@ -17,6 +17,8 @@ package org.apache.spark.scheduler

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1528#discussion_r15270239 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSet.scala --- @@ -27,9 +27,18 @@ private[spark] class TaskSet( val tasks: Array[Task

[GitHub] spark pull request: [SPARK-2635] Fix race condition at SchedulerBa...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1525#discussion_r15270679 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -47,19 +47,19 @@ class

[GitHub] spark pull request: [SPARK-2635] Fix race condition at SchedulerBa...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1525#discussion_r15270909 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -47,19 +47,19 @@ class

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1528#discussion_r15272274 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSet.scala --- @@ -27,9 +27,18 @@ private[spark] class TaskSet( val tasks: Array[Task

[GitHub] spark pull request: [SPARK-695] In DAGScheduler's getPreferredLocs...

2014-07-17 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1362#discussion_r15076386 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1107,7 +1106,6 @@ class DAGScheduler( case shufDep

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49174657 Hmmm... not sure that I would go so far as to call it nice. This does make the code slightly more difficult to read and understand, so can we hope that you've got
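For context, the two styles at issue look roughly like this on plain Scala collections. `TupleStyles` and its methods are hypothetical helpers. Any performance difference between the styles depends on the Scala version and the JIT. The sketch therefore makes no speed claim; it only shows that both styles compute the same result and that the accessor style is a little harder to read.

```scala
object TupleStyles {
  // Readable style: destructure each pair with a pattern-matching anonymous function.
  def sumValuesMatch(pairs: Seq[(String, Int)]): Int =
    pairs.foldLeft(0) { case (acc, (_, v)) => acc + v }

  // Style the PR title points toward: read the tuple field directly with ._2,
  // with no pattern match in the hot loop.
  def sumValuesAccessor(pairs: Seq[(String, Int)]): Int =
    pairs.foldLeft(0)((acc, pair) => acc + pair._2)
}
```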

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49195389 Got it. Thanks. That also helps to put some bound (for now) on where we will make such performance optimizations.

[GitHub] spark pull request: Async in progress

2014-07-16 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1449#issuecomment-49241702 Please create a JIRA issue and a description for this PR.

[GitHub] spark pull request: [SPARK-695] In DAGScheduler's getPreferredLocs...

2014-07-16 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1362#discussion_r15041791 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1107,7 +1106,6 @@ class DAGScheduler( case shufDep

[GitHub] spark pull request: [WIP][SPARK-2054][SQL] Code Generation for Exp...

2014-07-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r14845258 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,421

[GitHub] spark pull request: [WIP][SPARK-2054][SQL] Code Generation for Exp...

2014-07-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r14845309 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,421

[GitHub] spark pull request: [WIP][SPARK-2054][SQL] Code Generation for Exp...

2014-07-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r14846043 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,421

[GitHub] spark pull request: [WIP][SPARK-2054][SQL] Code Generation for Exp...

2014-07-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r14846216 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,421

[GitHub] spark pull request: [WIP][SPARK-2054][SQL] Code Generation for Exp...

2014-07-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r14847035 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,421

[GitHub] spark pull request: [WIP][SPARK-2054][SQL] Code Generation for Exp...

2014-07-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r14848366 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,421

[GitHub] spark pull request: [WIP][SPARK-2054][SQL] Code Generation for Exp...

2014-07-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r14849312 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala --- @@ -0,0 +1,83

[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-10 Thread markhamstra
GitHub user markhamstra opened a pull request: https://github.com/apache/spark/pull/1360 SPARK-2425 Don't kill a still-running Application because of some misbehaving Executors Introduces a LOADING - RUNNING ApplicationState transition and prevents Master from removing

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-07-04 Thread markhamstra
Github user markhamstra closed the pull request at: https://github.com/apache/spark/pull/686

[GitHub] spark pull request: [MLLIB] SPARK-2329 Add multi-label evaluation ...

2014-07-01 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1270#discussion_r14428640 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MultilabelMetrics.scala --- @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-06-25 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r14211667 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1062,10 +1062,15 @@ class DAGScheduler

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-06-20 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/686#issuecomment-46684950 ping: This should go into 1.0.1 @pwendell

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-06-20 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r14032954 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -313,6 +314,47 @@ class DAGSchedulerSuite extends TestKit

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-06-20 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r14040631 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1062,10 +1062,15 @@ class DAGScheduler

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-06-20 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r14041296 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1062,10 +1062,15 @@ class DAGScheduler

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-06-20 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r14041700 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1062,10 +1062,15 @@ class DAGScheduler

[GitHub] spark pull request: [SPARK-2060][SQL] Querying JSON Datasets with ...

2014-06-18 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/999#issuecomment-46436035 Hmmm, that doesn't precisely match my recollection or understanding. Certainly we discussed that alpha components aren't required to maintain a stable API, but I

[GitHub] spark pull request: Branch 1.0 Add ZLIBCompressionCodec code

2014-06-18 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1115#issuecomment-46479466 Yes, this PR is not in a useful state right now. It's hard to even find the proposed changes because of all the clutter of unnecessary commits, but it looks to me

[GitHub] spark pull request: [SPARK-2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/999#issuecomment-46389597 Is that the basic strategy we are going to use with AlphaComponents -- merging new APIs at both the minor and maintenance levels? I don't know that I have any

[GitHub] spark pull request: SPARK-2158 Clean up core/stdout file from File...

2014-06-16 Thread markhamstra
GitHub user markhamstra opened a pull request: https://github.com/apache/spark/pull/1100 SPARK-2158 Clean up core/stdout file from FileAppenderSuite @tdas You can merge this pull request into a Git repository by running: $ git pull https://github.com/markhamstra/spark SPARK

[GitHub] spark pull request: SPARK-2158 Clean up core/stdout file from File...

2014-06-16 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1100#issuecomment-46265806 jenkins, test this please

[GitHub] spark pull request: SPARK-1715: Ensure actor is self-contained in ...

2014-06-14 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/637#issuecomment-46104906 It means that the check for binary compatibility after your patch is applied has failed because the checker thinks that there previously was a default/automatic

[GitHub] spark pull request: [SQL] Update SparkSQL and ScalaTest in branch-...

2014-06-13 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1078#issuecomment-46071589 FYI Bumping all the way to the current scalatest 2.2.0 also works.

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-06-04 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/686#issuecomment-45113730 @rxin merge to 1.0.1 and 1.1.0?

[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-04 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/369#discussion_r13419732 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag]( def ++(other: RDD[T]): RDD[T

[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-04 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/369#discussion_r13423508 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag]( def ++(other: RDD[T]): RDD[T
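Here is a local sketch of what `sortBy(f)` means: key each element with `f`, sort by the key, then drop the key. `LocalSortBy` is a hypothetical helper that works on a `Seq`. It is not the RDD method. In releases that include it, Spark's `RDD.sortBy` follows the same keyBy / sortByKey / values pattern but performs the sort across partitions.

```scala
object LocalSortBy {
  // Pair each element with its key (like keyBy), sort the pairs by key
  // (like sortByKey), then keep only the original elements (like values).
  def sortBy[T, K](xs: Seq[T])(f: T => K, ascending: Boolean = true)(implicit ord: Ordering[K]): Seq[T] = {
    val keyOrd = if (ascending) ord else ord.reverse
    xs.map(x => (f(x), x)).sortBy(_._1)(keyOrd).map(_._2)
  }
}
```

Putting `f` in its own parameter list lets the compiler infer `T` from `xs` first, so callers can write `LocalSortBy.sortBy(words)(_.length)`.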

[GitHub] spark pull request: [WIP] update breeze to version 0.8.1

2014-06-02 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-44861308 What is the reason for this change, and how does it affect our intention to maintain binary compatibility?

[GitHub] spark pull request: [WIP] update breeze to version 0.8.1

2014-06-02 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-44889156 We shouldn't, so I think we maintain source compatibility without any trouble. Are the MIMA checks good enough to catch binary incompatibility when we make

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-06-02 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-44914945 Neither does spark 1.0.0. We've offered no guarantee that any spark 1.x will work with scala 2.11. If it turns out that we can't cross-compile for scala 2.10

[GitHub] spark pull request: [SPARK-1938] [SQL] ApproxCountDistinctMergeFun...

2014-05-27 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/893#issuecomment-44309000 @ash211 Makes sense to me -- which doesn't necessarily mean a lot in this unfamiliar area of the code. It looks to me like the dataType for each of CountDistinct

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-27 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r13106282 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1055,10 +1055,16 @@ class DAGScheduler

[GitHub] spark pull request: SPARK-1868: Users should be allowed to cogroup...

2014-05-20 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/813#issuecomment-43663175 To throw another wrench into the Union analogy, there is also the little-used SparkContext#union, which has signatures for both Seq[RDD[T]] and varargs RDD[T
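The `Seq` and varargs overloads mentioned here can be sketched on plain Scala lists. `UnionOverloads.unionAll` is a hypothetical name, not Spark's API. The varargs form takes an explicit first argument. Without it, a lone `T*` parameter would erase to the same `Seq` signature as the collection form, and the two definitions would clash.

```scala
object UnionOverloads {
  // Collection form: concatenate any number of lists passed as a Seq.
  def unionAll[T](lists: Seq[List[T]]): List[T] = lists.toList.flatten

  // Varargs form: the explicit `first` parameter keeps the erased signature
  // distinct from the Seq overload; it simply delegates to that overload.
  def unionAll[T](first: List[T], rest: List[T]*): List[T] = unionAll(first +: rest)
}
```

Callers holding a collection of inputs use the first form. Callers listing inputs inline use the second.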

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

2014-05-15 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/639#issuecomment-42709359 @aarondav Your test code is similar to mine, as are your conclusions. Somebody really needs to do a systematic inventory of Akka exception handling throughout Spark

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-15 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r12409225 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1055,10 +1055,16 @@ class DAGScheduler

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-15 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r12498098 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1148,7 +1154,11 @@ private[scheduler] class

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-15 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/686#issuecomment-42495440 The INFO log should include the information that tasks were not cancelled. Where/how else do you want to see notification of those facts? Is adding more Listener

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-15 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r12497895 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1148,7 +1154,11 @@ private[scheduler] class

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-15 Thread markhamstra
GitHub user markhamstra opened a pull request: https://github.com/apache/spark/pull/686 [SPARK-1749] Job cancellation when SchedulerBackend does not implement killTask It turns out that having the DAGScheduler tell the taskScheduler to cancelTasks when the backend does

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

2014-05-14 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/639#issuecomment-42695854 @aarondav That's what I thought too when looking at the Akka code, and that's why I closed the JIRA for a while. After writing some test code, though, it really

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-14 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/686#issuecomment-42496521 If interruptThread is not `true`, then we are going to leave tasks running on the cluster after cancellation with other backends as well. This is definitely an issue

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-14 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r12408395 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1148,7 +1154,11 @@ private[scheduler] class

[GitHub] spark pull request: [SPARK-1620] Handle uncaught exceptions in fun...

2014-05-12 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/622#issuecomment-42861387 Yes, I'll do a little refactoring after https://github.com/apache/spark/pull/715 is merged.

[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/369#issuecomment-42905001 I'll say it again: When you are working on preparing a PR, you're better off rebasing than merging. Within a clone of your GitHub repo: git pull

[GitHub] spark pull request: [WIP] Simplify the build with sbt 0.13.2 featu...

2014-05-11 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/706#issuecomment-42697568 https://issues.apache.org/jira/browse/SPARK-1776

[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-11 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/686#issuecomment-42494200 @CodingCat @kayousterhout @rxin

[GitHub] spark pull request: [WIP] Simplify the build with sbt 0.13.2 featu...

2014-05-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/706#discussion_r12509377 --- Diff: project/SparkBuild.scala --- @@ -297,7 +273,7 @@ object SparkBuild extends Build { val chillVersion = 0.3.6 val

[GitHub] spark pull request: [WIP] Simplify the build with sbt 0.13.2 featu...

2014-05-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/706#discussion_r12489193 --- Diff: project/SparkBuild.scala --- @@ -16,17 +16,18 @@ */ import sbt._ -import sbt.Classpaths.publishTask -import sbt.Keys

[GitHub] spark pull request: [SPARK-1620] Handle uncaught exceptions in fun...

2014-05-06 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/622#issuecomment-42354082 Yes, try/catch is certainly a viable option, and something I considered before opting to begin the discussion with the slightly less verbose approach. I'll
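
A minimal sketch of the two options being weighed in this SPARK-1620 thread, assuming the "less verbose approach" means letting exceptions escape to a thread's `Thread.UncaughtExceptionHandler` rather than wrapping each body in try/catch. The object and method names are hypothetical; this is not the code from the PR.

```scala
import scala.util.control.NonFatal

object ExceptionHandlingSketch {
  // Option A: an explicit try/catch around every body of work.
  // Verbose at each call site, but the handling is visible and local.
  def runWithTryCatch(body: () => Unit, onError: Throwable => Unit): Unit =
    try body()
    catch { case NonFatal(e) => onError(e) }

  // Option B: let the exception escape the thread and have a single
  // UncaughtExceptionHandler, installed once per thread, deal with it.
  def runWithHandler(body: () => Unit, onError: Throwable => Unit): Thread = {
    val t = new Thread(new Runnable { def run(): Unit = body() })
    t.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      def uncaughtException(thread: Thread, e: Throwable): Unit = onError(e)
    })
    t.start()
    t
  }
}
```

The handler runs on the dying thread before it terminates, so a caller that `join()`s the returned thread will see the error already recorded. The trade-off is that option B only applies where the work runs on a thread you control, while option A works anywhere.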

[GitHub] spark pull request: SPARK-1715: Ensure actor is self-contained in ...

2014-05-05 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/637#discussion_r12290293 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -912,13 +899,13 @@ class DAGScheduler

[GitHub] spark pull request: SPARK-1715: Ensure actor is self-contained in ...

2014-05-05 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/637#discussion_r12299198 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -300,13 +300,7 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-1620] Handle uncaught exceptions in fun...

2014-05-05 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/622#discussion_r12304123 --- Diff: core/src/main/scala/org/apache/spark/util/UncaughtExceptionHandler.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

2014-05-04 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/639#issuecomment-42154867 Okay, first, I didn't start work on this JIRA myself because I haven't had the time to come to a complete understanding of the current and intended functionality, nor
