[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-10-27 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2087

2014-10-27 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r19419188 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -224,18 +223,18 @@ class HadoopRDD[K, V]( val key: K = reader.createKey()

2014-10-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r19387877 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -224,18 +223,18 @@ class HadoopRDD[K, V]( val key: K = reader.createKey()

2014-10-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-60528666 Hey @sryza this looks good - I tested it locally and it worked. I stumbled a bit with the test because I was using coalesce() and these metrics don't work well with coal

2014-10-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r19382961 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -224,18 +223,18 @@ class HadoopRDD[K, V]( val key: K = reader.createKey()

2014-10-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-60458950 Ah sorry about that - I'm out until tomorrow morning but I can look then. I just wanted to test this locally with a few hadoop versions to check it, this looks good. In

2014-10-24 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-60454724 Anything else needed here? Sorry to keep pestering - I have an output metrics patch that depends on this that I'm eager to post.

2014-10-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-60198189 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22

2014-10-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-60198186 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22061/consoleFull) for PR 2087 at commit [`23010b8`](https://github.com/a

2014-10-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-60193928 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22061/consoleFull) for PR 2087 at commit [`23010b8`](https://github.com/ap

2014-10-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-60193683 Oops, sorry about that. Posted a new patch.

2014-10-22 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-60193547 Just had some minor style comments - there were four cases which used the confusing invocation style but you only changed one of them.

2014-10-22 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r19259861 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -147,12 +150,37 @@ class NewHadoopRDD[K, V]( throw new java.util.N

2014-10-22 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r19259865 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -147,12 +150,37 @@ class NewHadoopRDD[K, V]( throw new java.util.N

2014-10-22 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r19259854 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -244,12 +243,35 @@ class HadoopRDD[K, V]( case eof: EOFException =>

2014-10-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-60193215 @pwendell any further comments on this?

2014-10-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59970613 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21995/consoleFull) for PR 2087 at commit [`74fc9bb`](https://github.com/a

2014-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59970626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

2014-10-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59959931 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21995/consoleFull) for PR 2087 at commit [`74fc9bb`](https://github.com/ap

2014-10-21 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59959228 Small change to make a method I added private

2014-10-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59892045 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21976/consoleFull) for PR 2087 at commit [`1ab662d`](https://github.com/a

2014-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59892050 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

2014-10-21 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59891774 Jenkins, retest this please.

2014-10-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59891129 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21971/consoleFull)** for PR 2087 at commit [`1ab662d`](https://github.com/apac

2014-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59891135 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

2014-10-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59886278 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21976/consoleFull) for PR 2087 at commit [`1ab662d`](https://github.com/ap

2014-10-20 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59885707 Jenkins, retest this please.

2014-10-20 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59885695 Cool, updated patch addresses comments. It looks like the failure is caused by a failure to fetch from git.

2014-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59882564 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

2014-10-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59882086 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21971/consoleFull) for PR 2087 at commit [`1ab662d`](https://github.com/ap

2014-10-20 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r19113109 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGroupI

2014-10-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18939723 --- Diff: core/src/test/scala/org/apache/spark/metrics/InputMetricsSuite.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundatio

2014-10-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59318880 Yeah you are totally right - the performance bit was not correct from my end. I added some more comments on this.

2014-10-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18939679 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -222,12 +221,33 @@ class HadoopRDD[K, V]( case eof: EOFException =>

2014-10-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18939658 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGro

2014-10-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18939560 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGro

2014-10-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18939525 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGro

2014-10-14 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18833578 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGroupI

2014-10-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59057738 > I.e. the Hadoop RDD should look up the entire function for the computing thread at the beginning, then it can invoke that function within the hot loop only. Comm
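
The quoted suggestion is to pay the reflection cost once per task instead of once per record. A minimal sketch of that hoisting, assuming the value is read through a reflectively invoked no-argument getter; `ReflectionHoisting` is a hypothetical name, and the getter name `get` is only an example, not the Hadoop method the patch actually calls.

```scala
import java.lang.reflect.Method

// Hypothetical helper contrasting two ways of calling a no-argument getter
// reflectively. The getter name "get" is only an example.
object ReflectionHoisting {
  // Costly pattern: resolves the Method by name on every call.
  def readEachTime(target: AnyRef): Long =
    target.getClass.getMethod("get").invoke(target).asInstanceOf[Long]

  // Preferred pattern: resolve the Method once (e.g. when the task starts)
  // and return a closure that only invokes it inside the hot loop.
  def lookupOnce(target: AnyRef): () => Long = {
    val m: Method = target.getClass.getMethod("get")
    () => m.invoke(target).asInstanceOf[Long]
  }
}
```

With `lookupOnce`, `getMethod` runs a single time and each record pays only for `Method.invoke`; `readEachTime` repeats the name-based lookup on every call.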

2014-10-14 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18832984 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGroupI

2014-10-14 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18832748 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -222,12 +221,33 @@ class HadoopRDD[K, V]( case eof: EOFException =>

2014-10-14 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18832502 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGroupI

2014-10-14 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18831556 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGroupI

2014-10-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-58988193 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/376/consoleFull) for PR 2087 at commit [`305ad9f`](https://github.com/

2014-10-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-58984415 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/376/consoleFull) for PR 2087 at commit [`305ad9f`](https://github.com/a

2014-10-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-58979294 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/367/consoleFull) for PR 2087 at commit [`305ad9f`](https://github.com/

2014-10-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-58975673 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/367/consoleFull) for PR 2087 at commit [`305ad9f`](https://github.com/a

2014-10-13 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18797454 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGro

2014-10-13 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-58960161 Hey Sandy, had a couple questions about behavior and assumptions from Hadoop. A couple of things here. The current approach does a lot of reflection every time we invoke

2014-10-13 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18796711 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -222,12 +221,33 @@ class HadoopRDD[K, V]( case eof: EOFException =>

2014-10-13 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18796643 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGro

2014-10-13 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18796436 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -222,12 +221,33 @@ class HadoopRDD[K, V]( case eof: EOFException =>

2014-10-13 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18796228 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGro

2014-10-13 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18795775 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGro

2014-10-13 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18795734 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -121,6 +125,31 @@ class SparkHadoopUtil extends Logging { UserGro

2014-10-13 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2087#discussion_r18795643 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -147,12 +150,36 @@ class NewHadoopRDD[K, V]( throw new java.util.N

2014-09-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-57425658 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

2014-09-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-57425652 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21091/consoleFull) for PR 2087 at commit [`305ad9f`](https://github.com/a

2014-09-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-57421546 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21091/consoleFull) for PR 2087 at commit [`305ad9f`](https://github.com/ap

2014-09-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-57421448 Updated patch switches from the pull to push model as requested by @pwendell and adds a test. I verified that the test succeeds against both Hadoop 2.2 and Hadoop 2.5 (whi
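
In a push model, the read loop writes the current byte count into a plain metrics object, so the metrics object never holds a callback that has to be polled at heartbeat time. A minimal sketch, assuming a hypothetical `InputMetricsSketch` holder and an update interval of 256 records chosen for illustration; the names and the interval used in the actual patch may differ.

```scala
// Hypothetical minimal metrics holder: a plain mutable field, with no
// callback registration or other state.
final class InputMetricsSketch {
  @volatile var bytesRead: Long = 0L
}

object PushModelReader {
  // Assumed interval, for illustration only.
  val RecordsBetweenUpdates = 256

  // Consumes all records, pushing the latest bytes-read value into `metrics`
  // every RecordsBetweenUpdates records and once more at the end.
  def readAll[T](records: Iterator[T],
                 bytesReadSoFar: () => Long,
                 metrics: InputMetricsSketch): Int = {
    var count = 0
    while (records.hasNext) {
      records.next()
      count += 1
      if (count % RecordsBetweenUpdates == 0) {
        metrics.bytesRead = bytesReadSoFar() // periodic incremental update
      }
    }
    metrics.bytesRead = bytesReadSoFar() // final update when input is exhausted
    count
  }
}
```

Updating every N records bounds the cost of querying the byte counter inside the loop, while the final update makes the value reported at completion cover everything read.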

2014-09-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-57260156 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21013/consoleFull) for PR 2087 at commit [`a5486af`](https://github.com/ap

2014-09-29 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-57244182 Yeah so I just prefer keeping the TaskMetrics/InputMetrics as simple as possible rather than having callback registration and other state in them. The simplest possible

2014-09-29 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-57236118 > The current approach couples the updating of this metric with the heartbeats in a way that seems strange. The heartbeats (and task completion, which, my bad, I ne

2014-09-29 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-57229133 Hey @sryza so it seems like there are two things going on here. One is adding incremental update and the other is changing the way we deal with tracking read bytes for H

2014-09-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-56461580 MapReduce doesn't use getPos, but it does look like it might be helpful in some situations. One caveat is that pos only means # bytes for file input formats. For example,
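
The old `mapred` `RecordReader` interface does expose `getPos()`, and the caveat above matters when treating it as a byte count. A sketch of a position-based estimate that is only trusted for file-based formats; `PositionedReader`, its `isFileBased` flag, and `PosEstimate` are hypothetical names introduced for illustration, not Hadoop or Spark APIs.

```scala
// Hypothetical reader abstraction. For file-based formats the position is a
// byte offset; for other formats it may mean something else entirely
// (e.g. a row index), so the delta is only a byte estimate when isFileBased.
trait PositionedReader {
  def getPos: Long
  def isFileBased: Boolean
}

object PosEstimate {
  // Bytes consumed since `startPos`, or None when positions are not bytes.
  def estimatedBytesRead(reader: PositionedReader, startPos: Long): Option[Long] =
    if (reader.isFileBased) Some(reader.getPos - startPos) else None
}
```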

2014-09-21 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-56307027 @aarondav @sryza Did you consider using reader.getPos() to get the correct metrics for older versions of Hadoop (as in here: https://github.com/kayousterhout/spark-

2014-09-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-55358462 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20199/consoleFull) for PR 2087 at commit [`8bfaa24`](https://github.com/a

2014-09-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-55355492 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20199/consoleFull) for PR 2087 at commit [`8bfaa24`](https://github.com/ap

2014-09-11 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-55355339 Updated patch includes fallback to the split size
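
A fallback of this kind can be sketched as follows, assuming the accurate per-thread byte counter is modeled as an `Option` that is empty on Hadoop versions lacking the API. `InputBytesFallback` and `reportedBytesRead` are hypothetical names. The fallback is coarse: it reports the full split length even if only part of the split was consumed.

```scala
// Hypothetical helper: `callback` is defined only when the running Hadoop
// version exposes per-thread filesystem statistics.
object InputBytesFallback {
  def reportedBytesRead(callback: Option[() => Long], splitLength: Long): Long =
    callback match {
      case Some(bytesRead) => bytesRead() // accurate per-thread count
      case None            => splitLength // coarse fallback: whole split
    }
}
```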

2014-09-08 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54915960 I think we need some indication of the bytes being read from Hadoop. If this is our only current mechanism, then I think removing the code is not worth the behavioral re

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54867619 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19983/consoleFull) for PR 2087 at commit [`0034292`](https://github.com/a

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54859158 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19983/consoleFull) for PR 2087 at commit [`0034292`](https://github.com/ap

2014-09-08 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54852759 Just to make sure it's clear, the issue isn't only that we can be a few bytes off when we're reading outside of split boundaries, but that it'll look like we read the full

2014-09-08 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54851789 retest this please

2014-09-08 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54807921 FWIW, I think mostly-accurate metrics are much better than no metrics in this case. The read/write bytes are very useful from Hadoop FSes, and Hadoop <2.5 is still very

2014-09-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54780033 It looks like all the core tests are passing, but there are some failures in streaming and SQL tests. Have those been showing up elsewhere?

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54698426 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19865/consoleFull) for PR 2087 at commit [`0034292`](https://github.com/a

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54695093 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19865/consoleFull) for PR 2087 at commit [`0034292`](https://github.com/ap

2014-09-04 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54563520 Jenkins, test this please.

2014-09-04 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54563140 Hm, test this please

2014-09-04 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54521510 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19793/consoleFull) for PR 2087 at commit [`0034292`](https://github.com/ap

2014-09-04 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54519238 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19790/consoleFull) for PR 2087 at commit [`0a743c0`](https://github.com/ap

2014-08-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-53005751 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19069/consoleFull) for PR 2087 at commit [`32daf1f`](https://github.com/a

2014-08-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-52998163 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19069/consoleFull) for PR 2087 at commit [`32daf1f`](https://github.com/ap

2014-08-21 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2087 SPARK-2621. Update task InputMetrics incrementally. The patch takes advantage of an API provided in Hadoop 2.5 that allows getting accurate data on Hadoop FileSystem bytes read. It eliminates the old met
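
One way to turn a cumulative, per-thread bytes-read counter into a per-task metric is to snapshot it when the task starts and report the difference. The sketch below shows only that baseline-subtraction idea; `ThreadStats` and `BytesReadCallback` are hypothetical stand-ins, not Hadoop's or Spark's actual classes.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for per-thread filesystem statistics; Hadoop's real
// statistics classes are not modeled here.
final class ThreadStats {
  private val bytesRead = new AtomicLong(0L)
  def addBytesRead(n: Long): Unit = { bytesRead.addAndGet(n); () }
  def getBytesRead: Long = bytesRead.get()
}

object BytesReadCallback {
  // One ThreadStats per thread, so tasks on other threads do not pollute
  // each other's counts.
  val threadStats: ThreadLocal[ThreadStats] = new ThreadLocal[ThreadStats] {
    override def initialValue(): ThreadStats = new ThreadStats
  }

  // Returns a closure reporting bytes read on the thread that created it,
  // counted from the moment of creation (baseline subtraction).
  def bytesReadSinceNow(): () => Long = {
    val stats = threadStats.get()
    val baseline = stats.getBytesRead
    () => stats.getBytesRead - baseline
  }
}
```

Because each thread owns its own counter, bytes read by tasks running concurrently on other executor threads do not leak into this task's figure.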