[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-23 Thread ilganeli
Github user ilganeli closed the pull request at: https://github.com/apache/spark/pull/3518 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-23 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-71299062 Hey @pwendell - not a problem. The solutions are similar but Reynold's has fewer moving parts. I appreciate the recognition.

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70399407 BTW - my apologies for marking this as a starter task, it turned out to be more complicated. We can credit you for having worked on the feature as well.

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70394250 Hey @ilganeli - I took a slightly deeper look this time. I still don't totally follow how this all hooks together, but I wonder if it's possible to write a single

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70399358 Hey so it looks like while I was reviewing this patch @rxin actually ran into this and just wrote a fix himself (#4093). That fix is actually even simpler than what I

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23050901 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -827,9 +868,21 @@ class DAGScheduler( // might modify state of

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051146 --- Diff: core/src/main/scala/org/apache/spark/util/ObjectWalker.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70180151 Hi Patrick - thanks for the feedback. I would love to print out the names of the fields but I wasn't able to figure out a way to do that - could you suggest how?
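
One possible way to report field names, sketched here with plain Java reflection and not taken from the patch itself: walk the declared fields of each object, build a dotted path as you descend, and record the path whenever a reachable value does not implement java.io.Serializable. The names FieldPathFinder, findUnserializable, Outer and Inner are hypothetical and exist only for this illustration. The sketch assumes default Java serialization semantics; it does not descend into JDK classes, arrays or collections, and it ignores custom writeObject/writeReplace logic.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: reports the field path to each
// reachable value that does not implement java.io.Serializable.
public class FieldPathFinder {

    // Illustrative sample types (assumptions made for this sketch only).
    static class Inner implements Serializable {
        Object handle = new Object();   // java.lang.Object is not Serializable
        String label = "x";
    }

    static class Outer implements Serializable {
        Inner inner = new Inner();
        transient Thread worker;        // transient fields are skipped
    }

    public static List<String> findUnserializable(Object root) {
        List<String> hits = new ArrayList<>();
        Map<Object, Boolean> seen = new IdentityHashMap<>();
        walk(root, "root", seen, hits);
        return hits;
    }

    private static void walk(Object obj, String path,
                             Map<Object, Boolean> seen, List<String> hits) {
        // Skip nulls and objects already visited (guards against cycles).
        if (obj == null || seen.put(obj, Boolean.TRUE) != null) {
            return;
        }
        if (!(obj instanceof Serializable)) {
            hits.add(path + " (" + obj.getClass().getName() + ")");
            return;
        }
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            // Do not reflect into JDK classes; newer JVMs restrict that access.
            if (c.getName().startsWith("java.")) {
                break;
            }
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                // Default serialization ignores static and transient fields;
                // primitives are always serializable.
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods)
                        || f.getType().isPrimitive()) {
                    continue;
                }
                Object value;
                try {
                    f.setAccessible(true);
                    value = f.get(obj);
                } catch (IllegalAccessException | RuntimeException e) {
                    continue;           // field not readable; skip it
                }
                walk(value, path + "." + f.getName(), seen, hits);
            }
        }
    }
}
```

For the sample types above, findUnserializable(new FieldPathFinder.Outer()) yields a single entry, "root.inner.handle (java.lang.Object)", which names both the field chain and the offending class.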

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23050776 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -789,6 +792,44 @@ class DAGScheduler( } } +

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70179454 Hey just took a quick pass with some code style suggestions (more coming) and usability suggestions. One thing, would it be possible to track the name of the fields you

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051221 --- Diff: core/src/main/scala/org/apache/spark/util/ObjectWalker.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051282 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -459,7 +459,23 @@ private[spark] class TaskSetManager( }

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051021 --- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala --- @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread ilganeli
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23052373 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -789,6 +792,44 @@ class DAGScheduler( } } +

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70188638 [Test build #25620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25620/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70188734 [Test build #25620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25620/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70188737 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70192878 [Test build #25624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25624/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70198323 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70198318 [Test build #25624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25624/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-69591770 [Test build #25420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25420/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-12 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-69591920 Hi @JoshRosen, #3638 has now been merged and I've resolved the minor merge conflicts and pushed the updates. If you could please review this at your convenience, I'd

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-69604747 [Test build #25420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25420/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2015-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-69604755 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-29 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-68312213 Hi @JoshRosen - just checking in to make sure things are moving on #3638 since it's a blocker to this patch. Please let me know how that's going, looks to be almost

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67455654 Hi @JoshRosen, I have made the updates we've discussed. Example output is shown below for two cases of unserializable RDDs. The first is when an individual RDD is

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67455620 [Test build #24577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24577/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67464102 [Test build #24577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24577/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67464113 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67520645 This looks pretty neat; I'll try to review this soon (a little busy right now), but in the meantime you might be interested in #3638 which has some small overlap in

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67523669 Great - thanks, Josh. I'm working on doing a bit more code cleanup in the meantime to minimize touch points within the existing Spark classes.

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67548390 [Test build #24597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24597/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67549171 [Test build #24598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24598/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67559915 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67559909 [Test build #24597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24597/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67560956 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67560952 [Test build #24598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24598/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-15 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-67070557 Hi @JoshRosen, I think I've finally understood what you've been saying from the beginning (apologies for being slow). I haven't been thinking correctly about what is

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-12 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66868725 IMO, this patch still needs a lot of work before it will be ready to merge. I'm not convinced that telling me which RDD referenced the unserializable object, by

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-12 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21786744 --- Diff: docs/configuration.md --- @@ -517,6 +517,14 @@ Apart from these, the following properties are also available, and may be useful /td

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-12 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21786752 --- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala --- @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-12 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21786754 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -873,6 +943,10 @@ class DAGScheduler( // We've already

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-12 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21786759 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -458,6 +458,17 @@ private[spark] class TaskSetManager(

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-11 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66658459 Hi @JoshRosen - with the updates I've made is this ok to merge?

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-08 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66144913 Hi @JoshRosen - can I please get this run through Jenkins? Thanks!

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66145968 [Test build #24224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24224/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66145560 Jenkins, this is ok to test.

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66158875 [Test build #24224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24224/consoleFull) for PR 3518 at commit

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66158890 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-02 Thread ilganeli
Github user ilganeli closed the pull request at: https://github.com/apache/spark/pull/3518

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-02 Thread ilganeli
GitHub user ilganeli reopened a pull request: https://github.com/apache/spark/pull/3518 [SPARK-3694] RDD and Task serialization debugging output Hi all - in addition to what was explicitly requested in the original JIRA, I also added the ability to have a trace of the serialization

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-01 Thread ilganeli
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21102635 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -458,6 +458,20 @@ private[spark] class TaskSetManager(

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread ilganeli
GitHub user ilganeli opened a pull request: https://github.com/apache/spark/pull/3518 [SPARK-3694] RDD and Task serialization debugging output Hi all - in addition to what was explicitly requested in the original JIRA, I also added the ability to have a trace of the serialization

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-64994713 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-64995074 I'm going to let Jenkins test this, but my hunch is that the first run is going to fail due to Scalastyle warnings / errors. I'll comment on a couple of these style

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062370 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -788,6 +793,63 @@ class DAGScheduler( } } +

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062380 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -788,6 +793,63 @@ class DAGScheduler( } } +

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062377 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -788,6 +793,63 @@ class DAGScheduler( } } +

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062385 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -788,6 +793,63 @@ class DAGScheduler( } } +

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062393 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -788,6 +793,63 @@ class DAGScheduler( } } +

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062401 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -788,6 +793,63 @@ class DAGScheduler( } } +

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062406 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -788,6 +793,63 @@ class DAGScheduler( } } +

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062416 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -826,9 +888,23 @@ class DAGScheduler( // might modify state of

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062420 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -826,9 +888,23 @@ class DAGScheduler( // might modify state of

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062428 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -826,9 +888,23 @@ class DAGScheduler( // might modify state of

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062433 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -458,6 +458,20 @@ private[spark] class TaskSetManager(

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062448 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -458,6 +458,20 @@ private[spark] class TaskSetManager(

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062454 --- Diff: core/src/main/scala/org/apache/spark/util/RDDWalker.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062466 --- Diff: core/src/main/scala/org/apache/spark/util/RDDWalker.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062458 --- Diff: core/src/main/scala/org/apache/spark/util/RDDWalker.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062475 --- Diff: core/src/main/scala/org/apache/spark/util/RDDWalker.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062491 --- Diff: core/src/main/scala/org/apache/spark/util/RDDWalker.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062519 --- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062522 --- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062530 --- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062535 --- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062539 --- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r21062580 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -244,6 +244,133 @@ class DAGSchedulerSuite extends

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-64996599 Formatting nits aside, a couple of higher-level comments: If I understand this patch correctly, it only tells me which RDD contains the non-serializable object

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-65001161 Thanks for the quick review, Josh. I'll look into refactoring the search step into a separate component. With regard to your first comment, I display two types of

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-65001985 Thus, the serialization trace tells us which specific RDD is unserializable and this secondary output shows how it's related to the parent RDD. My concern

[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-11-30 Thread ilganeli
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-65008009 If the goal is to trace the object graph while it's being serialized would that require modification of the serializer itself? How else would we get into the guts of
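
On the question of observing the object graph without modifying the serializer: one option, offered only as a sketch and not as what this PR or #4093 does, is to subclass java.io.ObjectOutputStream and use its replaceObject hook, which is switched on with enableReplaceObject(true) and is called for each object as it is written. In OpenJDK the hook appears to run before the Serializable check, so when a NotSerializableException is thrown the last recorded class should be the offender. TracingStream, trace and Holder are hypothetical names introduced for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Spark code: records the class of every object the
// stream visits, using only the public extension points of ObjectOutputStream.
public class TracingStream extends ObjectOutputStream {

    // Illustrative sample type (an assumption made for this sketch only).
    static class Holder implements Serializable {
        Object payload = new Object();  // java.lang.Object is not Serializable
    }

    private final List<String> visited = new ArrayList<>();

    public TracingStream(OutputStream out) throws IOException {
        super(out);
        // Without this call the stream never invokes replaceObject.
        enableReplaceObject(true);
    }

    @Override
    protected Object replaceObject(Object obj) {
        // Record the visit and return the object unchanged.
        visited.add(obj.getClass().getName());
        return obj;
    }

    // Serializes root into a throwaway buffer and returns the visit order.
    public static List<String> trace(Object root) throws IOException {
        TracingStream stream = new TracingStream(new ByteArrayOutputStream());
        try {
            stream.writeObject(root);
        } catch (NotSerializableException e) {
            // Expected for unserializable graphs; the trace is still usable.
        }
        return stream.visited;
    }
}
```

Calling TracingStream.trace(new TracingStream.Holder()) records the Holder class first and java.lang.Object last. The trace carries class names only; pairing it with field names would need extra bookkeeping that this sketch does not attempt.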