[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-16 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 FYI, finally, I figured out the root cause: https://github.com/netty/netty/issues/5833 As far as I understand, `System.setProperty("io.netty.maxDirectMemory", "0");` should be a correct

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-15 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 Agreed. I'm going to merge to master. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-15 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 OK, I think this is a good change. Maybe to be conservative we'll only put this in master.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-14 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 > @zsxwing you seem to understand this better, but is it that the default behavior changes and is probably a bad default now, or just that it's inappropriate for Spark? I don't have a

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-14 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 > For future reference here is the context of how that option is used:

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14961 Merged build finished. Test PASSed.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14961 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65361/

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #65361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65361/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-14 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 Nice research here. So that's probably the only real way to set this property? It has to be a system property, I guess, and this should fire before the classes in question initialize, as far as I can see.
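
The timing concern raised here can be sketched in plain Java. This is only an illustration: `LimitHolder` is a hypothetical stand-in for a library class that reads a system property once in its static initializer (it is not Netty code), and the `-1` fallback is an assumption. Only the property name and the `System.setProperty` call come from the thread.

```java
public class InitOrderSketch {
    // Hypothetical stand-in for a library class that captures a system
    // property once, when the class is initialized. NOT actual Netty code.
    static final class LimitHolder {
        // The -1 fallback is an assumption made for this sketch.
        static final long MAX_DIRECT_MEMORY =
                Long.getLong("io.netty.maxDirectMemory", -1L);
    }

    /** Sets the property as quoted in the thread, then triggers class init. */
    static long configureThenRead() {
        System.setProperty("io.netty.maxDirectMemory", "0");
        // The first access to LimitHolder runs its static initializer.
        return LimitHolder.MAX_DIRECT_MEMORY;
    }

    public static void main(String[] args) {
        System.out.println(configureThenRead());
        // Changing the property afterwards has no effect on the captured value.
        System.setProperty("io.netty.maxDirectMemory", "1024");
        System.out.println(LimitHolder.MAX_DIRECT_MEMORY);
    }
}
```

The design point is that a value captured in a static initializer is fixed for the life of the class, so the property has to be set on a code path that is guaranteed to run before the first use of the class that reads it.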

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-14 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/14961 Thanks @zsxwing, I've removed our older experiments in favour of this one. For future reference, here is the context of how that option is used:

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #65361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65361/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 Confirmed the issue was introduced by https://github.com/netty/netty/commit/d58dec8862e02fc2a98f8dcdb166db4b788be50a#diff-8d83d75ebf8a18cc48bf0a0b1183c188 Add

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 Hm, I can reproduce the same error using this command `build/sbt "project core" "test-only *Shuffle*"` locally. The first broken version is 4.0.37.Final.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 Oh, the allocator is set here: https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java#L95

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 @a-roberts could you binary search the first broken netty version? Since this cannot be reproduced locally, you have to push new commits.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 Still saw the following errors in the unit-test log: ``` 16/09/13 07:41:18.817 shuffle-server-466-7 WARN TransportChannelHandler: Exception in connection from /127.0.0.1:36871

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14961 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65322/

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #65322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65322/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14961 Merged build finished. Test FAILed.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #65322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65322/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/14961 Had a look to see how to do this https://github.com/netty/netty/blob/a01519e4f86690323647b5db45d9ffcb184b1a84/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java so I'll add

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/14961 Yep that makes more sense, UnpooledByteBufAllocator usage coming up

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 OK, so same failure with this change. Hm. I don't think it's that something is just slow but that the error in https://github.com/apache/spark/pull/14961#issuecomment-245090209 causes netty to

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/14961 [info] - using external shuffle service *** FAILED *** (1 minute) [info] java.util.concurrent.TimeoutException: Can't find 2 executors before 6 milliseconds elapsed 60 seconds

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #3256 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3256/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #3256 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3256/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14961 Merged build finished. Test FAILed.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14961 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65264/

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #65264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65264/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #65264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65264/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-12 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/14961 No new test failures with my runs ranging from Hadoop 2.3 to Hadoop 2.7 today so pushed the commit above

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-12 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/14961 Sean, yep, I've had trouble reproducing it too, kicked off a bunch of builds over the weekend including one using Hadoop-2.3 which was my initial theory (only difference between our testing

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 @a-roberts are you in a position to add this change to this PR as an experiment? I can try it on the side too. I can't seem to reproduce the failure locally, even when fully rebuilding the project

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-07 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 In addition, I think we should figure out why upgrading the netty version fails. The Recycler issue also seems to be present in `4.0.29.Final`. Is it because netty starts to track the memory footprint since

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-07 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 > I suppose one hacky way to test the theory above is to push a commit here that sets this in NettyUtils: Let's add it in `TransportConf` so that it's easy to find since it's the place of

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-07 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 I'm not familiar with netty's Recycler. But the default value of `io.netty.recycler.maxCapacity` is 262144. This seems too big for Spark anyway. I don't think we need to cache 260k objects.
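
A minimal Java sketch of what lowering that cap through a system property could look like. The property name and the default of 262144 are taken from the comment above; the helper names and the value 4096 are hypothetical, and whether a given Netty release honours this exact property name has not been verified here.

```java
public class RecyclerCapacitySketch {
    // Property name and default as quoted in the thread (not verified against Netty).
    static final String KEY = "io.netty.recycler.maxCapacity";
    static final int QUOTED_DEFAULT = 262144;

    /** Capacity that would apply: the property if set, else the quoted default. */
    static int effectiveCapacity() {
        return Integer.getInteger(KEY, QUOTED_DEFAULT);
    }

    /** Applies a smaller cap only when the user has not chosen one already. */
    static void capIfUnset(int cap) {
        if (System.getProperty(KEY) == null) {
            System.setProperty(KEY, Integer.toString(cap));
        }
    }

    public static void main(String[] args) {
        System.out.println(effectiveCapacity());
        capIfUnset(4096); // 4096 is an arbitrary illustrative value
        System.out.println(effectiveCapacity());
    }
}
```

Setting the value only when it is unset keeps an explicit `-D` flag on the JVM command line in control, which matters for a setting that users may already be tuning.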

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-07 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 Hm, I can't get this test to fail with Netty 4.0.41 when I 'mvn install' and run the test suite locally. I'm having a hard time seeing what could alleviate the failure. I suspect that this

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 I think we can binary search the first broken netty version. It would be easy to find out the real issue.
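
The bisection suggested here can be sketched in Java as follows. This is an illustration under assumptions: the candidate list simply enumerates patch releases between the two versions named in the thread, and the `isBroken` predicate in `main` is a stand-in for actually building and running the failing test against each version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class FirstBrokenVersion {
    /**
     * Returns the index of the first version for which isBroken is true,
     * assuming every version after the first broken one is also broken.
     * Returns versions.size() when no version is broken.
     */
    static int firstBroken(List<String> versions, Predicate<String> isBroken) {
        int lo = 0;
        int hi = versions.size();
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (isBroken.test(versions.get(mid))) {
                hi = mid;      // first broken version is at mid or earlier
            } else {
                lo = mid + 1;  // first broken version is after mid
            }
        }
        return lo;
    }

    /** Illustrative candidate list: 4.0.29.Final through 4.0.41.Final. */
    static List<String> candidates() {
        List<String> versions = new ArrayList<>();
        for (int patch = 29; patch <= 41; patch++) {
            versions.add("4.0." + patch + ".Final");
        }
        return versions;
    }

    public static void main(String[] args) {
        // Stand-in predicate; a real run would execute the failing test suite.
        Predicate<String> isBroken =
                v -> Integer.parseInt(v.split("\\.")[2]) >= 37;
        List<String> versions = candidates();
        System.out.println(versions.get(firstBroken(versions, isBroken)));
    }
}
```

With 13 candidates this needs at most four test runs, which matters when each probe means pushing a commit and waiting for a CI build.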

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 > Is the lesson here to not bother with pooling and use the UnpooledByteBufAllocator? Not sure. Pooling is for improving the performance because allocating direct buffers is pretty slow.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 Aha, possibly this: https://groups.google.com/forum/#!topic/netty/3BoF7q34Z4I Is the lesson here to not bother with pooling and use the UnpooledByteBufAllocator?

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14961 I saw the error in the log: ``` 16/09/05 08:21:56.758 shuffle-server-593-8 WARN TransportChannelHandler: Exception in connection from /127.0.0.1:44788

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 Hm, no I take it back, it's a consistent failure that doesn't show up in the main test builds (for any Hadoop version): ``` [info] - using external shuffle service *** FAILED *** (1

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #3249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3249/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/14961 Thanks, I did a ctrl-f for "** fail", you'd have a better idea of what the known flakies are in this farm though, my quick checking: - using external shuffle service -> looks to be a

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 Hm, I see just one in the PR builder here, really. And it's different from run to run so this could well be spurious. Re-running tests one more time here.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #3249 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3249/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/14961 In the description I mentioned that for testing I used "Existing unit tests against branch-1.6 and branch-2.0 using IBM Java 8 on Intel, Power and Z architectures", so clarifying that I only used

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 Are you saying thousands of tests fail with certain Hadoop versions and this version change? That's hard to believe. I'd be very surprised if this caused a test failure. However I do see this PR

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/14961 Thanks, so are we saying netty 4.0.29 can't be upgraded to 4.0.41 without breaking changes? That's not even a minor version change... On branch 1.6 with the netty change for myself I see

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 @jerryshao that's a good point though in theory a maintenance release contains no API or behavior changes (that aren't bugs). Let's perhaps not touch 1.6 then to be conservative. Hadoop uses a

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14961 Also, many other downstream and upstream applications may use a different version of the Netty jar, so it would be better to keep these fundamental dependencies stable.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14961 Upgrading the Netty version on branch 1.6 may cause an API incompatibility issue for the YARN shuffle service; please see [SPARK-16018](https://issues.apache.org/jira/browse/SPARK-16018) and

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #3247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3247/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #3247 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3247/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #3246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3246/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #3246 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3246/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14961 Merged build finished. Test FAILed.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14961 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64938/

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #64938 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64938/consoleFull)** for PR 14961 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 Looks good for master back through branch-1.6.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14961 **[Test build #64938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64938/consoleFull)** for PR 14961 at commit