[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10846 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63402/

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10846 Merged build finished. Test PASSed.

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10846 **[Test build #63402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63402/consoleFull)** for PR 10846 at commit

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/10846 @rajeshbalamohan can you also update the PR title and summary? thanks!

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10846 **[Test build #63402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63402/consoleFull)** for PR 10846 at commit

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/10846 They take longer to clean up. If queries are executed continuously, the thrift server wastes a major portion of its time in GC. In any case, I have removed the HadoopRDD in the recent commit and

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/10846 I'm not saying we should fix just one of them. I'm saying we should treat them as separate issues. I am a little concerned about the workaround for the soft refs, for example, and that doesn't need

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/10846 SoftRef causes a lot of memory pressure on the thrift server. To be precise, when executing a query with a large dataset, it can very soon run at 1200% CPU and all threads carrying out just GC

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/10846 @rajeshbalamohan could you break this into two separate bugs? The JobProgressListener issue is clear and the fix looks fine. But the cache issue is less clear - it would be better to understand why

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/10846 So, `SparkEnv.hadoopJobMetadata` actually keeps soft refs to the conf objects, so they eventually should be garbage collected when the `HadoopRDD` instances go away. So isn't your problem just a
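The soft-reference behaviour described in the comment above can be illustrated with a minimal Java sketch. This is not Spark's code: the class `SoftConfCache` and its methods are hypothetical names chosen for illustration, and a plain `Object` stands in for a Hadoop conf object. It only shows the general pattern of a map whose values are held through `java.lang.ref.SoftReference`, so the JVM may clear them under memory pressure once no strong reference remains.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the Spark implementation.
public class SoftConfCache {
    // Values are wrapped in SoftReference, so the map alone does not keep them alive.
    private final Map<String, SoftReference<Object>> cache = new ConcurrentHashMap<>();

    public void put(String key, Object conf) {
        cache.put(key, new SoftReference<>(conf));
    }

    // Returns the cached value, or null if it was never stored or has been cleared by the GC.
    public Object get(String key) {
        SoftReference<Object> ref = cache.get(key);
        if (ref == null) {
            return null;
        }
        Object value = ref.get();
        if (value == null) {
            // The referent was collected; drop the stale entry.
            cache.remove(key, ref);
        }
        return value;
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        SoftConfCache cache = new SoftConfCache();
        Object conf = new Object(); // strong reference held by the caller
        cache.put("job-1", conf);
        // While 'conf' is strongly reachable, the soft reference cannot be cleared.
        System.out.println(cache.get("job-1") == conf);
    }
}
```

Softly reachable objects are typically cleared only when the JVM is running low on memory, which is consistent with the point raised in the thread that such entries are released late and can add GC work in a long-running server.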

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/10846 Ok, so this is actually two bugs: - more forcefully respecting the "retainedStages" config; now the code might actually stop showing active stages on the web ui, right? Not sure how big of

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10846 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63367/

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10846 Merged build finished. Test PASSed.

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10846 **[Test build #63367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63367/consoleFull)** for PR 10846 at commit

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10846 **[Test build #63367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63367/consoleFull)** for PR 10846 at commit

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-08 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/10846 ok to test

[GitHub] spark issue #10846: [SPARK-12920][SQL] Fix high CPU usage in spark thrift se...

2016-08-07 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/10846 - Rebased to master and changed title.