Re: Dynamic variables in Spark

2014-07-23 Thread Neil Ferguson
Hi Patrick. That looks very useful. The thing that seems to be missing from Shivaram's example is the ability to access TaskMetrics statically (this is the same problem that I am trying to solve with dynamic variables). You mention defining an accumulator on the RDD. Perhaps I am

Re: Pull requests will be automatically linked to JIRA when submitted

2014-07-23 Thread Nicholas Chammas
By the way, it looks like there’s a JIRA plugin that integrates it with GitHub:
- https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin
- https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA

Announcing Spark 0.9.2

2014-07-23 Thread Xiangrui Meng
I'm happy to announce the availability of Spark 0.9.2! Spark 0.9.2 is a maintenance release with bug fixes across several areas of Spark, including Spark Core, PySpark, MLlib, Streaming, and GraphX. We recommend that all 0.9.x users upgrade to this stable release. Contributions to this release came

Re: Configuring Spark Memory

2014-07-23 Thread Nishkam Ravi
See if this helps: https://github.com/nishkamravi2/SparkAutoConfig/ It's a very simple tool for auto-configuring default parameters in Spark. It takes high-level parameters as input (like number of nodes, cores per node, memory per node, etc.) and spits out default configuration, user advice and