Re: "Dynamic variables" in Spark

2014-08-15 Thread Neil Ferguson
>> > TaskMetrics.get("f1-time") However, I don't think this would be possible with the named accumulators -- I believe they'd need to be passed to every function that needs them,

Re: "Dynamic variables" in Spark

2014-07-24 Thread Neil Ferguson
>> TaskMetrics.get("f1-time") However, I don't think this would be possible with the named accumulators -- I believe they'd need to be passed to every function that needs them, which I think would be cumbersome in any application of reas

Re: "Dynamic variables" in Spark

2014-07-24 Thread Patrick Wendell
> believe they'd need to be passed to every function that needs them, which I think would be cumbersome in any application of reasonable complexity. This is what I was trying to solve with my proposal for dynamic variables in Spark. However, the ability to retrieve named

Re: "Dynamic variables" in Spark

2014-07-24 Thread Neil Ferguson
e") However, I don't think this would be possible with the named accumulators -- I believe they'd need to be passed to every function that needs them, which I think would be cumbersome in any application of reasonable complexity. This is what I was trying to solve with my proposal

Re: "Dynamic variables" in Spark

2014-07-23 Thread Neil Ferguson
Hi Patrick. That looks very useful. The thing that seems to be missing from Shivaram's example is the ability to access TaskMetrics statically (this is the same problem that I am trying to solve with dynamic variables). You mention defining an accumulator on the RDD. Perhaps I am missin

Re: "Dynamic variables" in Spark

2014-07-22 Thread Patrick Wendell
Shivaram, You should take a look at this patch which adds support for naming accumulators - this is likely to get merged in soon. I actually started this patch by supporting named TaskMetrics similar to what you have there, but then I realized there is too much semantic overlap with accumulators,

Re: "Dynamic variables" in Spark

2014-07-22 Thread Neil Ferguson
Hi Christopher. Thanks for your reply. I'll try and address your points -- please let me know if I missed anything. Regarding clarifying the problem statement, let me try and do that with a real-world example. I have a method that I want to measure the performance of, which has the following signa

Re: "Dynamic variables" in Spark

2014-07-22 Thread Shivaram Venkataraman
From reading Neil's first e-mail, I think the motivation is to get some metrics in ADAM? -- I've run into a similar use-case with having user-defined metrics in long-running tasks and I think a nice way to solve this would be to have user-defined TaskMetrics. To state my problem more clearly, l

Re: "Dynamic variables" in Spark

2014-07-22 Thread Neil Ferguson
Hi Reynold. Thanks for your reply. Accumulators are, of course, stored in the Accumulators object as thread-local variables. However, the Accumulators object isn't public, so when a Task is executing there's no way to get the set of accumulators for the current thread -- accumulators still have to

Re: "Dynamic variables" in Spark

2014-07-21 Thread Reynold Xin
Thanks for the thoughtful email, Neil and Christopher. If I understand this correctly, it seems like the dynamic variable is just a variant of the accumulator (a static one since it is a global object). Accumulators are already implemented using thread-local variables under the hood. Am I misunder

Re: "Dynamic variables" in Spark

2014-07-21 Thread Christopher Nguyen
Hi Neil, first off, I'm generally a sympathetic advocate for making changes to Spark internals to make it easier/better/faster/more awesome. In this case, I'm (a) not clear about what you're trying to accomplish, and (b) a bit worried about the proposed solution. On (a): it is stated that you wan

"Dynamic variables" in Spark

2014-07-21 Thread Neil Ferguson
Hi all. I have been adding some metrics to the ADAM project https://github.com/bigdatagenomics/adam, which runs on Spark, and have a proposal for an enhancement to Spark that would make this work cleaner and easier. I need to pass some Accumulators around, which will aggregate metrics (timing stat