Re: Need suggestions on monitor Spark progress

2015-11-30 Thread Alex Rovner
In these scenarios it's fairly standard to report the metrics, either directly or through accumulators (http://spark.apache.org/docs/latest/programming-guide.html#accumulators-a-nameaccumlinka), to a time series database such as Graphite (http://graphite.wikidot.com/) or OpenTSDB (http://opentsdb.ne
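
As a rough illustration of reporting a metric directly to Graphite, the sketch below formats a value in Graphite's plaintext protocol (path, value, Unix timestamp) and sends it over TCP. The metric name, the host, and the helper names `graphite_line` and `send_metric` are assumptions made for illustration; 2003 is the usual default port for Carbon's plaintext listener, but your deployment may differ.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line: path, value, timestamp."""
    # Use the current time unless the caller supplies a timestamp.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(path, value, host="localhost", port=2003, timeout=2.0):
    """Send one metric to a Carbon plaintext listener (host and port are assumptions)."""
    line = graphite_line(path, value)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
```

Something like `send_metric("spark.job.accounts_done", 1200)` could then be called from wherever the count is available, for example on the driver after reading an accumulator's value.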

Re: Need suggestions on monitor Spark progress

2015-11-30 Thread Jacek Laskowski
Hi, My limited understanding of Spark tells me that a task is the smallest unit of work and Spark itself won't give you much. I wouldn't expect it to, since "account" is a business entity, not one of Spark's. What about using mapPartitions* to know the details of partitions and do whatever you want
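
One way the mapPartitions* idea could look is sketched below in plain Python, so it runs without a Spark installation. The function wraps a partition's iterator and calls a reporting callback every N accounts. The names `track_partition`, `process`, and `report` are hypothetical, not part of Spark.

```python
def track_partition(index, accounts, process, report, every=100):
    """Process one partition's accounts, reporting progress every `every` items.

    index    -- partition number, as passed by a mapPartitionsWithIndex-style call
    accounts -- iterator over the accounts in this partition
    process  -- hypothetical per-account computation
    report   -- hypothetical callback taking (partition index, accounts done)
    """
    done = 0
    for account in accounts:
        result = process(account)
        done += 1
        if done % every == 0:
            report(index, done)
        yield result
    # Report the final count if it did not land on a reporting boundary.
    if done % every != 0:
        report(index, done)
```

With PySpark this could probably be wired in as `rdd.mapPartitionsWithIndex(lambda i, it: track_partition(i, it, compute, report))`, where `compute` and `report` are your own functions. Note that `report` would run on the executors, so it would need to write somewhere visible from outside the task, such as a log or a metrics store.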

Need suggestions on monitor Spark progress

2015-11-29 Thread Yuhao Yang
Hi all, I have a simple processing job for 20000 accounts on 8 partitions. That's roughly 2500 accounts on each partition. Each account takes about 1s to complete the computation. That means each partition will take about 2500 seconds to finish the batch. My question is how can I get the detaile