In these scenarios it's fairly standard to report the metrics, either
directly or through accumulators
(http://spark.apache.org/docs/latest/programming-guide.html#accumulators-a-nameaccumlinka),
to a time series database such as Graphite (http://graphite.wikidot.com/)
or OpenTSDB (http://opentsdb.net/).
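A minimal sketch of that reporting pattern, assuming Graphite's plaintext
protocol (one "<path> <value> <epoch-seconds>" line per sample, port 2003 by
default); the metric name, host, and port here are made-up examples, and the
accumulator wiring is shown only in a comment:

```python
import socket
import time

def graphite_line(metric: str, value: float, timestamp: float) -> str:
    # Graphite plaintext protocol: "<path> <value> <epoch-seconds>\n"
    return f"{metric} {value} {int(timestamp)}\n"

def push_to_graphite(host: str, port: int, metric: str, value: float) -> None:
    # One short-lived connection per sample keeps the sketch simple;
    # a real reporter would keep the socket open and batch lines.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(graphite_line(metric, value, time.time()).encode("ascii"))

# On the Spark driver, a counting accumulator can feed the reporter, e.g.:
#   acc = sc.accumulator(0)        # tasks call acc.add(1) per finished account
#   push_to_graphite("graphite.local", 2003, "spark.accounts.done", acc.value)
```

A small driver-side loop polling `acc.value` every few seconds would then give
a live throughput graph.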
Hi,
My limited understanding of Spark tells me that a task is the smallest
unit of work, and Spark itself won't give you much below that. I
wouldn't expect it to, since an "account" is a business entity, not a
Spark one.
What about using mapPartitions* to get at the details of each partition
and do whatever you want there?
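A sketch of that idea: the per-partition function below is plain Python (the
`compute` helper and the 500-account logging cadence are assumptions, not
anything from Spark), and the one Spark call that wires it in is shown as a
comment:

```python
def compute(account):
    # Hypothetical per-account computation (~1 s each in the original post).
    return account

def process_partition(index, accounts):
    # Runs once per partition; `accounts` is an iterator over that
    # partition's records, so progress can be tracked account by account.
    done = 0
    for account in accounts:
        result = compute(account)
        done += 1
        if done % 500 == 0:
            print(f"partition {index}: {done} accounts processed")
        yield result

# Wire it into Spark with:
#   rdd.mapPartitionsWithIndex(process_partition)
```

The `print` calls end up in the executor logs; replacing them with an
accumulator update or a metrics push gives driver-side visibility instead.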
Hi all,
I've got a simple processing job for 20,000 accounts on 8 partitions, so
roughly 2,500 accounts on each partition. Each account takes about 1 s to
compute, which means each partition will take about 2,500 seconds to finish
the batch.
My question is: how can I get the detailed progress of each partition while
the job is running?