If you can afford a bit more time for insertion, consider also t-digest.

Differences relative to the high dynamic range histogram system include:

- HDR histograms assume an exponential distribution; t-digest handles
arbitrary distributions.

- t-digest is much more accurate near extreme values.  You can have 0.1%
accuracy near the 50th percentile and part-per-million accuracy (or
better) at the 99.999th percentile.

- HDR histograms are *much* faster for insertion.  The claimed speeds are
under 10 ns for HDR histograms, while measured speeds for t-digest are more
like 200-500 ns.

- t-digest can handle far more skew than HDR histograms.  A standard test
case is a gamma distribution, which has about 30 orders of magnitude of
skew near low quantiles.

See https://github.com/tdunning/t-digest for a version with few transitive
dependencies.  Apache Mahout also includes a version (with a larger
dependency tree).
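To make the comparison concrete, here is a minimal stdlib-only Java sketch
of the query both libraries answer: "what is the q-th percentile of my
latency samples?"  This brute-force version is exact but keeps every sample
in memory; t-digest and HdrHistogram approximate the same answer in small,
bounded memory.  The class and method names below are my own illustration,
not APIs from either library.

```java
import java.util.Arrays;

// Exact nearest-rank percentile over raw latency samples.  t-digest and
// HdrHistogram answer this same query approximately without retaining
// every sample; names here are illustrative only.
public class ExactQuantile {

    // Returns the nearest-rank q-quantile (0 < q <= 1) of the samples.
    static double quantile(double[] samples, double q) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(q * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // Stand-in workload: latencies of 1..10000 ms.
        double[] latenciesMs = new double[10000];
        for (int i = 0; i < latenciesMs.length; i++) {
            latenciesMs[i] = i + 1;
        }
        System.out.println(quantile(latenciesMs, 0.50));   // 5000.0 (median)
        System.out.println(quantile(latenciesMs, 0.999));  // 9990.0 (tail)
    }
}
```

The accuracy trade-off discussed above is about how closely a sketch's
answer tracks this exact computation, especially at tail quantiles like
99.999%, where a fixed relative error matters most.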

Let me know if I can help on this.



On Mon, Jun 16, 2014 at 5:57 PM, Dan <dcies...@hotmail.com> wrote:

> Be careful when using Coda Hale's Metrics package when measuring latency.
> Consider using Gil Tene's
> High Dynamic Range Histogram instead:
>
> http://hdrhistogram.github.io/HdrHistogram/
>
> -Dan
>
> ------------------------------
> From: and...@parsely.com
> Date: Mon, 16 Jun 2014 18:20:11 -0400
> Subject: Re: Extracting Performance Metrics
> To: user@storm.incubator.apache.org
>
> Also, I came across this presentation by Visible Measures which actually
> walks through a lot of great options covering most of what you want to know
> about:
>
> http://files.meetup.com/5809742/storm%20monitoring.pdf
>
> One other thing to be aware of is that in Storm 0.9.2 (forthcoming
> release), there is a new REST API used by the Storm UI for gathering some
> of these metrics:
>
> https://github.com/apache/incubator-storm/pull/101
> https://issues.apache.org/jira/browse/STORM-205
>
>
> On Mon, Jun 16, 2014 at 6:13 PM, Andrew Montalenti <and...@parsely.com>
> wrote:
>
> I haven't used it yet, but a lot of people get pointed to metrics_storm:
>
> https://github.com/ooyala/metrics_storm
>
> With this blog post that discusses it:
>
> http://engineering.ooyala.com/blog/open-sourcing-metrics-storm
>
> Michael Noll also has a nice blog post about streaming Storm 0.9 metrics
> to Graphite:
>
>
> http://www.michael-noll.com/blog/2013/11/06/sending-metrics-from-storm-to-graphite/
>
> Currently, when we use Storm, we do a lot of custom metrics in Graphite
> using statsd, as described in this post (not about Storm, but about
> Graphite/statsd):
>
> http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
>
>
>
>
> On Mon, Jun 16, 2014 at 4:37 PM, Anis Nasir <aadi.a...@gmail.com> wrote:
>
> Dear all,
>
> I am running a cluster with 1 kafka + 1 nimbus + 10 supervisor + 1
> zookeeper nodes. I am executing multiple topologies on the cluster and I
> want to extract the different metrics that I am mentioning below. Can someone
> help me by recommending tools that I can use to extract this information?
>
>
> Per Topology
>      - Throughput
>      - Latency
>
> Per Spout or Bolt
>      - Throughput
>      - Latency
>      - Execution Time
>      - Queuing Time
>      - Number of Messages Processed
>
> Regards
> Anis
>
>
>
>
