Re: [DISCUSS] Metron assessment tool

2016-07-15 Thread zeo...@gmail.com
That's roughly what I was thinking as well.

Time slices:
- 1 minute (smallest unit - should catch the peaks and valleys of batch jobs that send to Kafka like crons)
- 1 hour (a normal index size)
- 4 hours (half of a typical SOC shift)
- 24 hours (1 day)
- 168 hours (1 week)

Stats for 1min:
- Num
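As a rough illustration of the time slices listed above, the following Python sketch buckets message arrival times into fixed-width slices and counts messages per slice. It is not from the thread: the names `SLICES` and `count_per_slice` and the sample arrival times are hypothetical, and it assumes timestamps are available as seconds since the epoch.

```python
from collections import Counter

# Slice widths in seconds, taken from the list in the message above.
SLICES = {
    "1 minute": 60,
    "1 hour": 3600,
    "4 hours": 4 * 3600,
    "24 hours": 24 * 3600,
    "168 hours": 168 * 3600,
}


def count_per_slice(timestamps, width_seconds):
    """Return a Counter mapping each slice's start time to its message count."""
    counts = Counter()
    for ts in timestamps:
        # Round the timestamp down to the start of the slice it falls in.
        bucket_start = int(ts // width_seconds) * width_seconds
        counts[bucket_start] += 1
    return counts


# Hypothetical message arrival times, in seconds since the epoch.
arrivals = [0, 10, 59, 60, 61, 3599, 3600]
per_minute = count_per_slice(arrivals, SLICES["1 minute"])
per_hour = count_per_slice(arrivals, SLICES["1 hour"])
```

The same arrivals can be re-counted at any of the five widths, so short bursts that stand out in the 1 minute view can be compared against the smoother hourly or daily totals.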

Re: [DISCUSS] Metron assessment tool

2016-07-12 Thread James Sirota
Per the Apache Way, it would be desirable to put together an architecture proposal and have the community take a look at it before implementing. I would propose a simple Storm topology that attaches to a Kafka topic and records statistics such as # of messages and total throughput f
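To make the two statistics named above concrete (number of messages and total throughput), here is a minimal Python sketch of a running accumulator. It is not a Storm topology and does not read from Kafka: the class `TopicStats` and its methods are hypothetical, and it assumes the caller hands it each message payload together with an arrival time in seconds.

```python
class TopicStats:
    """Running totals for one telemetry stream (hypothetical helper)."""

    def __init__(self):
        self.messages = 0
        self.total_bytes = 0
        self.first_ts = None
        self.last_ts = None

    def record(self, payload: bytes, ts: float) -> None:
        """Account for one message without keeping the payload itself."""
        self.messages += 1
        self.total_bytes += len(payload)
        if self.first_ts is None:
            self.first_ts = ts
        self.last_ts = ts

    def throughput_bytes_per_sec(self) -> float:
        """Average bytes per second between the first and last message seen."""
        if self.first_ts is None or self.last_ts == self.first_ts:
            return 0.0
        return self.total_bytes / (self.last_ts - self.first_ts)


stats = TopicStats()
stats.record(b"abcd", ts=0.0)    # 4 bytes
stats.record(b"efghij", ts=2.0)  # 6 bytes
```

Only counters are kept, so memory use stays constant regardless of how many messages pass through.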

Re: [DISCUSS] Metron assessment tool

2016-07-12 Thread zeo...@gmail.com
I can definitely give it a shot. A kickstart would be appreciated. Jon On Tue, Jul 12, 2016, 17:17 James Sirota wrote: > John, > > Just filed METRON-318. Is this something you would like to work on? > Would you like help from us to get started? > > Thanks, > James > > 12.07.2016, 11:53, "zeo.

Re: [DISCUSS] Metron assessment tool

2016-07-12 Thread James Sirota
John, Just filed METRON-318. Is this something you would like to work on? Would you like help from us to get started? Thanks, James 12.07.2016, 11:53, "zeo...@gmail.com" : > Hi All, > > Has there been any additional discussion or development regarding this? I > did take a brief look around t

Re: [DISCUSS] Metron assessment tool

2016-07-12 Thread zeo...@gmail.com
Hi All, Has there been any additional discussion or development regarding this? I did take a brief look around the jira and didn't see anything regarding this, but I may have missed it. Thanks, Jon On Fri, Apr 15, 2016 at 2:01 PM Nick Allen wrote: > I definitely agree that you need this leve

Re: [DISCUSS] Metron assessment tool

2016-04-15 Thread Nick Allen
I definitely agree that you need this level of understanding of your cluster. It definitely could work the way that you describe. I was thinking of it slightly differently though. The metrics for this purpose (understanding performance of existing cluster) should come from the actual sensors the

Re: [DISCUSS] Metron assessment tool

2016-04-15 Thread zeo...@gmail.com
However, it would be handy to have something like this perpetually running so you know when to scale up/out/down/in a cluster. On Fri, Apr 15, 2016, 13:35 Nick Allen wrote: > I think it is slightly different. I don't even want to install minimal > Kafka infrastructure (Look ma, no Kafka!) > > T

Re: [DISCUSS] Metron assessment tool

2016-04-15 Thread Nick Allen
I think it is slightly different. I don't even want to install minimal Kafka infrastructure (Look ma, no Kafka!) The exact implementation would differ based on the data inputs that you are trying to measure, but for example... - To understand raw packet rates I would have a specialized sensor

Re: [DISCUSS] Metron assessment tool

2016-04-15 Thread James Sirota
So this is exactly what I am proposing. Calculate the metrics on the fly without landing any data in the cluster. The problem is that enterprise data volumes are so large you can’t just point them at a Java or a C++ program or sensor. You either need an existing minimal Kafka infrastruct

Re: [DISCUSS] Metron assessment tool

2016-04-15 Thread Nick Allen
Or we have the assessment tool not actually land any data. The assessment tool becomes a 'sensor' in its own right. You just point the input data sets at the assessment tool, it builds metrics on the input (for example: count the number of packets per second) and then we use those metrics to esti
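The per-second counting mentioned above could look roughly like the following Python sketch, which keeps only one counter per second and never stores the input itself. The function `rate_summary` and its output keys are hypothetical, and it assumes each input event carries a timestamp in seconds.

```python
from collections import Counter


def rate_summary(timestamps):
    """Summarize event rates from timestamps, keeping only per-second counts."""
    per_second = Counter(int(ts) for ts in timestamps)
    if not per_second:
        return {"peak_per_sec": 0, "mean_per_sec": 0.0}
    # Span covers every second from the first to the last event, inclusive,
    # so idle seconds pull the mean down.
    span = max(per_second) - min(per_second) + 1
    total = sum(per_second.values())
    return {
        "peak_per_sec": max(per_second.values()),
        "mean_per_sec": total / span,
    }


summary = rate_summary([0.1, 0.5, 0.9, 1.2, 3.7])
```

Reporting the peak alongside the mean matters for sizing, since a cluster provisioned for the average rate may fall behind during bursts.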

Re: [DISCUSS] Metron assessment tool

2016-04-13 Thread James Sirota
That’s an excellent point. So I think there are three ways forward. One is we can assume that there has to be at least a minimal infrastructure in place (at least a subset of Kafka and Storm resources) to run a full-scale assessment. If you point something that blasts millions of messages pe

Re: [DISCUSS] Metron assessment tool

2016-04-13 Thread Nick Allen
If the tool starts at Kafka, the user would have to already have committed to the investment in the infrastructure and time to set up the sensors that feed Kafka and Kafka itself. Maybe it would need to be further upstream? On Apr 13, 2016 1:05 PM, "James Sirota" wrote: > Hi George, > > This arti

Re: [DISCUSS] Metron assessment tool

2016-04-13 Thread George Vetticaden
Makes sense. -- George Vetticaden Principal, Senior Product Manager for Metron gvettica...@hortonworks.com (630) 909-9138 On 4/13/16, 12:05 PM, "James Sirota" wrote: >Hi George, > >This article defines micro-tuning of the existing cluster. What I am >proposing is a level up from that. Wh

Re: [DISCUSS] Metron assessment tool

2016-04-13 Thread James Sirota
Hi George, This article defines micro-tuning of the existing cluster. What I am proposing is a level up from that. When you start with Metron, how do you even know how many nodes you need? And of these nodes, how many do you allocate to Storm, indexing, storage? How much storage do you need?

Re: [DISCUSS] Metron assessment tool

2016-04-13 Thread George Vetticaden
I have used the following Kafka and Storm Best Practices guide at numerous customer implementations. https://community.hortonworks.com/articles/550/unofficial-storm-and-kafka-best-practices-guide.html We need to have something similar and prescriptive for Metron based on: 1. What data sources ar

Re: [DISCUSS] Metron assessment tool

2016-04-13 Thread James Sirota
Hi George, So the idea here is for the tool to gather the metrics and then either have documentation or some kind of script that crunches through the metrics and produces a configuration recommendation. So what you mention would be the outcome of this analysis. So an example would be if your

Re: [DISCUSS] Metron assessment tool

2016-04-13 Thread George Vetticaden
+1 to James's suggestion. We also need to consider not just the data volume and storage requirements for proper cluster sizing but also processing requirements. Given that in the new architecture, we have moved to a single enrichment topology that will support all data sources, proper sizing o

[DISCUSS] Metron assessment tool

2016-04-13 Thread James Sirota
Prior to adopting Metron, each adopting entity needs to guesstimate its data volume and data storage requirements so they can size their cluster properly. I propose creating an assessment tool that can plug into a Kafka topic for a given telemetry and over time produce statistics for i