That's roughly what I was thinking as well.
Time slices:
- 1 minute (smallest unit - should catch the peaks and valleys of batch
jobs, like crons, that send to Kafka)
- 1 hour (a normal index size)
- 4 hours (half of a typical SOC shift)
- 24 hours (1 day)
- 168 hours (1 week)
Stats for 1min:
- Num
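(A minimal sketch of how those slices could be kept, assuming the larger
windows are just roll-ups of the 1-minute buckets; the class and method names
below are illustrative, not an agreed design.)

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

public class TopicStats {
    private static class Bucket {
        long messages;
        long bytes;
    }

    // closed 1-minute buckets, newest last
    private final Deque<Bucket> minutes = new ArrayDeque<>();
    private Bucket current = new Bucket();

    // Record one message of the given size into the current 1-minute bucket.
    public void record(long sizeBytes) {
        current.messages++;
        current.bytes += sizeBytes;
    }

    // Call once per minute to close the current bucket; keep one week of history.
    public void rollMinute() {
        minutes.addLast(current);
        current = new Bucket();
        while (minutes.size() > 7 * 24 * 60) {
            minutes.removeFirst();
        }
    }

    // Messages over the last N closed minutes (N = 60, 240, 1440, 10080 for the slices above).
    public long messagesOverLast(int windowMinutes) {
        long total = 0;
        int taken = 0;
        Iterator<Bucket> it = minutes.descendingIterator();
        while (it.hasNext() && taken < windowMinutes) {
            total += it.next().messages;
            taken++;
        }
        return total;
    }
}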
Per the Apache Way it would be desirable to put together an architecture
proposal for the community to review before implementing. I would propose a
simple Storm topology that attaches to a Kafka topic and records statistics
such as # of messages and total throughput f
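(Before the Storm topology exists, the statistics-gathering part could be
sketched with a plain Kafka consumer along these lines; the broker address,
topic name, and group id are placeholders, and a reasonably recent Kafka
client is assumed for poll(Duration).)

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TopicThroughputProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "metron-assessment");         // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("enrichments"));  // placeholder topic
            long windowStart = System.currentTimeMillis();
            long messages = 0, bytes = 0;

            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    messages++;
                    bytes += record.value() == null ? 0 : record.value().length;
                }
                long now = System.currentTimeMillis();
                if (now - windowStart >= 60_000) {
                    // report one 1-minute bucket: message count and byte throughput
                    System.out.printf("last minute: %d msgs, %d bytes (%.1f KB/s)%n",
                            messages, bytes, bytes / 60.0 / 1024.0);
                    messages = 0;
                    bytes = 0;
                    windowStart = now;
                }
            }
        }
    }
}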
I can definitely give it a shot. A kickstart would be appreciated.
Jon
On Tue, Jul 12, 2016, 17:17 James Sirota wrote:
> John,
>
> Just filed METRON-318. Is this something you would like to work on?
> Would you like help from us to get started?
>
> Thanks,
> James
>
> 12.07.2016, 11:53, "zeo...@gmail.com":
John,
Just filed METRON-318. Is this something you would like to work on? Would you
like help from us to get started?
Thanks,
James
12.07.2016, 11:53, "zeo...@gmail.com":
> Hi All,
>
> Has there been any additional discussion or development regarding this? I
> did take a brief look around t
Hi All,
Has there been any additional discussion or development regarding this? I
did take a brief look around the jira and didn't see anything regarding
this, but I may have missed it. Thanks,
Jon
On Fri, Apr 15, 2016 at 2:01 PM Nick Allen wrote:
> I definitely agree that you need this leve
I definitely agree that you need this level of understanding of your
cluster. It definitely could work the way that you describe.
I was thinking of it slightly differently though. The metrics for this
purpose (understanding performance of existing cluster) should come from
the actual sensors the
However, it would be handy to have something like this perpetually running
so you know when to scale up/out/down/in a cluster.
On Fri, Apr 15, 2016, 13:35 Nick Allen wrote:
> I think it is slightly different. I don't even want to install minimal
> Kafka infrastructure (Look ma, no Kafka!)
>
> T
I think it is slightly different. I don't even want to install minimal
Kafka infrastructure (Look ma, no Kafka!)
The exact implementation would differ based on the data inputs that you are
trying to measure, but for example...
- To understand raw packet rates I would have a specialized sensor
So this is exactly what I am proposing. Calculate the metrics on the fly
without landing any data in the cluster. The problem is that enterprise
data volumes are so large you can’t just point them at a Java or a C++ program
or sensor. You either need an existing minimal Kafka infrastruct
Or we have the assessment tool not actually land any data. The assessment
tool becomes a 'sensor' in its own right. You just point the input data
sets at the assessment tool, it builds metrics on the input (for example:
count the number of packets per second) and then we use those metrics to
esti
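(A minimal sketch of that idea: the assessment 'sensor' only updates counters
per observed packet and reports a rate, never persisting the raw data. Class
and method names are illustrative.)

import java.util.concurrent.atomic.AtomicLong;

public class PacketRateMeter {
    private final AtomicLong count = new AtomicLong();
    private volatile long lastReport = System.nanoTime();

    // Called once per observed packet (or per Kafka message, NetFlow record, ...).
    public void onPacket() {
        count.incrementAndGet();
    }

    // Called periodically (e.g. from a scheduled thread) to emit the rate
    // and reset the counter; nothing about the packets themselves is stored.
    public double reportPacketsPerSecond() {
        long now = System.nanoTime();
        double elapsedSec = (now - lastReport) / 1_000_000_000.0;
        long n = count.getAndSet(0);
        lastReport = now;
        return elapsedSec > 0 ? n / elapsedSec : 0.0;
    }
}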
That’s an excellent point. So I think there are three ways forward.
One is we can assume that there has to be at least a minimal infrastructure in
place (at least a subset of Kafka and Storm resources) to run a full-scale
assessment. If you point something that blasts millions of messages pe
If the tool starts at Kafka, the user would have to already have committed
to the investment in the infrastructure and time to set up the sensors that
feed Kafka and Kafka itself. Maybe it would need to be further upstream?
On Apr 13, 2016 1:05 PM, "James Sirota" wrote:
> Hi George,
>
> This arti
Makes sense.
--
George Vetticaden
Principal, Senior Product Manager for Metron
gvettica...@hortonworks.com
(630) 909-9138
On 4/13/16, 12:05 PM, "James Sirota" wrote:
>Hi George,
>
>This article defines micro-tuning of the existing cluster. What I am
>proposing is a level up from that. Wh
Hi George,
This article defines micro-tuning of the existing cluster. What I am proposing
is a level up from that. When you start with Metron how do you even know how
many nodes you need? And of these nodes how many do you allocate to Storm,
indexing, storage? How much storage do you need?
I have used the following Kafka and Storm Best Practices guide at numerous
customer implementations.
https://community.hortonworks.com/articles/550/unofficial-storm-and-kafka-best-practices-guide.html
We need to have something similar and prescriptive for Metron based on:
1. What data sources ar
Hi George,
So the idea here is for the tool to gather the metrics and then either have
documentation or some kind of script that crunches through the metrics and
produces a configuration recommendation. So what you mention would be the
outcome of this analysis.
So an example would be if your
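(For illustration only, a back-of-the-envelope sketch of the kind of crunching
such a script might do once the tool has gathered metrics; every constant in
it is a made-up placeholder, not a recommended value.)

public class SizingEstimate {
    public static void main(String[] args) {
        double messagesPerSec = 50_000;      // would come from the assessment tool
        double avgMessageBytes = 600;        // would come from the assessment tool
        int retentionDays = 30;              // site policy
        int replicationFactor = 3;           // Kafka / HDFS replication

        double rawBytesPerDay = messagesPerSec * avgMessageBytes * 86_400;
        double storedTb = rawBytesPerDay * retentionDays * replicationFactor / 1e12;

        // placeholder assumption: one worker node comfortably handles ~25k msgs/sec
        int workerNodes = (int) Math.ceil(messagesPerSec / 25_000);

        System.out.printf("~%.1f TB of storage over %d days%n", storedTb, retentionDays);
        System.out.printf("~%d worker nodes for the enrichment topology%n", workerNodes);
    }
}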
+1 to James' suggestion.
We also need to consider not just the data volume and storage requirements
for proper cluster sizing but also the processing requirements. Given that
in the new architecture we have moved to a single enrichment topology that
will support all data sources, proper sizing o
Prior to adoption of Metron each adopting entity needs to guesstimate its data
volume and data storage requirements so they can size their cluster properly.
I propose the creation of an assessment tool that can plug in to a Kafka topic
for a given telemetry and over time produce statistics for i