Check the ITIL documentation for the proper definitions of incident, problem, etc.
In regard to "what is important"... that is a harder question. It depends on your device mix, redundancy levels, etc. I would approach it in this manner:

1. Is all the noise out of the environment? Do I have flapping (i.e. up/down) interfaces? How many/how much?
2. Do I have a lot of duplicate events (after defining what a duplicate is)? How many/how much?
3. Can I abstract any events out of a given number of like events (poor man's correlation)? How many/how much?
4. Are there any correlation scenarios whose patterns I can determine? How many/how much?
5. Once you figure out the types of event reduction, abstraction, and correlation you have, determine the total number of events you can take off operations' plate, assign a man-hour value per event (e.g. 10 minutes), and calculate the total man-hours you can save by reducing events in each category. That can be translated into a dollar value.

Bottom line... managers like to save money. Give them a real (or real enough) dollar-savings figure you can wave around. The catch is that you'll have to go through the event logs looking for patterns on your own, but that shouldn't be that hard.

Once I had shown success in those areas, I'd tackle service/business correlation -- but not before. That's how I'd start (i.e. start with the thing managers understand -- saving money!!!)

Gary Boyles, Intel

-----Original Message-----
From: Tim Peiffer [mailto:peif...@umn.edu]
Sent: Monday, September 10, 2012 6:03 PM
To: simple-evcorr-users@lists.sourceforge.net
Subject: [Simple-evcorr-users] articulating the need for discussion of what is important.

I am having some issues trying to get buy-in for event correlator operations. I want to engage system owners and operators in such a way that they define what is important, what is work, and what is an incident or a trouble ticket. I want them to articulate how to combine log events to provide systems-based business intelligence.
I am a firm believer, but I need to articulate something that will convince people who would otherwise not be involved, because they think this is the job of the network management platform. Down events followed by up events are OK by themselves, but multiple cycles or bounces indicate an issue that just isn't going away. Does anyone know of a document that discusses event correlation in general, and in particular how to look at logs to determine what is important and how things group together to provide effective event handling and consolidation?

So how does one articulate the need to define what is an important notice, what is work, and what defines an incident?

--
Tim Peiffer
Network Support Engineer
Office of Information Technology
University of Minnesota/NorthernLights GigaPOP
+1 612 626-7884 (desk)

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
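Steps 1, 2 and 5 of the reply above (counting flapping interfaces, counting duplicates, and turning the removed events into man-hours) can be roughed out with a short script run over a sample of your event logs. The following is a minimal Python sketch, not an SEC ruleset: it assumes the logs have already been parsed into (timestamp, host, interface, state) records, and the `Event` class, the function names, the window sizes, the change threshold and the hourly rate are all hypothetical values to tune for your own environment. Inside SEC itself, rule types such as SingleWithSuppress and SingleWithThreshold are the native way to do this kind of suppression and thresholding; the script is only meant to measure how much noise there is before you pitch the savings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical parsed log record; real syslog/trap parsing is out of scope."""
    timestamp: float  # seconds since epoch
    host: str
    interface: str
    state: str  # assumed to be "up" or "down"


def find_flapping(events, window=300.0, min_changes=4):
    """Return (host, interface) pairs with at least min_changes state
    changes inside some window-second span (step 1: flapping)."""
    per_interface = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.timestamp):
        per_interface[(ev.host, ev.interface)].append(ev)

    flapping = set()
    for key, evs in per_interface.items():
        # Record the time of each report whose state differs from the
        # previous report (the first report counts as a change).
        change_times, prev_state = [], None
        for ev in evs:
            if ev.state != prev_state:
                change_times.append(ev.timestamp)
                prev_state = ev.state
        # Slide a window over the change times and count changes inside it.
        start = 0
        for end, t in enumerate(change_times):
            while t - change_times[start] > window:
                start += 1
            if end - start + 1 >= min_changes:
                flapping.add(key)
                break
    return flapping


def suppress_duplicates(events, window=60.0):
    """Forward an event unless an identical (host, interface, state) event
    was forwarded less than window seconds earlier (step 2: duplicates)."""
    last_forwarded = {}
    kept = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.host, ev.interface, ev.state)
        if key in last_forwarded and ev.timestamp - last_forwarded[key] < window:
            continue  # duplicate within the window: drop it
        last_forwarded[key] = ev.timestamp
        kept.append(ev)
    return kept


def estimated_savings(events_removed, minutes_per_event=10.0, hourly_rate=0.0):
    """Step 5 arithmetic: man-hours and dollars saved by not handling
    events_removed events by hand. hourly_rate is an assumed loaded cost."""
    hours = events_removed * minutes_per_event / 60.0
    return hours, hours * hourly_rate


if __name__ == "__main__":
    sample = [
        Event(0, "r1", "if1", "down"),
        Event(30, "r1", "if1", "up"),
        Event(60, "r1", "if1", "down"),
        Event(90, "r1", "if1", "up"),
        Event(100, "r2", "if2", "down"),
        Event(110, "r2", "if2", "down"),
    ]
    print(find_flapping(sample))  # {('r1', 'if1')}
    kept = suppress_duplicates(sample)
    removed = len(sample) - len(kept)
    print(removed)  # 1
    print(estimated_savings(removed, minutes_per_event=10.0, hourly_rate=60.0))
```

The duplicate window is measured from the last forwarded copy, so a continuous stream of identical events still surfaces once per window rather than disappearing entirely. For the pitch to management, 120 suppressed events at 10 minutes each comes to 20 man-hours; multiply by whatever hourly cost your organization uses.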