Thank you, Risto. Such an overview can serve as a starting point for designing something more useful than static correlation configurations.
I am still not sure what the optimal goal should be; I need to analyze and study these topics further. I expect to follow up on the information you provided and to develop this thread further in the future.

On Thu, 23 Jan 2020 at 18:12, Risto Vaarandi <risto.vaara...@gmail.com> wrote:

> hi Richard,
>
>> Next step would be integrating AI (machine learning) with SEC somehow, so
>> that user won't need to configure correlations statically, but they would
>> configure and self-optimize automatically. (There still could be some input
>> needed from the user, but system would be also able to react on changing
>> log traffic, and self-evolve.)
>>
>> Something like ELK+AI is usable in the log monitoring area.
>>
>> Maybe some integration with MXNet?
>>
>> http://blogs.perl.org/users/sergey_kolychev/2017/02/machine-learning-in-perl.html
>>
>> Does anybody have any experience in this area, to explain some more or
>> less theoretical or practical setup of AI-generated SEC rules? (I am pretty
>> sure, that this is out of scope of SEC itself, and SEC wouldn't know, that
>> AI is dynamically generating its rules on the background and probably
>> nobody has working solution, but maybe we could invent something together.)
>
> Machine learning is a very wide area with a large number of different
> methods and algorithms around. These methods and algorithms are usually
> divided into two large classes:
> *) supervised algorithms which assume that you provide labeled data for
>    learning (for example, a log file where some messages are labeled as
>    "normal" and some messages as "system_fault"), so that the algorithm can
>    learn from labeled examples how to distinguish normal messages from
>    errors (note that in this simplified example, only two labels were used,
>    but in more complex cases you could have more labels in play)
> *) unsupervised algorithms which are able to distinguish anomalous or
>    abnormal messages without any previous training with labeled data
> So my first question is -- what is your actual setup and do you have the
> opportunity of using training data for supervised methods, or are
> unsupervised methods a better choice? After answering this question, you
> can start studying the most promising methods more closely.
>
> Secondly, what is your actual goal? Do you want to:
> 1) detect an individual anomalous message or a time frame containing
>    anomalous messages from event logs,
> 2) produce a warning if the number of messages from a specific class (e.g.
>    login failures) per N minutes increases suddenly to an unexpectedly
>    large value,
> 3) use some tool for (semi)automated mining of new SEC rules,
> 4) something else?
>
> For achieving the first goal, there is no silver bullet, but perhaps I can
> provide a few pointers to some relevant research papers (note that there
> are many other papers in this area):
> https://ieeexplore.ieee.org/document/4781208
> https://ieeexplore.ieee.org/document/7367332
> https://dl.acm.org/doi/10.1145/3133956.3134015
>
> For achieving the second goal, you could consider using time series
> analysis methods. You could begin with a very simple moving average based
> method like the one described here:
> https://machinelearnings.co/data-science-tricks-simple-anomaly-detection-for-metrics-with-a-weekly-pattern-2e236970d77
> or you could employ more complex forecasting methods (before starting, it
> is probably a good idea to read this book on forecasting:
> https://otexts.com/fpp2/)
>
> If you want to mine new rules or knowledge for SEC (or for other tools)
> from event logs, I have actually done some previous research in this
> domain. Perhaps I can point you to a log mining utility called LogCluster
> (https://ristov.github.io/logcluster/) which allows for mining line
> patterns and outliers from textual event logs. Also, a couple of years ago,
> an experimental system was created which was using LogCluster in a fully
> automated way for creating SEC Suppress rules, where these rules were
> essentially matching normal (expected) messages. Any message not matching
> these rules was considered an anomaly and was logged separately for manual
> review. Here is the paper that provides an overview of this system:
> https://ristov.github.io/publications/noms18-log-anomaly-web.pdf
>
> Hopefully these pointers will offer you some guidance on what your precise
> research question could be, and what the most promising avenue for
> continuing is. My apologies if my answer was raising new questions, but
> machine learning is a very wide area with a large number of methods for
> many different goals.
>
> kind regards,
> risto
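
To make the second goal in the quoted message (a warning when the number of messages per N minutes suddenly grows) a bit more concrete, here is a minimal sketch of a simple moving-average check in Python. It is only an illustration and not the method from the linked article: the function name `detect_spikes`, the window size, the threshold, the floor of 1.0 on the standard deviation, and the sample counts are all assumptions made for this example.

```python
from statistics import mean, pstdev


def detect_spikes(counts, window=6, threshold=3.0):
    """Return indices of intervals whose count is unexpectedly large.

    counts    -- message counts per fixed interval (e.g. login failures
                 per 5 minutes), oldest first
    window    -- number of preceding intervals used for the moving average
    threshold -- how many standard deviations above the moving average
                 a count must be to be flagged
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        avg = mean(history)
        std = pstdev(history)
        # Assumed floor of 1.0 so that a perfectly flat history does not
        # flag every small fluctuation.
        limit = avg + threshold * max(std, 1.0)
        if counts[i] > limit:
            spikes.append(i)
    return spikes


# Made-up sample data: one interval with a sudden burst of messages.
counts = [4, 5, 3, 4, 6, 5, 4, 5, 40, 5, 4]
print(detect_spikes(counts))
```

In practice the counts would come from the log stream itself (for example, counted per interval by SEC or a small script), and the window and threshold would need tuning against real traffic; a series with daily or weekly patterns would need a seasonality-aware baseline instead.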
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users