Hi all, A Markov Chain is a memoryless, random process which undergoes transitions from one state to another on a defined state space. Markov chains are quite useful in outlier detection mechanisms and we are currently in the process of implementing Outlier detection using Markov Chains in CEP. [1] Gives a basic understanding of the mathematical properties and proofs behind Markov Chains.
When using Markov Chains for outlier detection, we need to handle.. 1. State Classification This involves checking for some property(s) in the incoming event and classifying the event in to a specific state from the state space. This will be done by using siddhi filters. 2. Compute State Transition Probability Matrix Once the data is classified in to states, the next step is to calculate the probabilities of transitioning from one state (si) to another (sj) for all possible states in the state space, using a training dataset. This involves, creating and updating a probability matrix of n x n size for a state space of 'n' states. This will be implemented using siddhi event tables. Using a training dataset, we will calculate the probability of transition from si to sj as the number of transitions from si to sj divided by the total number of transitions. The event table will store the state transition matrix as a table which has n x n transitions and their corresponding probabilities. 3. Calculate Outlier Sequence Metrics Once the state transition probability matrix is established, we can use that to implement some (of many) possible metrics that are used for outlier detection. Some of the initial metrics we will implement are the following a. Miss Probability Metric b. Miss Rate Metric c. Entropy Reduction Metric Refer [2] for more details on how to calculate the above metrics. In general, these metrics are calculated by doing simple manipulations to the probabilities of the probability matrix. In CEP we achieve this by using simple math functions like sum and basic operators to compute the metric values of incoming sequences. 4. Compare incoming sequences with threshold values for outlier sequence metrics. Once the metric is computed, the last step is to define some threshold values for the outlier sequence metrics and compare them for each incoming sequence. This will be achieved in CEP by using siddhi filters. -------------------------------------------------------------------------------------- Once this is implemented, we can use this for outlier sequence detection in many usecases. Some of the most obvious ones are ... 1. Fraudulent Patterns in credit card transactions/ bank transactions/ online purchases etc; 2. Bot detection in websites - analyzing click sequences 3. Sequence mining in sensor events (weather patterns, vehicle movement, etc;) 4. Medical Diagnosis - (eg:- analyzing patterns of ECG timeseries) Following are some useful resources for background information on Markov chains and fraud detection 1. http://people.math.aau.dk/~jm/courses/PhD06StocSim/Chapters2.4.5.pdf 2. https://pkghosh.wordpress.com/2013/10/21/real-time-fraud-detection-with-sequence-mining/ 3. http://pages.cs.wisc.edu/~jha/jha-papers/security/CSFW_2001.pdf <http://www.google.com/url?q=http%3A%2F%2Fpages.cs.wisc.edu%2F~jha%2Fjha-papers%2Fsecurity%2FCSFW_2001.pdf&sa=D&sntz=1&usg=AFQjCNETnfl6PTlE3qa6gLYvlbyjBnW1HQ> Cheers, Seshika
_______________________________________________ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture