Hi all,

A Markov Chain is a memoryless, random process which undergoes transitions
from one state to another on a defined state space. Markov chains are quite
useful in outlier detection mechanisms and we are currently in the process
of implementing Outlier detection using Markov Chains in CEP. [1] Gives a
basic understanding of the mathematical properties and proofs behind Markov
Chains.

When using Markov Chains for outlier detection, we need to handle..

1. State Classification
This involves checking for some property(s) in the incoming event and
classifying the event in to a specific state from the state space.

This will be done by using siddhi filters.

2. Compute State Transition Probability Matrix
Once the data is classified in to states, the next step is to calculate the
probabilities of transitioning from one state (si) to another (sj) for all
possible states in the state space, using a training dataset. This
involves, creating and updating a probability matrix of n x n size for a
state space of 'n' states.

This will be implemented using siddhi event tables. Using a training
dataset, we will calculate the probability of transition from si to sj as
the number of transitions from si to sj divided by the total number of
transitions. The event table will store the state transition matrix as a
table which has n x n transitions and their corresponding probabilities.

3. Calculate Outlier Sequence Metrics
Once the state transition probability matrix is established, we can use
that to implement some (of many) possible metrics that are used for outlier
detection.
Some of the initial metrics we will implement are the following

a. Miss Probability Metric
b. Miss Rate Metric
c. Entropy Reduction Metric

Refer [2] for more details on how to calculate the above metrics. In
general, these metrics are calculated by doing simple manipulations to the
probabilities of the probability matrix.

In CEP we achieve this by using simple math functions like sum and basic
operators to compute the metric values of incoming sequences.

4. Compare incoming sequences with threshold values for outlier sequence
metrics.
Once the metric is computed, the last step is to define some threshold
values for the outlier sequence metrics and compare them for each incoming
sequence. This will be achieved in CEP by using siddhi filters.
--------------------------------------------------------------------------------------

Once this is implemented, we can use this for outlier sequence detection in
many usecases. Some of the most obvious ones are ...
1. Fraudulent Patterns in credit card transactions/ bank transactions/
online purchases etc;
2. Bot detection in websites - analyzing click sequences
3. Sequence mining in sensor events (weather patterns, vehicle movement,
etc;)
4. Medical Diagnosis - (eg:- analyzing patterns of ECG timeseries)


Following are some useful resources for background information on Markov
chains and fraud detection

1. http://people.math.aau.dk/~jm/courses/PhD06StocSim/Chapters2.4.5.pdf
2.
https://pkghosh.wordpress.com/2013/10/21/real-time-fraud-detection-with-sequence-mining/
3.  http://pages.cs.wisc.edu/~jha/jha-papers/security/CSFW_2001.pdf
<http://www.google.com/url?q=http%3A%2F%2Fpages.cs.wisc.edu%2F~jha%2Fjha-papers%2Fsecurity%2FCSFW_2001.pdf&sa=D&sntz=1&usg=AFQjCNETnfl6PTlE3qa6gLYvlbyjBnW1HQ>

Cheers,
Seshika
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

Reply via email to