There is nothing explicit in Mahout for this, but you could use the Dirichlet mixture model clustering to do it.
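
To make the scoring step concrete, here is a rough sketch in plain Python, not Mahout code. It assumes a fixed two-component 1-D Gaussian mixture fitted by ordinary EM as a simple stand-in for Dirichlet process clustering, and it has no Markov structure. The function names, the synthetic data, the window size of 10, and the threshold margin of 5.0 are all made up for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm(data, k=2, iters=50):
    """Fit a 1-D Gaussian mixture with plain EM (stand-in for DP clustering)."""
    lo, hi = min(data), max(data)
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    # Spread the initial means evenly over the observed range.
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)
    return weights, means, variances


def window_scores(series, model, size=10):
    """Average log-likelihood of each non-overlapping window under the model."""
    weights, means, variances = model
    scores = []
    for start in range(0, len(series) - size + 1, size):
        total = 0.0
        for x in series[start:start + size]:
            p = sum(w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances))
            total += math.log(max(p, 1e-300))
        scores.append((start, total / size))
    return scores


def make_series(rng, n, block=20):
    """Synthetic series alternating between regimes centred on 0 and 10."""
    return [rng.gauss(10.0 if (i // block) % 2 else 0.0, 1.0) for i in range(n)]


rng = random.Random(0)
train = make_series(rng, 200)
test = make_series(rng, 100)
for i in range(40, 50):
    test[i] = rng.gauss(30.0, 1.0)  # injected segment far from both regimes

model = fit_gmm(train)
# Assumed rule: flag windows scoring well below the worst training window.
threshold = min(score for _, score in window_scores(train, model)) - 5.0
flagged = [start for start, score in window_scores(test, model) if score < threshold]
print(flagged)
```

With this synthetic data, the window that covers the injected segment is the one expected to fall below the threshold. At real scale the fitting would be the distributed part, and the per-window scoring is an independent computation per segment.
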
The idea would be to express your different observed time series or short segments of time sequences as mixture models and then find regions that are not well described by this mixture model. Ideally, you would have a Markov model underneath the mixture coefficients, but that is out of scope for what Mahout does for you right off the bat. It wouldn't be too hard to merge the HMM code and the DP clustering to get this, though.

So the answer is no. But Mahout would be a decent substrate for building your own.

On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <[email protected]> wrote:

> Hi,
> Any pointers to techniques/papers that detect outliers in time-series
> of very large data sets using Mahout? I am interesting in seeing what
> techniques are favorable for use in large-scale distributed systems using
> Hadoop/Mahout.
>
> Thanks,
> Sri.
