Hi,

2010/11/1 Srivathsan Srinivas <[email protected]>:
> Dear Ted,
>
> Thanks for pointing to the Dirichlet mixture model. I shall look into that.
>
> Basically, I am looking into the autocorrelation function, Control Charts,
> Moving Average, Population Stability, and Poisson regression (much of the
> data can be described as daily|hourly counts) - I'd like to build a tool that
> would blend these approaches into a scorecard for proactive alerting for any
> outliers...
>
> For the above, I am interested in seeing how the time-series data can be
> broken into manageable segments and distributed off to different machines in
> a Hadoop network.
>

I've never seen anything similar in Hadoop, but my suggestion for a good
algorithm for segmenting time series is:
Sliding Window And Bottom-Up (SWAB) from Keogh et al.

Here is the paper:
http://www.cs.ucr.edu/~eamonn/icdm-01.pdf
and here is a presentation:
www-scf.usc.edu/~selinach/segmentation-slides.pd

> Thanks again,
> Sri.
>
>
> On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning <[email protected]> wrote:
>
>> There is nothing explicit in Mahout for this, but you could use the
>> Dirichlet mixture model clustering to do this.
>>
>> The idea would be to express your different observed time series or short
>> segments of time sequences as mixture models and then find regions that
>> are not well described by this mixture model. Ideally, you would have a
>> Markov model underneath the mixture coefficients, but that is out of scope
>> for what Mahout does for you right off the bat. It wouldn't be too hard to
>> merge the HMM code and the DP clustering to get this, though.
>>
>> So the answer is no.
>>
>> But Mahout would be a decent substrate for building your own.
>>
>> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <
>> [email protected]> wrote:
>>
>> > Hi,
>> > Any pointers to techniques/papers that detect outliers in time series
>> > of very large data sets using Mahout? I am interested in seeing what
>> > techniques are favorable for use in large-scale distributed systems using
>> > Hadoop/Mahout.
>> >
>> > Thanks,
>> > Sri.
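
To make the SWAB suggestion above a bit more concrete, here is a rough
single-machine sketch in plain Python. It is a simplified reading of the idea
(Bottom-Up merging run inside a sliding buffer), not the paper's exact
pseudocode and nothing that exists in Mahout or Hadoop. The names fit_error,
bottom_up, swab, max_error and buffer_size are made up for illustration, and
the use of a least-squares line with a sum-of-squared-residuals cost per
segment is an assumption on my part.

```python
def fit_error(values, start, end):
    """Sum of squared residuals of a least-squares line over values[start:end]."""
    n = end - start
    if n <= 2:
        return 0.0  # one or two points are always fit exactly by a line
    ys = values[start:end]
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(ys))


def bottom_up(values, max_error):
    """Bottom-Up segmentation: returns half-open (start, end) index pairs."""
    n = len(values)
    # Start from the finest segmentation (pairs of points) ...
    segments = [(i, min(i + 2, n)) for i in range(0, n, 2)]
    # ... and keep merging the cheapest adjacent pair while it stays under max_error.
    # Costs are recomputed on every pass, which is simple but quadratic.
    while len(segments) > 1:
        costs = [fit_error(values, segments[i][0], segments[i + 1][1])
                 for i in range(len(segments) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
    return segments


def swab(values, max_error, buffer_size):
    """Simplified SWAB: Bottom-Up on a buffer, emit the leftmost segment, refill."""
    n = len(values)
    result = []
    lo, hi = 0, min(buffer_size, n)
    while True:
        segs = bottom_up(values[lo:hi], max_error)
        if hi >= n:
            # No more input: everything left in the buffer is final.
            result.extend((lo + s, lo + e) for s, e in segs)
            return result
        # Emit only the leftmost segment and drop it from the buffer.
        s, e = segs[0]
        result.append((lo + s, lo + e))
        lo += e
        # Sliding-window step: pull in new points while one line still fits them.
        start = hi
        hi = min(hi + 2, n)
        while hi < n and fit_error(values, start, hi + 1) <= max_error:
            hi += 1
```

Calling swab(series, max_error, buffer_size) on a list of floats returns
contiguous (start, end) index pairs. Because only the buffer is held in memory
at any time, each segment (or each separate series) could in principle be
handed to a different worker, but how to map that onto Hadoop jobs is left
open here.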
