Hi,

2010/11/1 Srivathsan Srinivas <[email protected]>:
> Dear Ted,
>
> Thanks for pointing me to the Dirichlet mixture model. I shall look into that.
>
> Basically, I am looking into autocorrelation functions, control charts,
> moving averages, population stability, and Poisson regression (much of the
> data can be described as daily or hourly counts). I'd like to build a tool
> that would blend these approaches into a scorecard for proactive alerting on
> any outliers...
>
> For the above, I am interested in seeing how the time-series data can be
> broken into manageable segments and distributed to different machines in
> a Hadoop cluster.
>
I've never seen anything similar in Hadoop, but my suggestion for a good
algorithm for segmenting time series is:

Sliding Window And Bottom-Up (SWAB) from Keogh et al. Here is the paper:

http://www.cs.ucr.edu/~eamonn/icdm-01.pdf

and here is a presentation:
www-scf.usc.edu/~selinach/segmentation-slides.pd
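
To give a feel for it, here is a minimal sketch of the bottom-up merging
step that SWAB runs inside its sliding buffer. It is not the full SWAB
algorithm and not code from the paper. The names fit_error,
bottom_up_segments and max_error are my own, and I am assuming NumPy and a
least-squares line as the per-segment approximation.

```python
import numpy as np


def fit_error(values, start, end):
    """Sum of squared residuals of a least-squares line over values[start:end]."""
    y = values[start:end]
    if len(y) < 3:
        return 0.0  # one or two points are always fit exactly by a line
    x = np.arange(len(y))
    coeffs = np.polyfit(x, y, 1)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))


def bottom_up_segments(values, max_error):
    """Return (start, end) index pairs, end exclusive, covering the series.

    Starts from the finest segmentation (two points per segment) and keeps
    merging the cheapest adjacent pair until the cheapest merge would
    exceed max_error (a hypothetical, user-chosen threshold).
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    if n == 0:
        return []
    # bounds[i]..bounds[i + 1] delimits segment i
    bounds = list(range(0, n, 2))
    if bounds[-1] != n:
        bounds.append(n)
    while len(bounds) > 2:
        # cost of merging segment i with segment i + 1
        costs = [fit_error(values, bounds[i], bounds[i + 2])
                 for i in range(len(bounds) - 2)]
        best = int(np.argmin(costs))
        if costs[best] > max_error:
            break
        del bounds[best + 1]  # drop the shared boundary to merge the pair
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    series = list(range(6)) + [20] * 6  # a ramp followed by a flat level
    print(bottom_up_segments(series, max_error=1.0))
```

This version recomputes every merge cost on each pass, so it is only
meant for short buffers. In SWAB the buffer stays small: the leftmost
segment is emitted and more data is read in before the step is repeated.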


> Thanks again,
> Sri.
>
>
> On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning <[email protected]> wrote:
>
>> There is nothing explicit in Mahout for this, but you could use the
>> Dirichlet mixture model clustering to do this.
>>
>> The idea would be to express your different observed time series or short
>> segments of time sequences as mixture models and then find regions that are
>> not well described by this mixture model.  Ideally, you would have a Markov
>> model underneath the mixture coefficients, but that is out of scope for what
>> Mahout does for you right off the bat.  It wouldn't be too hard to merge the
>> HMM code and the DP clustering to get this, though.
>>
>> So the answer is no.
>>
>> But Mahout would be a decent substrate for building your own.
>>
>> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <
>> [email protected]> wrote:
>>
>> > Hi,
>> >       Any pointers to techniques/papers that detect outliers in time series
>> > of very large data sets using Mahout? I am interested in seeing what
>> > techniques are favorable for use in large-scale distributed systems using
>> > Hadoop/Mahout.
>> >
>> > Thanks,
>> > Sri.
>> >
>>
>
