Re: outlier detection in time-series using Mahout

Ted Dunning Wed, 03 Nov 2010 19:53:16 -0700

I tried the shapelet approach for video signature generation once upon a
time and was not enormously impressed with the accuracy/recall tradeoffs.


I expect that this was partly due to my own deficient implementation,
but I really do think that there may be better approaches, such as
vector quantization of a state space of some kind.
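
To make the vector quantization idea a bit more concrete, here is a minimal
sketch in plain Python. It is only an illustration under my own assumptions,
not Mahout code: the "state space" is taken to be a simple delay embedding
(sliding windows of the series), the codebook is learned with ordinary
k-means, and the outlier score is the distance from each window to its
nearest codeword. The names embed, kmeans and quantization_error, the window
width of 5 and the codebook size of 8 are all hypothetical choices.

```python
import math
import random

def embed(series, width):
    """Delay embedding: each state is a window of `width` consecutive samples."""
    return [tuple(series[i:i + width]) for i in range(len(series) - width + 1)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns the k codebook vectors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        # Move each centroid to the mean of its bucket; keep it if the bucket is empty.
        centroids = [
            tuple(sum(col) / len(b) for col in zip(*b)) if b else centroids[j]
            for j, b in enumerate(buckets)
        ]
    return centroids

def quantization_error(points, centroids):
    """Distance from each state to its nearest codeword; large means unusual."""
    return [math.sqrt(min(sq_dist(p, c) for c in centroids)) for p in points]

# Learn the codebook on a reference stretch, then score new data against it.
reference = [math.sin(2 * math.pi * t / 20) for t in range(200)]
observed = [math.sin(2 * math.pi * t / 20) for t in range(100)]
observed[50] += 8.0  # injected spike, purely synthetic
codebook = kmeans(embed(reference, 5), k=8)
scores = quantization_error(embed(observed, 5), codebook)
worst = max(range(len(scores)), key=scores.__getitem__)
print(worst, round(scores[worst], 2))
```

The codebook is learned on a separate reference stretch on purpose: if the
anomaly is in the training data, k-means may happily give it a codeword of
its own. Scoring windows against a fixed codebook is independent per window,
so that part should map onto Hadoop easily.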

On Wed, Nov 3, 2010 at 6:02 PM, Srivathsan Srinivas <
[email protected]> wrote:

> Thanks. I am reading a recent paper of Keogh's, "Time series shapelets:
> a novel technique that allows accurate, interpretable and fast
> classification", published by Springer in Data Mining and Knowledge
> Discovery, 18 June 2010.
>
> I am basically skimming several papers with different ideas to see
> what can be easily and efficiently parallelized using Hadoop...
>
> Thanks much for pointing to the presentation and the paper.
>
> Srinivas.
>
> On Wednesday, November 3, 2010, Federico Castanedo <[email protected]>
> wrote:
> > Hi,
> >
> > 2010/11/1 Srivathsan Srinivas <[email protected]>:
> >> Dear Ted,
> >>
> >> Thanks for pointing to the Dirichlet mixture model. I shall look into that.
> >>
> >> Basically, I am looking into the autocorrelation function, control charts,
> >> moving averages, population stability, and Poisson regression (much of
> >> the data can be described as daily or hourly counts). I'd like to build a
> >> tool that would blend these approaches into a scorecard for proactive
> >> alerting on any outliers...
> >>
> >> For the above, I am interested in seeing how the time-series data can be
> >> broken into manageable segments and distributed to different machines in
> >> a Hadoop network.
> >>
> > I've never seen anything similar in Hadoop, but my suggestion for a
> > good algorithm for segmenting time series is:
> >
> > Sliding Window And Bottom-Up (SWAB) from Keogh et al. Here is the paper:
> >
> > http://www.cs.ucr.edu/~eamonn/icdm-01.pdf
> >
> > and here is a presentation:
> > www-scf.usc.edu/~selinach/segmentation-slides.pd
> >
> >
> >> Thanks again,
> >> Sri.
> >>
> >>
> >> On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning <[email protected]> wrote:
> >>
> >>> There is nothing explicit in Mahout for this, but you could use the
> >>> Dirichlet mixture model clustering to do this.
> >>>
> >>> The idea would be to express your different observed time series or short
> >>> segments of time sequences as mixture models and then find regions that
> >>> are not well described by this mixture model.  Ideally, you would have a
> >>> Markov model underneath the mixture coefficients, but that is out of scope
> >>> for what Mahout does for you right off the bat.  It wouldn't be too hard
> >>> to merge the HMM code and the DP clustering to get this, though.
> >>>
> >>> So the answer is no.
> >>>
> >>> But Mahout would be a decent substrate for building your own.
> >>>
> >>> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <
> >>> [email protected]> wrote:
> >>>
> >>> > Hi,
> >>> >       Any pointers to techniques/papers that detect outliers in
> >>> > time-series of very large data sets using Mahout? I am interested in
> >>> > seeing what techniques are favorable for use in large-scale distributed
> >>> > systems using Hadoop/Mahout.
> >>> >
> >>> > Thanks,
> >>> > Sri.
> >>> >
> >>>
> >>
> >
>
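
On the segmentation question in the quoted thread, here is a rough sketch of
the bottom-up half of the sliding-window-and-bottom-up idea, in plain Python.
It is not the full SWAB algorithm (there is no sliding buffer) and it is not
taken from the paper's code: the function names fit_error and bottom_up, the
least-squares line as the segment model, and the max_error threshold are my
own assumptions for illustration. It starts from the finest segmentation and
keeps merging the cheapest adjacent pair until the cheapest merge would
exceed the error budget.

```python
def fit_error(ys, lo, hi):
    """Sum of squared residuals of a least-squares line through ys[lo:hi]."""
    n = hi - lo
    if n < 3:
        return 0.0  # one or two points are always fit exactly
    xs = range(lo, hi)
    seg = ys[lo:hi]
    mx = sum(xs) / n
    my = sum(seg) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, seg))
    slope = sxy / sxx
    return sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, seg))

def bottom_up(ys, max_error):
    """Greedy bottom-up segmentation.

    Returns a list of half-open (start, end) index ranges covering ys.
    """
    # Segment i covers ys[bounds[i]:bounds[i + 1]]; start with pairs of points.
    bounds = list(range(0, len(ys), 2)) + [len(ys)]
    while len(bounds) > 2:
        # Cost of merging segment i with segment i + 1.
        costs = [fit_error(ys, bounds[i], bounds[i + 2])
                 for i in range(len(bounds) - 2)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break
        del bounds[best + 1]  # drop the boundary between the two segments
    return list(zip(bounds[:-1], bounds[1:]))

# A ramp up followed by a ramp down should split at the peak.
series = [float(t) for t in range(50)] + [float(50 - t) for t in range(50)]
print(bottom_up(series, max_error=1e-6))
```

This version recomputes every merge cost on each pass, which is quadratic and
fine only for a sketch; a real implementation would update just the two
neighbouring costs after a merge. Once segments exist, each one can be scored
or modelled independently, which is the part that would distribute.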
