I tried the shapelet approach for video signature generation once upon a time and was not enormously impressed with the accuracy/recall tradeoffs.
I expect this was partly due to my own deficient implementation, but I do think there may be better approaches, such as vector quantization of a state space of some kind.

On Wed, Nov 3, 2010 at 6:02 PM, Srivathsan Srinivas <[email protected]> wrote:

> Thanks. I am reading a recent paper of Keogh's, "Time series shapelets:
> a novel technique that allows accurate, interpretable and fast
> classification," a Springer publication in Data Mining and Knowledge
> Discovery, 18 June 2010.
>
> I am basically skimming several papers with different ideas to see
> what can be easily and efficiently parallelized using Hadoop...
>
> Thanks much for pointing to the presentation and the paper.
>
> Srinivas.
>
> On Wednesday, November 3, 2010, Federico Castanedo <[email protected]>
> wrote:
> > Hi,
> >
> > 2010/11/1 Srivathsan Srinivas <[email protected]>:
> >> Dear Ted,
> >>
> >> Thanks for pointing to the Dirichlet mixture model. I shall look into that.
> >>
> >> Basically, I am looking into the autocorrelation function, Control Charts,
> >> Moving Average, Population Stability, and Poisson regression (much of the
> >> data can be described as daily|hourly counts). I'd like to build a tool that
> >> would blend these approaches into a scorecard for proactive alerting for any
> >> outliers...
> >>
> >> For the above, I am interested in seeing how the time-series data can be
> >> broken into manageable segments and distributed off to different machines in
> >> a Hadoop network.
> >>
> > I've never seen something similar in Hadoop, but my suggestion for a
> > good algorithm for segmenting time series is:
> >
> > Sliding Window And Bottom-Up (SWAB) from Keogh et al. Here is the paper:
> >
> > http://www.cs.ucr.edu/~eamonn/icdm-01.pdf
> >
> > and here is a presentation:
> > www-scf.usc.edu/~selinach/segmentation-slides.pd
> >
> >> Thanks again,
> >> Sri.
> >>
> >> On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning <[email protected]> wrote:
> >>
> >>> There is nothing explicit in Mahout for this, but you could use the
> >>> Dirichlet mixture model clustering to do this.
> >>>
> >>> The idea would be to express your different observed time series or short
> >>> segments of time sequences as mixture models and then find regions that
> >>> are not well described by this mixture model. Ideally, you would have a
> >>> Markov model underneath the mixture coefficients, but that is out of scope
> >>> for what Mahout does for you right off the bat. It wouldn't be too hard to
> >>> merge the HMM code and the DP clustering to get this, though.
> >>>
> >>> So the answer is no.
> >>>
> >>> But Mahout would be a decent substrate for building your own.
> >>>
> >>> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <[email protected]> wrote:
> >>>
> >>> > Hi,
> >>> > Any pointers to techniques/papers that detect outliers in time series
> >>> > of very large data sets using Mahout? I am interested in seeing what
> >>> > techniques are favorable for use in large-scale distributed systems using
> >>> > Hadoop/Mahout.
> >>> >
> >>> > Thanks,
> >>> > Sri.
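For what it is worth, the "break the series into segments and ship them to different machines" idea from the original question can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not anything Mahout or Hadoop provides: the function names (split_with_overlap, flag_outliers, detect_chunked), the 4-point trailing window, and the 3-sigma control limit are all made up for the example. The one design point it tries to show is that each chunk carries `window` points of leading context, so a mapper could score its chunk independently and still agree with a single-machine run.

```python
from statistics import mean, pstdev


def split_with_overlap(series, chunk_size, window):
    """Yield (context_start, owned_start, values) for each chunk.

    Each chunk repeats the `window` points before its owned range so that
    trailing statistics near the chunk boundary match a single-machine run.
    """
    for owned_start in range(0, len(series), chunk_size):
        context_start = max(0, owned_start - window)
        yield context_start, owned_start, series[context_start:owned_start + chunk_size]


def flag_outliers(context_start, owned_start, values, window, k=3.0):
    """Return global indices of owned points that lie more than k sigma from
    the mean of the `window` points preceding them (a simple control chart)."""
    flagged = []
    for i in range(window, len(values)):
        global_index = context_start + i
        if global_index < owned_start:
            continue  # context only; an earlier chunk owns this point
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            flagged.append(global_index)
    return flagged


def detect_chunked(series, chunk_size, window, k=3.0):
    """Score every chunk independently (the "map" step) and merge the results."""
    flagged = []
    for context_start, owned_start, values in split_with_overlap(series, chunk_size, window):
        flagged.extend(flag_outliers(context_start, owned_start, values, window, k))
    return sorted(flagged)


# Hypothetical hourly counts with one obvious spike.
hourly_counts = [10, 11, 9, 10, 12, 10, 11, 50, 10, 9, 11, 10, 12, 11, 10, 9]
print(detect_chunked(hourly_counts, chunk_size=6, window=4))
```

Note that the first `window` points of the whole series are never scored, since they have no full history. A real scorecard would presumably blend several such detectors (Poisson limits for counts, for instance) rather than rely on one.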

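On the segmentation suggestion in the thread: below is a rough sketch of the bottom-up half of that idea only. It is not the full SWAB algorithm from the linked paper (there is no sliding-window buffer here), and the names linear_fit_error and bottom_up_segments, the squared-residual cost, and the max_error threshold are assumptions chosen for illustration. It starts from the finest segmentation and greedily merges the cheapest adjacent pair until every remaining merge would exceed max_error.

```python
def linear_fit_error(points):
    """Sum of squared residuals of the least-squares line through (i, points[i])."""
    n = len(points)
    if n < 3:
        return 0.0  # one or two points are always fit exactly by a line
    mean_x = (n - 1) / 2
    mean_y = sum(points) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(points))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(points))


def bottom_up_segments(series, max_error):
    """Return half-open (start, end) index ranges of piecewise-linear segments."""
    # Finest starting segmentation: consecutive pairs of points.
    bounds = [(i, min(i + 2, len(series))) for i in range(0, len(series), 2)]
    while len(bounds) > 1:
        # Cost of merging each adjacent pair of segments into one line.
        costs = [
            linear_fit_error(series[left[0]:right[1]])
            for left, right in zip(bounds, bounds[1:])
        ]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break  # every remaining merge would be too lossy
        bounds[best:best + 2] = [(bounds[best][0], bounds[best + 1][1])]
    return bounds


# Hypothetical series: a linear rise followed by a linear fall.
tent = [0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0]
print(bottom_up_segments(tent, max_error=0.5))
```

The resulting segments could then serve as the units handed to separate workers, though whether that split suits a given Hadoop job depends on how much context each detector needs at segment boundaries.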