Hi,

First of all, let me stress I'm not actually trying to do quant analysis...
it's just for fun; no practical use is expected, other than learning some
new stuff.

I also thought of using a transform from time to frequency (Fourier...) but
it was only a wild guess based on my limited knowledge of electronics and
signal processing, where the usual answer to a complex signal analysis is "do
a Fourier transform, it will help" :)

What makes you think that Gabor would help? Because of phase shifting? I
would then basically be clustering my data by phase shifting, is that
right?
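
To make the question concrete, here is how I currently picture a Gabor
transform of one stock's daily changes: a Fourier transform taken through a
sliding Gaussian window. This is only a sketch of my own understanding; the
function names, the window width, the hop and the number of frequencies are
all my assumptions, not anything from Mahout or from your mail.

```python
import cmath
import math


def gabor_transform(signal, window_width=4.0, hop=4, n_freqs=4):
    """Gaussian-windowed DFT: one row of complex coefficients per window centre.

    window_width (Gaussian sigma, in days), hop and n_freqs are illustrative
    assumptions, not values suggested in this thread.
    """
    n = len(signal)
    rows = []
    for centre in range(0, n, hop):
        row = []
        for k in range(n_freqs):
            coeff = 0j
            for t, x in enumerate(signal):
                # Gaussian weight localises the analysis around `centre`.
                weight = math.exp(-0.5 * ((t - centre) / window_width) ** 2)
                # Frequency index k means k cycles over the whole series.
                coeff += x * weight * cmath.exp(-2j * math.pi * k * t / n)
            row.append(coeff)
        rows.append(row)
    return rows


def magnitude_features(signal, **kwargs):
    """Flatten |coefficient| values into one feature vector, discarding phase."""
    return [abs(c) for row in gabor_transform(signal, **kwargs) for c in row]
```

If only the magnitudes are kept, the phase of each coefficient is thrown
away, which is why I wonder what role phase would play in the clustering.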

Thanks for your help!

Florent




2010/7/15 Ted Dunning <ted.dunn...@gmail.com>

> Clustering of time series data is usually better done in an abstract
> relatively low dimensional coordinate space based on some transform like a
> locality sensitive frequency transform.  Gabor transforms might be
> appropriate.
>
> You might be able to get away with something like an SVD of your daily
> change data.
>
> On Thu, Jul 15, 2010 at 7:51 AM, Florent Empis <florent.em...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I want to learn more on clustering techniques. I have skimmed through
> > Programming Collective Intelligence and Mahout in Action in the past but
> > I don't have them on hand at the moment... :(
> > I've seen Isabel Drost's mail about test data on http://mldata.org/about/
> > I've had an idea of using http://mldata.org/repository/view/stockvalues/
> > for a pet project.
> > My idea is as follows: can we see a common behaviour between companies'
> > stock values?
> > I would expect to end up with clusters of banking sector shares, utilities
> > shares, media etc... and maybe some more unexpected clusters, who knows?
> >
> > My idea is basically:
> > 1°) Transform the dataset from values to daily variation as a percentage
> > drop/rise (the data is then normalized)
> > 2°) Apply clustering technique(s)
> >
> > The issue may seem silly but as I understand it, clustering happens in a
> > 2 (or more) dimensional space.
> > I know I have 2 dimensions: variation and time, but I can't wrap my head
> > around the problem...
> >
> > I *think* that the K-Means example does exactly what I intend to do in my
> > second step, is this correct?
> > However, I can't grasp what the 2-dimensional display represents exactly:
> > what are the x and y axes?
> >
> > Added question: I am fairly new to the M/R paradigm, but let's say I
> > would like to do step 1 (data normalization) in an M/R fashion. Would the
> > following be a good idea:
> > My data is a matrix of k stock values S over n intervals of time.
> > For the first stock in the file, I call the first and second periods
> > S1,t & S1,t+1 ...
> >
> > Map Step: input: ((S1,t ... S1,t+n),... ,(Sk,t ... Sk,t+n) )
> > output (( (S1,t;S1,t+1),...,(S1,t+n-1;S1,t+n)), ... ,(
> > (Sk,t;Sk,t+1),...,(Sk,t+n-1;Sk,t+n)) )
> > Reduce Step:
> > ( (%S1,t+1.....%S1,t+n), ...,(%Sk,t+1.....%Sk,t+n))
> >
> > I apologize for my beginner's questions but.... everyone has to start
> > somewhere :-)
> >
> > BR,
> >
> > Florent Empis
> >
>
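
Coming back to the two steps in my original mail (percentage changes, then
clustering), this is roughly what I have in mind. It is a minimal sketch in
plain Python rather than Mahout's K-Means; the tickers and prices are made
up, and the helper names are mine.

```python
import random


def pct_changes(prices):
    """Step 1: daily variation in percent between consecutive values."""
    return [100.0 * (b - a) / a for a, b in zip(prices, prices[1:])]


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(points, k, iterations=20, seed=0):
    """Step 2: plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each stock to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Made-up prices for four hypothetical tickers over five days.
prices = {
    "BANK_A": [100, 102, 101, 105, 104],
    "BANK_B": [50, 51, 50.5, 52.5, 52],
    "UTIL_A": [30, 30, 30.3, 30.3, 30.6],
    "UTIL_B": [80, 80, 80.8, 80.8, 81.6],
}
names = list(prices)
vectors = [pct_changes(prices[name]) for name in names]
labels = dict(zip(names, kmeans(vectors, k=2)))
```

If I read this right, each stock is one point and each day's change is one
coordinate, so with n days of prices the clustering happens in n-1
dimensions rather than in 2. Corrections welcome.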

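For the M/R question, the map and reduce steps from my original mail could be
simulated in memory like this. It is only a sketch under my own assumptions:
there is no Hadoop here, `run` is a hypothetical stand-in for the shuffle,
and the input values are invented.

```python
from collections import defaultdict


def map_step(stock_id, values):
    """Emit (stock_id, (period, previous, current)) for each consecutive pair."""
    for t in range(1, len(values)):
        yield stock_id, (t, values[t - 1], values[t])


def reduce_step(stock_id, pairs):
    """Turn the pairs of one stock into percentage changes, ordered by period."""
    ordered = sorted(pairs)  # do not rely on the order in which pairs arrive
    return stock_id, [100.0 * (cur - prev) / prev for _, prev, cur in ordered]


def run(rows):
    """Tiny in-memory stand-in for the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for stock_id, values in rows.items():
        for key, pair in map_step(stock_id, values):
            grouped[key].append(pair)
    return dict(reduce_step(key, pairs) for key, pairs in grouped.items())


# Hypothetical input: k = 2 stocks, 4 periods each.
rows = {"S1": [100.0, 110.0, 99.0, 99.0], "S2": [20.0, 10.0, 15.0, 15.0]}
changes = run(rows)
```

Writing it out, I notice that each stock's row is independent of the others,
so as far as I can tell the percentages could be computed directly in the map
step and the reduce step may not be needed at all.
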