Dear List(s),

Last week I posted a request for help on the issue
of how to analyse repeated measurement data with some rather unusual
dimensions (N=120 subjects, each giving a time series with T=120
occasions). To my pleasure, there have been as many as 13 replies, which
are given below, preceded by the original posting.

For those of you who don't want to read through all of these, here is
a short summary.


Roughly, responses/recommendations can be classified into 5 categories:


 a) Make use of all the information available and put it into a form
    suitable for state-space models or vector ARIMA analyses.

    Frankly, this seems to be beyond my capabilities, and I am
    inclined to accept the penalty of reducing the data as
    proposed by other responders.


 b) Perform time series analyses for each subject and boil the
    data down to certain parameters. Read these into a secondary
    data set and merge it somehow with the original one in order to
    preserve design variables such as treat/control, sex and age.
    Finally, use these data for "standard" analyses to test hypotheses
    of homogeneity of subjects within groups or of differences
    across groups (implying some MANOVA-style model).

    Taken to the extreme, one could obtain a single parameter such as a
    slope for each subject and perform univariate analyses at the
    higher stratum level.

    Another response in this direction suggested fitting a spline
    model for each subject and using the spline components in subsequent
    ANOVA-style models. I understand that this could be done using
    PROC TRANSREG in SAS, but I am not sure whether this procedure in
    fact accounts for the time dependency in the individual data
    underlying the spline.


 c) Reduce the repeated measurement frequency in the first place and
    then perform (M)ANOVA-style analyses with a time factor that (then)
    has a suitable number of levels. Test for time effects using standard
    contrasts such as polynomial decomposition or Helmert contrasts (when
    interest lies in the point in time at which responses cease to change
    any further).


 d) A particularly interesting response suggested identifying "change
    profiles" within time series and submitting these to further analyses
    such as permutation tests. Still, I am unclear about how to aggregate
    the data in order to make the best use of all subjects' data.


 e) General remarks and caveats, such as paying attention to sample-size
    issues, looking for cyclicity in individual data that generalizes to
    the stratum, adjusting for cross-correlations in the case of
    multivariate outcome measures, and the complexity of the assumptions
    required when analysing complex factorial designs involving a
    repeated-measures factor.
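Item c) mentions Helmert contrasts for locating the point in time at which responses stop changing. A minimal sketch in Python, assuming the series has already been reduced to period (e.g. weekly) means; the function names and the tolerance `tol` are illustrative assumptions, not part of any package. In a repeated-measures analysis one would compute these contrasts per subject and test their mean against zero; the tolerance rule here is only a descriptive shortcut.

```python
def helmert_contrasts(period_means):
    """Helmert-type contrasts: each period's mean minus the mean of all
    later periods. Returns one value per period except the last."""
    out = []
    for j in range(len(period_means) - 1):
        later = period_means[j + 1:]
        out.append(period_means[j] - sum(later) / len(later))
    return out


def first_stable_period(contrasts, tol):
    """Index of the first period from which every remaining contrast lies
    within +/- tol of zero (a rough guess at where responses stop changing).
    tol is a hypothetical, user-chosen tolerance. Returns None if never."""
    for j in range(len(contrasts)):
        if all(abs(c) <= tol for c in contrasts[j:]):
            return j
    return None
```

For weekly means `[10, 7, 5, 5, 5]` the contrasts are `[4.5, 2.0, 0.0, 0.0]`, and `first_stable_period(..., 0.5)` returns `2`, i.e. the third week (periods are counted from 0).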



Again, thanks to all who took the time to help. I am committed to
participating in this kind of mutual assistance.

Hans C Waldmann



---------------------------------------------------------------------
Dr. Hans C Waldmann              
Methodology & Applied Statistics in Psychology & the Health Sciences

ZFRF / University of Bremen / Grazer Str 6 / 28359 Bremen / Germany 
[EMAIL PROTECTED] / http://samson.fire.uni-bremen.de

friend of: AIX PERL ADABAS SAS TEX 
---------------------------------------------------------------------









Following:
             0) Original posting (request)
          1-13) replies




0)
------------------------------------------------------------------------


----- Original Message -----
From: "Dr. Hans-Christian Waldmann" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, October 12, 2000 2:18 PM
Subject: a model for time series (T=120) for N=120 persons ?


>
>
> Hello everybody,
>
> in one of the clinical projects in which we consult on data analysis, I am
> facing a problem I have not come across before and that leaves me with no
> idea how to proceed. The problem pertains to the dimensions of
> the outcome data set. In a repeated measures design, let N be the
> number of people treated and T be the number of measurement occasions.
>
> I understand that N=1 (or _some_ more) and T=120 would make up a time
> series, and that I am supposed to fit ARIMA models or transfer functions.
> I could detect effects by structural breaks around the point in time of
> intervention, that is, by performing intervention analyses as proposed in
> McDowall, McCleary, Meidinger and Hay, 1980, Interrupted Time Series
> Analysis, or in other books on how to analyse data from single-subject
> designs.
> All right.
>
> I understand that N=120 (or any larger number) and repeated measures with
> 2<=T<="the-smaller-the-better" would make up a dataset suitable for
> an ANOVA approach or for mixed models using special covariance structures,
> as in SAS's PROC MIXED. I know how to do that.
> All right.
>
> I understand that for each of these variants there are some alternatives
> in statistical modeling (such as non-parametric analyses).
>
> Now, what am I supposed to do with data from a design giving a T=120
> time series for _each_ of 120 subjects? There was a controlled
> study in which patients in three independent groups were asked to keep
> a diary on some outcome variables for ca. 4 months. There are some
> design variables like treat/control or sex and age that are expected
> to contribute systematically to variation in the outcome measures.
> But each outcome measure is, in effect, a time series. I don't think
> I should perform an ANOVA-style analysis with a 120-level time factor.
> Pooling data and fitting ARIMA/transfer-function models to a single time
> series of subjects' means for each point in time doesn't make sense
> either, assuming that subjects differ in both measurement level and
> the covariance structure of their individual time series. I admit that
> I have no idea how to evaluate, say, an effect of treatment on this
> kind of outcome measure.
>
> Does anybody else have an idea? I promise to post a summary of
> responses to the list.
>
>
> Thanks in advance
>
> Hans-Christian Waldmann
>
>
>



1.
-----------------------------------------------------------------------

From: [EMAIL PROTECTED] (Thu Oct 12 15:34:49 2000)


Hi

It may be easier in the long run to pose the time series in state space
form.  Then missing values are easy to deal with for a start and it can be
easier to model what goes on.
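A minimal sketch of why the state-space form makes missing diary days easy, assuming the simplest state-space model (a local level, i.e. a random walk observed with noise) with known variances. This is a hand-written Kalman filter for illustration only; it is not SsfPack's interface. In practice the two variances would be estimated, for example by maximising the returned log-likelihood.

```python
import math


def local_level_filter(y, sigma2_eps, sigma2_eta, a0=0.0, p0=1e7):
    """Kalman filter for the local level model (variances assumed known):
        y[t]      = mu[t] + eps[t],   eps[t] ~ N(0, sigma2_eps)
        mu[t + 1] = mu[t] + eta[t],   eta[t] ~ N(0, sigma2_eta)
    A missing observation (None or NaN) skips the update step, so the
    filter simply carries its prediction forward.
    Returns the filtered levels and the Gaussian log-likelihood."""
    a, p = a0, p0                    # predicted level and its variance
    filtered, loglik = [], 0.0
    for obs in y:
        if obs is None or (isinstance(obs, float) and math.isnan(obs)):
            a_f, p_f = a, p          # no data: filtered = predicted
        else:
            f = p + sigma2_eps       # variance of the one-step prediction error
            v = obs - a              # one-step prediction error
            k = p / f                # Kalman gain
            a_f, p_f = a + k * v, p * (1.0 - k)
            # p0 is large (a crude "diffuse" start), so the first term is rough
            loglik -= 0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        filtered.append(a_f)
        a, p = a_f, p_f + sigma2_eta  # random-walk prediction for the next day
    return filtered, loglik
```

For example, `local_level_filter([5.0, 5.0, None, 5.0], 1.0, 0.1)` returns a filtered level for every day, including the missing one, without any imputation step.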

See papers by Durbin and Koopman, and software from Koopman (SsfPack)

http://www.econ.vu.nl/koopman/ssfpack/

This does all the hard work and leaves you free to concentrate on
modelling. I think a book may be coming out soon from Oxford University Press

http://www.oup.co.uk/

Hope this helps.  I intend to have a go with this software when I can.

Robert West




2.
-----------------------------------------------------------------------

From: [EMAIL PROTECTED] (Thu Oct 12 16:25:09 2000)

Dear Hans

The obvious way to proceed would be to analyse each of the 120 time 
series in some appropriate way, and then use the derived parameters 
in further analysis of the experiment overall.

In a simple example, where the derived parameter is, say, a slope 
from Linear Regression, the slope estimates can then be used as the 
response in an ANOVA or Multiple Regression analysis.  
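A minimal sketch of this two-stage idea in Python, assuming one outcome series per subject at equally spaced occasions and a single grouping factor (the names `two_stage`, `treat` and `control` are illustrative). Only the F statistic is computed; a p-value would come from an F distribution with (k-1, n-k) degrees of freedom, e.g. `scipy.stats.f.sf`. Serial correlation within a series does not bias the stage-1 OLS slope, but it does make ordinary per-subject standard errors unreliable, which is why only the point estimates are carried forward.

```python
def ols_slope(t, y):
    """Ordinary least-squares slope of y on t."""
    n = len(t)
    tbar, ybar = sum(t) / n, sum(y) / n
    sxy = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    sxx = sum((ti - tbar) ** 2 for ti in t)
    return sxy / sxx


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA; groups maps a label to a list of values."""
    values = [v for vals in groups.values() for v in vals]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def two_stage(series, group_of):
    """Stage 1: reduce each subject's series to one OLS slope.
    Stage 2: compare the slopes across groups with a one-way ANOVA."""
    slopes = {}
    for subject, y in series.items():
        t = list(range(len(y)))                  # occasions 0, 1, ..., T-1
        slopes.setdefault(group_of[subject], []).append(ols_slope(t, y))
    return slopes, one_way_anova_f(slopes)
```

With more design variables (sex, age) the second stage would become a multiple regression or ANOVA on the slopes rather than a one-way layout.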

Your time series can be deconstructed in a suitable way, perhaps
using a breakpoint detection method, some ARIMA model parameters, or
even a more complex method such as describing the time series curve
using principal components.

Once you have the derived parameters, these can then be modelled.

You should end up with model(s) that determine or predict the
effect of each of the experimental factors, such as
treatment/control, age, sex and group type (in combination), on all the
derived parameters. This should, in turn, lead to some standard
values of the derived parameters that occur under particular
combinations of the factor settings.

Of course the simpler the derived parameters the better, and one must 
take care of correlations between these when establishing levels of
uncertainty around estimates of the 'standard' values. 

Where the effects of the factors on different parameters conflict,
joint optimisation methods can be used, if this is possible.

The modelling techniques will probably involve response surface 
analysis.

So there are three stages:

Time series analysis - to find parameters
Parameter modelling - to find factor effects and predictive equations
Joint optimisation? - to resolve prediction-effect conflicts

Regards
Dave Stewardson



3.
-----------------------------------------------------------------------


From: [EMAIL PROTECTED] (Thu Oct 12 17:10:35 2000)

Hi,

This sort of thing occurs quite often in clinical trials when you have
diary data. The subject is on treatment for, say, 12 weeks and records
their lung function (say) every day just after getting up.

N=200 to 500 say and T=12*7=84

The crucial thing is to get the client (medic) to state what it is that
interests him. Then an easy approach is a two-stage analysis.
Within each individual, analyse the data to produce a single summary
value. This could be the mean, slope, maximum, minimum, area under the
curve (average), the time at which the value drops to 50% of baseline,
etc. Which summary statistic you choose is determined by what question
the client wants to ask of the data.
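The summary values listed above can be computed per subject along these lines. A sketch assuming increasing time points and, for the time-to-50% measure, a positive baseline (the first value unless given) and a response that falls over time; the function name and dictionary keys are illustrative.

```python
def subject_summaries(t, y, baseline=None):
    """Candidate single-number summaries of one subject's series."""
    n = len(y)
    base = y[0] if baseline is None else baseline
    tbar, ybar = sum(t) / n, sum(y) / n
    slope = (sum((a - tbar) * (b - ybar) for a, b in zip(t, y))
             / sum((a - tbar) ** 2 for a in t))
    # Area under the curve by the trapezoidal rule
    auc = sum((t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2 for i in range(n - 1))
    # First time the value is at or below 50% of baseline (None if never)
    t50 = next((ti for ti, yi in zip(t, y) if yi <= 0.5 * base), None)
    return {
        "mean": ybar,
        "slope": slope,
        "max": max(y),
        "min": min(y),
        "auc_average": auc / (t[-1] - t[0]),  # AUC divided by the time span
        "time_to_50pct": t50,
    }
```

For `t = [0, 1, 2, 3, 4]` and `y = [10, 8, 6, 4, 2]` this gives mean 6.0, slope -2.0, average AUC 6.0 and time to 50% of baseline 3. Only the one summary that answers the client's question would go on to the higher-stratum analysis.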

Then analyse these summary values across subjects at the higher stratum
level.

For regular data many of these analyses are special cases of mixed
models. (e.g. taking regression slope and analysing at higher level is
exactly the same as random-coefficient regression in mixed models when
the X values are the same for every subject.)

But first of all - plot the data!

James.



4.
-----------------------------------------------------------------------

From: "Gaj Vidmar" <[EMAIL PROTECTED]>

What I can propose is rather simple, so it may well be completely wrong
(especially as no true expert has posted anything on the topic so far), but
perhaps it will be of some use:

why not pool data for an individual over time periods, say months, or, to
preserve more information, weeks? (Perhaps not by averaging but, depending
on data characteristics, by using the median, geometric mean, or some fancy
M-estimator?)

- This will give you the possibility to conduct an ANOVA-type analysis:
a mixed model with some "nonrepeated" factors (three fixed, if I get it right,
i.e., treat/control, sex and age, plus possibly others) and week (or
whatever time period) as "repeated".
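A minimal sketch of the pooling step using Python's standard `statistics` module, assuming daily values in a list with `None` for missing days; `pool_blocks` is an illustrative name. An M-estimator is not in the standard library, so it is left out here; any function taking a list and returning one number could be added to the dictionary.

```python
import statistics


def pool_blocks(y, block_len, estimator="median"):
    """Collapse a daily series into consecutive blocks (block_len=7 for weeks)
    using the chosen location estimator. Missing days (None) are dropped
    before pooling; a trailing incomplete block is kept."""
    funcs = {
        "mean": statistics.mean,
        "median": statistics.median,
        "geometric_mean": statistics.geometric_mean,  # positive data only
    }
    f = funcs[estimator]
    pooled = []
    for start in range(0, len(y), block_len):
        block = [v for v in y[start:start + block_len] if v is not None]
        pooled.append(f(block) if block else None)
    return pooled
```

A 120-day diary with `block_len=7` yields 18 weekly values (17 full weeks plus one 1-day remainder), which then serve as the levels of the repeated "week" factor.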

As emphasised in the introduction, this may be less than two cents.

Best regards,

Gaj Vidmar
Univ. of Ljubljana, Dept. of Psychology



5.
-----------------------------------------------------------------------

From: "Gaj Vidmar" <[EMAIL PROTECTED]>

Dr. Waldmann,

there seems to be no word from professional statisticians yet, so here's an
addendum.

I overlooked two important aspects of the study, which, however, do
not invalidate the basic idea of pooling individual data over
appropriate time periods.

The first aspect is the three groups of patients. I'm not sure whether
they were formed on the basis of the (quote) "design variables" (in which
case there is one factor instead of the three nonrepeated ones) or whether
they define another factor (a random one, I guess, as opposed to the three
fixed ones), but the pooling approach does not depend on this.

The same goes for the second aspect, i.e., that several measures were
taken, not just one. Theoretically, MANOVA might thus be feasible instead of
several ANOVAs. But with such a complex model (say, one random plus three
fixed factors plus one repeated-measures factor), properly checking all the
various assumptions and interpreting all the results is rather ... Not to
mention that the analysis must be properly set up in the first place
(contrast issues ...), as well as sample-size issues ... In any case, fully
understanding such an analysis is probably beyond the horizon of most of
the "consumers" in the social and health sciences to whom you will
presumably have to present the findings. So if the outcome variables are
not too numerous and/or not too highly correlated, I believe they can be
analysed "one by one".

Awaiting judgement from the sci.stat.* community and wishing you all the
best with the research,

Gaj Vidmar





6.
-----------------------------------------------------------------------

From: MJ Ray <[EMAIL PROTECTED]>

My own suggestion (mangled by a bad email client) was to use vector time
series methods, but without extra information this could lead to a
fairly large computational problem. I wasn't able to recommend a very
good specialist reference off the top of my head, though.

MJR



7.
-----------------------------------------------------------------------

From: Elliot Cramer <[EMAIL PROTECTED]>


You haven't really given enough information, but here is a suggestion. You
have three separate groups. If they are not the treatment groups with
random assignment, anything else you do will be VERY dubious. You could
use sex as a blocking factor and age as a covariate. What is the purpose
of the 120 observations? You could construct a SMALL number of relevant
variables from these observations and do a MANOVA, for example linear,
quadratic and cubic trends if you are simply interested in what happens
over time. You might also do a between-groups analysis on the final time
point or on the average over time. It's hard to say without knowing the
details.
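A sketch of how the linear, quadratic and cubic trend variables could be built, assuming equally spaced occasions. The orthogonal polynomial contrasts are generated here by modified Gram-Schmidt rather than taken from a published table, and the function names are illustrative. Each subject's series is projected onto the contrasts, giving three scores per subject that could serve as the dependent variables of a MANOVA (not shown).

```python
def orthogonal_polynomials(T, degree=3):
    """Unit-length orthogonal polynomial contrasts (linear, quadratic, cubic,
    ...) for T > degree equally spaced occasions, via modified Gram-Schmidt
    on 1, t, t^2, ... Time is rescaled to [-1, 1] to keep powers well
    conditioned."""
    t = [(2 * i - (T - 1)) / (T - 1) for i in range(T)]
    basis = []
    for d in range(degree + 1):
        v = [x ** d for x in t]
        for b in basis:  # remove the component along each lower-degree vector
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis[1:]  # drop the constant; keep the trend contrasts


def trend_scores(y, contrasts):
    """One score per contrast for a single subject's series."""
    return [sum(c * v for c, v in zip(contrast, y)) for contrast in contrasts]
```

For T=4 the linear and quadratic contrasts come out proportional to the familiar tabled values (-3, -1, 1, 3) and (1, -1, -1, 1); for the diary data one would call `orthogonal_polynomials(120)`.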

What you REALLY should do is consult a statistician about the specifics.



8.
-----------------------------------------------------------------------

From: [EMAIL PROTECTED] (Magill, Brett)

I don't really know enough about time series to provide much advice.
However, I have seen methods in which a slope across time was calculated for
each subject, with the first measurement as the intercept (within subjects).
The individual slopes were then regressed on other factors, answering the
question of which factors (X) influence the rate and direction of change
in Y over time.
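A sketch of this approach, assuming one series per subject and a single numeric covariate such as age; the function names are illustrative. Forcing each line through the first measurement (t0, y0) gives the least-squares slope b = sum((t - t0)(y - y0)) / sum((t - t0)^2); the slopes are then regressed on the covariate by ordinary least squares.

```python
def anchored_slope(t, y):
    """Least-squares slope of a line forced through the first measurement,
    i.e. the b minimising sum((y_i - y_0 - b * (t_i - t_0)) ** 2)."""
    t0, y0 = t[0], y[0]
    num = sum((ti - t0) * (yi - y0) for ti, yi in zip(t, y))
    den = sum((ti - t0) ** 2 for ti in t)
    return num / den


def simple_regression(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b * xbar, b
```

Usage: compute `anchored_slope(range(120), diary)` for every subject, then `simple_regression(ages, slopes)`; with several factors the second step would be a multiple regression instead.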



9.
-----------------------------------------------------------------------

From: [EMAIL PROTECTED] (Simon, Steve, PhD)

Even though the researchers collected data on 120 consecutive days, I doubt
that they are particularly interested in any one day in isolation. Look at
some composite measures, such as the slope of the trend line, or the change
score at the end of each month. Or perhaps an average for each month, or the
standard deviation for each month.  Your researchers should be able to 
elaborate on why they collected the data, and that elaboration should help 
you decide which composite measure you should use.

Once you reduce it to a small number of composite measures, then you can
apply the ANOVA types of procedures.

An alternative that might be worth exploring is fitting a spline model to
each subject's data and then pooling the splines across groups. This is
messy and complex, but fun.
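A minimal sketch of fitting a spline to one subject's data, assuming a continuous piecewise-linear spline with the same fixed knots for every subject (the knot positions are a hypothetical choice), so that the coefficient vectors are comparable across subjects and can be pooled or compared across groups. This is plain least squares in Python, not PROC TRANSREG, and it does not model serial correlation in the residuals.

```python
def linear_spline_basis(t, knots):
    """Truncated-power basis: 1, t, (t - k1)+, (t - k2)+, ..."""
    return [[1.0, ti] + [max(ti - k, 0.0) for k in knots] for ti in t]


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_spline(t, y, knots):
    """Coefficients [intercept, initial slope, slope change at each knot],
    from the normal equations X'X b = X'y."""
    X = linear_spline_basis(t, knots)
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)
```

With knots at, say, days 30, 60 and 90, each subject reduces to five coefficients, which can then be averaged within groups or entered into the ANOVA-type analyses.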

I hope this helps. Good luck!



10.
-----------------------------------------------------------------------

From: Rich Ulrich <[EMAIL PROTECTED]>


Steve explains how to make the best of the data, given the likely
assumptions about the 120 days.

You don't say where the 120 days fall, so there might be paycheck
cycles of 7, 14 or 28 days or a month, menstrual cycles, or some
other cycle. If the subjects have some overlapping '120 days' on the
calendar, it might be reasonable to look at calendar date for cycles
or for extreme events. That assumes there is a bit of day-to-day
lability that might cover up some information.
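A quick check for such cycles in one subject's series, sketched in Python under the assumption of daily data without gaps: compute the sample autocorrelation at the candidate cycle lengths. As a rough guide, for a white-noise series of length T the sample autocorrelations mostly stay within about ±2/√T (about ±0.18 for T=120). The lag list is an illustrative choice.

```python
def autocorrelation(y, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(y)
    m = sum(y) / n
    denom = sum((v - m) ** 2 for v in y)
    num = sum((y[i] - m) * (y[i + lag] - m) for i in range(n - lag))
    return num / denom


def cycle_check(y, lags=(7, 14, 28, 30)):
    """Autocorrelations at weekly, fortnightly and roughly monthly lags."""
    return {lag: autocorrelation(y, lag) for lag in lags}
```

If the subjects share calendar dates, the same check can be run on the calendar-aligned daily means; a cycle shared across subjects should then stand out more clearly than in any single noisy diary.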

But if you aren't looking at (say) muggings on the day after Social
Security checks appear, then I doubt that cycles are likely. Still,
the detail does allow you to examine onset variations, or offset:
that is, there might be a definite curve over the first week or so
that does not exist later, if the ratings are something that entails
learning or adaptation. This could be something interesting if it
varies among the three groups, or it could be something to be
eradicated because it is an artifact.

On the other hand, if the 120 days were known to be a limit, there
might be some 'anticipation of the end'; for instance, patients in
hospitals may show remarkable recovery during the last week of their
insurance coverage.

So, you can probably lump data by weeks or months, but don't forget to
take a look at start and end effects, if the measures are prone to
those hazards.
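A simple descriptive check for such start and end effects, sketched in Python under the assumption of daily data and a one-week window (a hypothetical choice): compare the first and last `window` days with the middle stretch for each subject, then see whether these differences vary among the three groups.

```python
import statistics


def start_end_check(y, window=7):
    """Mean of the first and of the last `window` occasions minus the mean
    of the middle stretch. Requires len(y) > 2 * window."""
    middle = statistics.mean(y[window:-window])
    return {
        "start_minus_middle": statistics.mean(y[:window]) - middle,
        "end_minus_middle": statistics.mean(y[-window:]) - middle,
    }
```

Large, group-dependent differences would argue for examining the first and last weeks separately before lumping the rest of the series.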

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html



11.
-----------------------------------------------------------------------

From: MJR       http://stats.mth.uea.ac.uk/


Panic.

Seriously, you need to be looking at multivariate or vector time
series methods in this case.  Unfortunately, without adding more
assumptions (hopefully reasonable ones created from expert opinion) to
the model, you are looking at a fairly large computation problem (many
cross-correlations) at least, I think.  I shan't say more for fear of
making a fool of myself at this time of the morning, but suggest the
weighty "Time Series Analysis" by Hamilton as a possible lead.
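To see how large the computational problem is, assume the 120 subjects are stacked into a single 120-dimensional series and a first-order vector autoregression, VAR(1), is fitted. Counting free parameters against available data points (a back-of-the-envelope sketch, not a recommendation) shows why additional assumptions are needed: the unrestricted model has more parameters than observations, whereas restricting the coefficient matrix and the error covariance to be diagonal, a natural restriction if subjects are independent, reduces the count to 360.

```latex
\begin{aligned}
y_t &= c + A_1 y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma), \quad y_t \in \mathbb{R}^{k},\ k = 120\\
\#\text{parameters} &= \underbrace{k}_{c} + \underbrace{k^2}_{A_1} + \underbrace{k(k+1)/2}_{\Sigma}
  = 120 + 14\,400 + 7\,260 = 21\,780\\
\#\text{observations} &= kT = 120 \times 120 = 14\,400\\
\#\text{parameters (diagonal } A_1 \text{ and } \Sigma) &= 3k = 360
\end{aligned}
```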


12.
-----------------------------------------------------------------------

From: David Carr


You are right that pooling the 120 people into one time series is
not the correct solution (average behaviour in such a context is
close to meaningless; a former colleague of mine, Dr. Wolfgang
Keeser, showed this several years ago with time series of
cigarettes smoked by people being treated to give up smoking).

I think one of the better analysis strategies would be to fit, for
each person, an intervention (interrupted) time series model of the
kind made popular by Box and Tiao. With the resulting parameters one
could classify patients into those with no change, only a transient
change, and a permanent change (either positive or negative). With
this classification one could then look into the possible influence
of other covariates.
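The classification step can be illustrated with a crude stand-in. This is NOT a Box-Tiao intervention model (there is no ARIMA noise model and no transfer function); it only compares segment means, ignores serial correlation, and its parameters `early`, `late` and `z` are hypothetical choices. It assumes one series per person and a known intervention occasion. A proper analysis would fit the intervention model per person and classify on its estimated step and decay parameters.

```python
import statistics


def classify_change(y, intervention, early=14, late=14, z=2.0):
    """Label one person's series as permanent increase/decrease, transient
    change, or no change, by comparing the mean of the first `early` and the
    last `late` post-intervention occasions with the pre-intervention mean,
    in units of the pre-intervention standard deviation."""
    pre, post = y[:intervention], y[intervention:]
    mu, sd = statistics.mean(pre), statistics.stdev(pre)
    early_shift = (statistics.mean(post[:early]) - mu) / sd
    late_shift = (statistics.mean(post[-late:]) - mu) / sd
    if abs(late_shift) > z:
        return "permanent increase" if late_shift > 0 else "permanent decrease"
    if abs(early_shift) > z:
        return "transient change"
    return "no change"
```

The resulting labels could then be cross-tabulated against treatment group, sex or age, which is the covariate step described above.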

As far as the time series modelling goes, I would be glad to help 
you out, if you happen to have any funding.

If on the other hand, you need to do it yourself, I could send you a 
bibliography (or part thereof) of mine on time series topics (some 
1500 references in all).

Look forward to your reply.



13.
-----------------------------------------------------------------------

From: [EMAIL PROTECTED]


Dear Hans (I think it is appropriate to reply in German),
a few months ago I had to deal with a similar problem, when we
tried to quantify the effects
of various stimuli on a physiological response variable (RR
intervals, pulse amplitudes, etc.).
We knew next to nothing a priori: neither how strong the effects of the
individual stimuli would be, nor when after the stimulus the effect would be measurable.
Therefore each patient was measured at very short intervals, so we were
dealing with a large number of fairly long time series.
After a lengthy literature search I came across an interesting paper in
Biometrics by a Hungarian colleague
who has developed a nonparametric method for analysing repeated-measures (RM)
problems:
J. Reiczigel: Analysis of Experimental Data with Repeated Measurements.
Biometrics 1999; 55(4): 1059-1063.

In short: the method searches the individual profiles for segments of k
consecutive points that differ clearly from the rest (so-called top
periods). A permutation test is used to check whether the occurrence
of these periods is systematic (an effect) or random.
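A simplified illustration of the idea as just described; it is NOT Reiczigel's published procedure or his S-PLUS code. Assumptions made here: a subject's "top period" is the window of k consecutive points with the highest mean, the test statistic is the number of subjects sharing the most common top-period start, and the null distribution comes from shuffling the time order within each subject. Shuffling treats time points as exchangeable under the null, which ignores autocorrelation.

```python
import random
from collections import Counter


def top_period_start(y, k):
    """Start index of the window of k consecutive points with the highest sum."""
    sums = [sum(y[i:i + k]) for i in range(len(y) - k + 1)]
    return max(range(len(sums)), key=sums.__getitem__)


def concentration(starts):
    """Statistic: how many subjects share the most common top-period start."""
    return Counter(starts).most_common(1)[0][1]


def top_period_test(profiles, k, n_perm=999, seed=1):
    """Permutation test: are top periods more concentrated in time than
    expected if each subject's time order were arbitrary?"""
    rng = random.Random(seed)
    observed = concentration([top_period_start(y, k) for y in profiles])
    exceed = 0
    for _ in range(n_perm):
        starts = []
        for y in profiles:
            z = list(y)
            rng.shuffle(z)  # destroy time order within each subject
            starts.append(top_period_start(z, k))
        if concentration(starts) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

A small p-value suggests that subjects' top periods coincide in time, e.g. shortly after an intervention, more often than chance would produce.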

I contacted the author, who kindly made his own S-PLUS programs
available to me.
We then analysed the data set together (!). It is worth contacting
the author, since he is very interested in RM problems, and above all
in how his method behaves on real data.

His email address is: [EMAIL PROTECTED]

If you have any questions, please mail me back.

I hope this helps.

Best wishes from Graz
Bernd

 ------------------------------ STOP ---------------------- 


=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
                  http://jse.stat.ncsu.edu/
=================================================================
