Yes, the (start, stop] formalism is the easiest way to deal with time dependent data.

Each individual only needs enough rows to describe them. For example, if id number 4 is in house 1, their housemate #1 was eaten at time 2, and they themselves were eaten at time 10, the following is sufficient data for that subject:

 id  house  time1  time2  status  discovered
  4      1      0      2       0       false
  4      1      2     10       1        true

We don't need observations for each intermediate time, only that from 0-2 they were not yet discovered and that from 2-10 they were. The status variable tells whether an interval ended in disaster. Use Surv(time1, time2, status) on the left side of the model formula.
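As a rough sketch of how a one-row-per-subject sheet could be expanded into this form: the column names (id, house, time, status), the helper split_subject, and the toy values below are all assumptions made for illustration, not the poster's actual data. The discovery time of a house is taken to be the earliest event time in that house, and a subject still at risk after that time gets a second row.

```r
# Hypothetical one-row-per-subject data: 'time' is the check at which the
# subject was found eaten (status = 1) or last seen alive (status = 0).
one_row <- data.frame(
  id     = 1:6,
  house  = c(1, 1, 1, 1, 2, 2),
  time   = c(2, 5, 7, 10, 7, 7),
  status = c(1, 1, 0, 1, 0, 0)
)

# Time of the first event in each house; Inf for houses with no events.
first_event <- tapply(
  ifelse(one_row$status == 1, one_row$time, Inf),
  one_row$house, min
)
one_row$disc_time <- as.numeric(first_event[as.character(one_row$house)])

# Expand one subject (a one-row data frame) into one or two (time1, time2] rows.
split_subject <- function(r) {
  if (r$disc_time < r$time) {
    # At risk both before and after the house was discovered.
    data.frame(id = r$id, house = r$house,
               time1 = c(0, r$disc_time), time2 = c(r$disc_time, r$time),
               status = c(0, r$status), discovered = c(FALSE, TRUE))
  } else {
    # First in the house to be eaten, or the house was never discovered.
    data.frame(id = r$id, house = r$house,
               time1 = 0, time2 = r$time,
               status = r$status, discovered = FALSE)
  }
}

long <- do.call(rbind, lapply(seq_len(nrow(one_row)),
                              function(i) split_subject(one_row[i, ])))
rownames(long) <- NULL
print(long[long$id == 4, ])
```

The resulting data frame has the columns that Surv(time1, time2, status) expects. Subjects eaten at the same check as the first event in their house are all treated as undiscovered here, which is one possible convention rather than the only one.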

Since the time scale is discrete you should technically use method='exact' in a Cox model, but the default Efron approximation will be very close.
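A minimal sketch of such a fit on simulated data, using the default Efron handling of ties. Everything about the simulation (30 houses of 6, per-check event probabilities of 0.05 before discovery and 0.20 after) is invented for illustration. For simplicity it writes one row per check interval, which is more rows than strictly needed but describes the same risk sets.

```r
library(survival)

set.seed(1)
# Simulated stand-in for the trial: 30 houses of 6, checked at times 1..7.
# The event probabilities below are made up for this example.
n_house <- 30; per_house <- 6; checks <- 1:7
rows <- list()
for (h in seq_len(n_house)) {
  alive <- rep(TRUE, per_house)
  discovered <- FALSE
  for (tt in checks) {
    p <- if (discovered) 0.20 else 0.05
    died <- alive & (runif(per_house) < p)
    ids <- which(alive)
    # One (time1, time2] row for every subject at risk during this interval.
    rows[[length(rows) + 1]] <- data.frame(
      id = (h - 1) * per_house + ids, house = h,
      time1 = tt - 1, time2 = tt,
      status = as.integer(died[ids]), discovered = discovered
    )
    alive <- alive & !died
    if (any(died)) discovered <- TRUE   # house counts as discovered from now on
    if (!any(alive)) break
  }
}
sim <- do.call(rbind, rows)

# Efron is the default tie handling in coxph, so nothing extra is passed.
# The exact method would be requested through the tie-handling argument
# (written as 'method' above; named 'ties' in recent survival releases,
# as far as I know).
fit <- coxph(Surv(time1, time2, status) ~ discovered, data = sim)
print(summary(fit)$coefficients)
```

With many tied events and (start, stop] data the exact calculation can become slow, which is one practical reason to start with the Efron fit.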

Interval censoring isn't necessary. You will have a model of "time to discovery" rather than "time to eaten", but with a fixed examination schedule such as yours, there is no information in the data to help you move from one to the other. The standard interval approach would just assume deaths happened at the midpoint between examinations.

Terry T.

On 04/21/2012 05:00 AM, r-help-requ...@r-project.org wrote:
Dear R users,

I fear this is terribly trivial but I'm struggling to get my head around it.

First of all, I'm using the "survival" package in R 2.12.2 on Windows Vista with the 
RExcel plugin. You probably only need to know that I'm using "survival" for this.

I have data collected from 180 or so individuals that were checked 7 times 
throughout a trial with set start and end times. Once the event happens (death 
by predator) there are no more checks for that individual. This means that I 
check on each individual up to 7 times with either an event recorded or the 
final time being censored.

At the moment, I have a data sheet with one observation per individual; that is 
either the event time (the observation time when the individual had had an 
event) or the censored time. However, I'd like to add a time dependent factor 
and I also wonder if this data should be treated as interval censored.

The time dependent factor is like this. The individuals are grouped in "houses" and once one individual in a group has 
an event, it makes biological sense that the rest of them should be at greater risk, as the predator is likely to have discovered 
the others in the "house" as well (the predator is able to consume many individuals). At the moment I'm coding this as 
a normal two level factor (discovered) where all individuals alive after the first event in that house are "TRUE" and 
the first individuals in a house to be eaten are "FALSE". All individuals in houses that were not discovered at all are
also "FALSE". Obviously, all individuals that were eaten were first discovered, then eaten. However, the first
individuals in a house to be eaten, had not been previously discovered by the predator (not observably so, anyway).

Should I write up this data set with a start and stop time for every check I 
made so each individual has up to 7 records, one for each time I checked?

Is there a quick and easy way to do this in R or would I have to go through the 
data set manually?

Does coding the "discovered" factor the way I have, make statistical sense?

Should I worry about proportional hazards of the "discovered" factor? It seems 
to me that it would often turn out not proportional because of its nature.

Sorry, lots of stats questions. I don't mind if you don't answer all of these. 
Just knowing how to best feed this data into R would help me no end. The rest I 
can probably glean from the millions of survival analysis books I have lying 
about.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
