RE: pattern of cohort characterization needs? RE: Empirical Data Dictionary

Dan Connolly Mon, 06 Oct 2014 12:17:42 -0700

Nice.

For bonus points, update the obesity data elements ticket 
(#33<https://informatics.gpcnetwork.org/trac/Project/ticket/33>) to note this 
pattern in general and the obesity aspects in particular. For sensitive 
material, put it in the KUMC REDCap project I recently invited you to:


  *   GPC Cohort Characterization Work<https://redcap.kumc.edu/redcap_v5.7.7/ProjectSetup/index.php?pid=3560>

and point to it from #33

--
Dan

________________________________
From: gpc-dev-boun...@listserv.kumc.edu [gpc-dev-boun...@listserv.kumc.edu] on 
behalf of Alex Bokov [bo...@uthscsa.edu]
Sent: Monday, October 06, 2014 1:45 PM
To: gpc-dev@listserv.kumc.edu
Subject: Re: pattern of cohort characterization needs? RE: Empirical Data 
Dictionary

I don't claim this covers all cases, only the ones I can think of so far. If 
anyone can think of a cohort characterization question that cannot be answered 
by the below procedure, I am interested in learning about it.

On 10/03/2014 04:56 PM, Dan Connolly wrote:

"most cohort characterization needs seem to
follow the same basic pattern"

What pattern is that?

Preliminary Cohort Characterization
1. Elicit from the domain experts minimal criteria for membership of a patient 
in the cohort of interest (i.e. cast a wide net)
2. Elicit from the domain experts the visit-level facts that are of interest 
about those patients
3. Pull down all available demographic data for that patient set
4. Left join the above to a column containing the total visit count for each 
patient broken up by year
5. For each fact from #2 join an additional column with the visit count for 
each patient. You now have one row for each year each patient is in the system, 
with a separate column for each static value for that patient, a column for the 
total number of visits they had that year, and an additional column for each 
subset of those visits your domain experts flagged as possibly interesting.
6a. For deliverables asking for the number of distinct patients meeting a 
certain criterion, COUNT all the visit counts grouping by every demographic 
variable of interest and year.
6b. For deliverables asking for the number of distinct visits meeting a certain 
criterion, SUM all the visit counts grouping by every demographic variable of 
interest and year.
6c. If you want totals over all years in the system, for visits just SUM up the 
years. For patients, SELECT DISTINCT patients, demographic variables, and 
indicator variables for whether the number of visits in each category is 0 or 1 
(omit years this time). Or, do #5 but omit year in the first place.
7. Filter OBSERVATION_FACT by membership of PATIENT_NUM in the patient-set from 
#1 and then do a count of visits and/or of patients for each CONCEPT_CD 
(filtered in a domain-appropriate manner on MODIFIER_CD).
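
A minimal sketch of steps 1 and 3-6c, run against an in-memory SQLite database 
so it can be tried without a real data warehouse. The table layout loosely 
follows i2b2 naming, but the toy rows, the 'BMI' concept code, the SEX_CD 
demographic, and the names cohort, patient_year, and n_bmi_visits are all 
assumptions made for illustration. One interpretation is also assumed: for 6a, 
zero visit counts are turned into NULL so that COUNT skips them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, sex_cd TEXT);
CREATE TABLE visit_dimension (encounter_num INTEGER PRIMARY KEY,
                              patient_num INTEGER, start_date TEXT);
CREATE TABLE observation_fact (encounter_num INTEGER, patient_num INTEGER,
                               concept_cd TEXT, modifier_cd TEXT);

-- Toy data, invented for illustration only.
INSERT INTO patient_dimension VALUES (1,'F'),(2,'M'),(3,'F'),(4,'F');
INSERT INTO visit_dimension VALUES
  (10,1,'2012-01-05'),(11,1,'2012-06-01'),(12,1,'2013-02-01'),
  (20,2,'2012-03-03'),(30,3,'2013-07-07'),(40,4,'2012-09-09');
INSERT INTO observation_fact VALUES
  (10,1,'BMI','@'),(12,1,'BMI','@'),(20,2,'BMI','@'),
  (30,3,'OTHER','@'),(40,4,'BMI','@');

-- Step 1: wide-net cohort (hypothetical criterion: any BMI fact on file).
CREATE TEMP TABLE cohort AS
  SELECT DISTINCT patient_num FROM observation_fact WHERE concept_cd = 'BMI';

-- Steps 3-5: one row per patient per year, with demographics, the total
-- visit count, and one extra visit count per fact of interest.
CREATE TEMP TABLE patient_year AS
  SELECT p.patient_num, p.sex_cd,
         strftime('%Y', v.start_date) AS yr,
         COUNT(DISTINCT v.encounter_num) AS n_visits,
         COUNT(DISTINCT CASE WHEN f.concept_cd = 'BMI'
                             THEN v.encounter_num END) AS n_bmi_visits
  FROM cohort c
  JOIN patient_dimension p ON p.patient_num = c.patient_num
  JOIN visit_dimension v ON v.patient_num = p.patient_num
  LEFT JOIN observation_fact f ON f.encounter_num = v.encounter_num
  GROUP BY p.patient_num, p.sex_cd, yr;
""")

# Steps 6a/6b: patients (COUNT of non-zero counts) and visits (SUM of counts)
# per demographic group and year.
by_year = con.execute("""
    SELECT sex_cd, yr,
           COUNT(NULLIF(n_bmi_visits, 0)) AS n_patients,
           SUM(n_bmi_visits) AS n_bmi_visits,
           SUM(n_visits) AS n_visits
    FROM patient_year
    GROUP BY sex_cd, yr
    ORDER BY sex_cd, yr
""").fetchall()

# Step 6c: totals over all years; patients must be counted DISTINCT because
# one patient contributes a row for every year they were seen.
overall = con.execute("""
    SELECT sex_cd, COUNT(DISTINCT patient_num) AS n_patients,
           SUM(n_visits) AS n_visits
    FROM patient_year
    GROUP BY sex_cd
    ORDER BY sex_cd
""").fetchall()

for row in by_year:
    print(row)
for row in overall:
    print(row)
```

On a real schema, the only parts expected to change are the WHERE clause that 
defines cohort and the CASE WHEN condition inside each fact column.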

6a and 6b tell you whether it's feasible to require that certain observations 
be present for each visit or each patient (i.e. that if you did that, your 
inclusion criteria would not be so strict that you'd end up with an 
insufficient sample size). They also give you an idea of your cohort's 
demographic makeup and how/if it has changed over time.

7 tells you what the most common facts are for this preliminary cohort, even if 
they were not singled out by the domain experts. In consultation with them, 
additional selection criteria might be drawn up.
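
Step 7 could look roughly like the following, again on an in-memory SQLite 
database with invented rows. The concept codes, the cohort criterion, and the 
choice to keep only facts whose MODIFIER_CD is '@' (assumed here to mean "no 
modifier") are placeholders; the domain-appropriate filter will differ by 
study.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE observation_fact (encounter_num INTEGER, patient_num INTEGER,
                               concept_cd TEXT, modifier_cd TEXT);

-- Toy data, invented for illustration only.
INSERT INTO observation_fact VALUES
  (10,1,'BMI','@'),(11,1,'BMI','@'),(10,1,'DRUG_X','@'),
  (20,2,'BMI','@'),(20,2,'DRUG_X','@'),(20,2,'DRUG_X','DOSE'),
  (30,3,'DRUG_Y','@');

-- Patient-set from step 1 (hypothetical criterion: any BMI fact on file).
CREATE TEMP TABLE cohort AS
  SELECT DISTINCT patient_num FROM observation_fact WHERE concept_cd = 'BMI';
""")

# Step 7: for every concept seen in the cohort, count distinct patients and
# distinct visits, most common first.
concept_counts = con.execute("""
    SELECT f.concept_cd,
           COUNT(DISTINCT f.patient_num) AS n_patients,
           COUNT(DISTINCT f.encounter_num) AS n_visits
    FROM observation_fact f
    WHERE f.patient_num IN (SELECT patient_num FROM cohort)
      AND f.modifier_cd = '@'
    GROUP BY f.concept_cd
    ORDER BY n_patients DESC, n_visits DESC, f.concept_cd
""").fetchall()

for row in concept_counts:
    print(row)
```

In this toy data DRUG_X surfaces alongside BMI even though nobody asked for it, 
which is the kind of unprompted finding step 7 is meant to produce.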

Refinements of Cohort Characterization
1. Optionally tighten the membership criteria (e.g. in our initial 
characterization it looks like most patients who have one XYZ measurement on 
file have half a dozen of them, so might as well make that the floor) and 
optionally limit the time range (e.g. initial characterization indicates we 
have large samples available between 2010 and 2013, so let's use only those 
years to begin with).
2. Optionally revise the visit-level features of interest (e.g. A procedure 
hardly ever gets ordered? Omit it this time. A drug you weren't aware of turns 
out to be prescribed to 30% of the patients? Dedicate a new column to it.). 
This may be the place to put in complex temporal queries so you aren't grinding 
the server on a huge dataset needlessly.
3. Are some of the original demographic variables too sparse for this cohort, 
or not used at all? Optionally omit them.
4-7. As above.

Repeat as necessary (I expect one iteration to be enough in many cases) until 
the clinicians and informaticians converge on a patient-set and visit-set of 
adequate size and relevant to the clinical problem of interest.

You'll notice that there is variability from study to study in two places:
A. The 'WHERE' clause for selecting the patient-set.
B. The 'WHERE' clause in each variable column.
Everything else is looking like it could be factored out into a generic query 
or procedure.
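
As a rough sketch of that factoring, the function below takes the two 
study-specific pieces (A and B) as SQL fragments and fills them into a fixed 
template. The function name, the table layout, and the toy data are 
assumptions for illustration. The fragments are pasted into the SQL text 
unescaped, so this is only suitable for fragments written by a trusted 
analyst, never for end-user input.

```python
import sqlite3

def build_patient_year_sql(cohort_where, fact_columns):
    """Return SQL giving one row per patient per year.

    cohort_where -- (A) WHERE clause over observation_fact selecting patients
    fact_columns -- (B) dict of column name -> WHERE clause over alias f
    """
    cols = "".join(
        f",\n       COUNT(DISTINCT CASE WHEN {where} "
        f"THEN v.encounter_num END) AS {name}"
        for name, where in fact_columns.items()
    )
    return f"""
SELECT p.patient_num, p.sex_cd,
       strftime('%Y', v.start_date) AS yr,
       COUNT(DISTINCT v.encounter_num) AS n_visits{cols}
FROM patient_dimension p
JOIN visit_dimension v ON v.patient_num = p.patient_num
LEFT JOIN observation_fact f ON f.encounter_num = v.encounter_num
WHERE p.patient_num IN
      (SELECT patient_num FROM observation_fact WHERE {cohort_where})
GROUP BY p.patient_num, p.sex_cd, yr
ORDER BY p.patient_num, yr
"""

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, sex_cd TEXT);
CREATE TABLE visit_dimension (encounter_num INTEGER PRIMARY KEY,
                              patient_num INTEGER, start_date TEXT);
CREATE TABLE observation_fact (encounter_num INTEGER, patient_num INTEGER,
                               concept_cd TEXT, modifier_cd TEXT);

-- Toy data, invented for illustration only.
INSERT INTO patient_dimension VALUES (1,'F'),(2,'M'),(3,'F'),(4,'F');
INSERT INTO visit_dimension VALUES
  (10,1,'2012-01-05'),(11,1,'2012-06-01'),(12,1,'2013-02-01'),
  (20,2,'2012-03-03'),(30,3,'2013-07-07'),(40,4,'2012-09-09');
INSERT INTO observation_fact VALUES
  (10,1,'BMI','@'),(11,1,'DRUG_X','@'),(12,1,'BMI','@'),
  (20,2,'BMI','@'),(30,3,'OTHER','@'),(40,4,'BMI','@');
""")

sql = build_patient_year_sql(
    cohort_where="concept_cd = 'BMI'",                # (A), hypothetical
    fact_columns={                                    # (B), hypothetical
        "n_bmi_visits": "f.concept_cd = 'BMI'",
        "n_drug_x_visits": "f.concept_cd = 'DRUG_X'",
    },
)
rows = con.execute(sql).fetchall()
for row in rows:
    print(row)
```

A refinement pass then amounts to calling the same function with a tighter 
cohort_where or a revised fact_columns dictionary.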

_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
