Nice. For bonus points, update the obesity data elements ticket (#33<https://informatics.gpcnetwork.org/trac/Project/ticket/33>) to note this pattern in general and the obesity aspects in particular. For sensitive material, put it in the KUMC REDCap project I recently invited you to:
* GPC Cohort Characterization Work <https://redcap.kumc.edu/redcap_v5.7.7/ProjectSetup/index.php?pid=3560>

and point to it from #33.

-- Dan

________________________________
From: gpc-dev-boun...@listserv.kumc.edu [gpc-dev-boun...@listserv.kumc.edu] on behalf of Alex Bokov [bo...@uthscsa.edu]
Sent: Monday, October 06, 2014 1:45 PM
To: gpc-dev@listserv.kumc.edu
Subject: Re: pattern of cohort characterization needs? RE: Empirical Data Dictionary

I don't claim this covers all cases, only the ones I can think of so far. If anyone can think of a cohort characterization question that cannot be answered by the procedure below, I am interested in learning about it.

On 10/03/2014 04:56 PM, Dan Connolly wrote:
> "most cohort characterization needs seem to follow the same basic pattern"
>
> What pattern is that?

Preliminary Cohort Characterization

1. Elicit from the domain experts minimal criteria for membership of a patient in the cohort of interest (i.e., cast a wide net).
2. Elicit from the domain experts the visit-level facts of interest about those patients.
3. Pull down all available demographic data for that patient-set.
4. Left join the above to a column containing each patient's total visit count, broken out by year.
5. For each fact from #2, join an additional column with each patient's count of visits having that fact.

You now have one row for each year each patient is in the system, with a separate column for each static (demographic) value for that patient, a column for the total number of visits they had that year, and an additional column for each subset of those visits that your domain experts flagged as possibly interesting.

6a. For deliverables asking for the number of distinct patients meeting a certain criterion, COUNT the visit counts (i.e., the patient-year rows where the relevant count is nonzero), grouping by year and every demographic variable of interest.
6b. For deliverables asking for the number of distinct visits meeting a certain criterion, SUM the visit counts, grouping by year and every demographic variable of interest.
6c. If you want totals over all years in the system, for visits just SUM over the years. For patients, SELECT DISTINCT patients, demographic variables, and 0/1 indicator variables for whether the patient has any visits in each category (omit years this time). Or, do #5 but omit year in the first place.
7. Filter OBSERVATION_FACT by membership of PATIENT_NUM in the patient-set from #1, and then count visits and/or patients for each CONCEPT_CD (filtered on MODIFIER_CD in a domain-appropriate manner).

6a and 6b tell you whether it's feasible to require that certain observations be present for each visit or each patient (i.e., that if you did, your inclusion criteria would not be so strict that you'd end up with an insufficient sample size). They also give you an idea of your cohort's demographic makeup and how/if it has changed over time.

7 tells you what the most common facts are for this preliminary cohort, even if they were not singled out by the domain experts. In consultation with the experts, additional selection criteria might be drawn up.

Refinements of Cohort Characterization

1. Optionally tighten the membership criteria (e.g., if the initial characterization shows that most patients who have one XYZ measurement on file have half a dozen of them, we might as well make that the floor) and optionally limit the time range (e.g., if the initial characterization shows large samples available between 2010 and 2013, use only those years to begin with).
2. Optionally revise the visit-level features of interest (e.g., a procedure hardly ever gets ordered? Omit it this time. A drug you weren't aware of turns out to be prescribed to 30% of the patients? Dedicate a new column to it). This may be the place to put in complex temporal queries, so you aren't grinding the server on a huge dataset needlessly.
3. Are some of the original demographic variables too sparse for this cohort, or not used at all? Optionally omit them.
4-7. As above.
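Steps 3 through 7, which both the preliminary and refinement passes use, can be sketched as SQL. This is a minimal sketch, not the author's query: it uses Python's built-in sqlite3 as a stand-in for a site's i2b2 database, heavily simplified i2b2-style tables, made-up concept codes (DEMO:...), an assumed step-1 criterion, and the assumption that MODIFIER_CD '@' means "no modifier". The constant and function names are hypothetical.

```python
import sqlite3

# Hypothetical, heavily simplified i2b2-style tables (real i2b2 tables have
# many more columns). Concept codes below are made-up placeholders.
SCHEMA = """
CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, sex_cd TEXT);
CREATE TABLE visit_dimension (encounter_num INTEGER PRIMARY KEY,
                              patient_num INTEGER, start_date TEXT);
CREATE TABLE observation_fact (encounter_num INTEGER, patient_num INTEGER,
                               concept_cd TEXT, modifier_cd TEXT);
"""

# Step 1 (assumed criterion): any BMI result or obesity diagnosis on file.
COHORT = """cohort AS (
  SELECT DISTINCT patient_num FROM observation_fact
  WHERE concept_cd IN ('DEMO:BMI', 'DEMO:OBESITY_DX'))"""

# Steps 3-5: one row per patient per year with demographics, total visits,
# and one visit-count column per fact of interest from step 2.
PATIENT_YEAR = """patient_year AS (
  SELECT p.patient_num, p.sex_cd, substr(v.start_date, 1, 4) AS yr,
         COUNT(DISTINCT v.encounter_num) AS total_visits,
         COUNT(DISTINCT CASE WHEN f.concept_cd = 'DEMO:BMI'
                             THEN v.encounter_num END) AS bmi_visits,
         COUNT(DISTINCT CASE WHEN f.concept_cd = 'DEMO:OBESITY_DX'
                             THEN v.encounter_num END) AS obesity_dx_visits
  FROM cohort c
  JOIN patient_dimension p ON p.patient_num = c.patient_num
  JOIN visit_dimension v ON v.patient_num = c.patient_num
  LEFT JOIN observation_fact f ON f.encounter_num = v.encounter_num
  GROUP BY p.patient_num, p.sex_cd, substr(v.start_date, 1, 4))"""

# Steps 6a/6b: NULLIF turns a zero count into NULL, so COUNT only counts
# patients with at least one such visit, while SUM totals the visits.
SUMMARY = f"""WITH {COHORT}, {PATIENT_YEAR}
SELECT sex_cd, yr,
       COUNT(NULLIF(bmi_visits, 0)) AS bmi_patients,
       SUM(bmi_visits) AS bmi_visit_total
FROM patient_year
GROUP BY sex_cd, yr
ORDER BY sex_cd, yr"""

# Step 7: most common facts for the cohort. '@' is assumed to mean
# "no modifier", as in stock i2b2; adjust the filter per domain.
CONCEPT_FREQ = f"""WITH {COHORT}
SELECT concept_cd, COUNT(DISTINCT encounter_num) AS n_visits,
       COUNT(DISTINCT patient_num) AS n_patients
FROM observation_fact
WHERE patient_num IN (SELECT patient_num FROM cohort) AND modifier_cd = '@'
GROUP BY concept_cd
ORDER BY n_patients DESC, concept_cd"""


def build_demo_db():
    """Tiny in-memory dataset: patients 1 and 2 qualify, patient 3 does not."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO patient_dimension VALUES (?, ?)",
                    [(1, "F"), (2, "M"), (3, "F")])
    con.executemany("INSERT INTO visit_dimension VALUES (?, ?, ?)",
                    [(10, 1, "2012-03-01"), (11, 1, "2012-07-15"),
                     (12, 1, "2013-01-20"), (20, 2, "2012-05-05"),
                     (30, 3, "2013-09-09")])
    con.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?)",
                    [(10, 1, "DEMO:BMI", "@"), (11, 1, "DEMO:BMI", "@"),
                     (12, 1, "DEMO:OBESITY_DX", "@"), (20, 2, "DEMO:BMI", "@"),
                     (30, 3, "DEMO:DIABETES_DX", "@")])
    return con


if __name__ == "__main__":
    con = build_demo_db()
    for row in con.execute(SUMMARY):
        print(row)
    for row in con.execute(CONCEPT_FREQ):
        print(row)
```

On the toy data, the summary yields one row per sex and year (e.g., ('F', '2012', 1, 2): one patient, two BMI visits), and the step-7 query ranks DEMO:BMI above DEMO:OBESITY_DX. COUNT(DISTINCT ...) is used so that a visit with several facts is not counted twice after the left join.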
Repeat as necessary (I expect one iteration to be enough in many cases) until the clinicians and informaticians converge on a patient-set and visit-set that are of adequate size and relevant to the clinical problem of interest.

You'll notice that there is variability from study to study in only two places:

A. The WHERE clause for selecting the patient-set.
B. The WHERE clause in each variable column.

Everything else looks like it could be factored out into a generic query or procedure.
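To illustrate that observation, here is a minimal sketch of such a generic query builder in which only (A) the patient-set WHERE clause and (B) the per-column conditions vary. It assumes simplified, hypothetical i2b2-style tables, made-up DEMO:... concept codes, and Python's built-in sqlite3 as a stand-in database; patient_year_sql, WIDE, TIGHTER, and FACTS are hypothetical names. Rerunning it with TIGHTER mimics refinement step 1 by raising the floor on the number of measurements.

```python
import sqlite3

# Hypothetical, simplified i2b2-style tables; placeholder concept codes.
SCHEMA = """
CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, sex_cd TEXT);
CREATE TABLE visit_dimension (encounter_num INTEGER PRIMARY KEY,
                              patient_num INTEGER, start_date TEXT);
CREATE TABLE observation_fact (encounter_num INTEGER, patient_num INTEGER,
                               concept_cd TEXT, modifier_cd TEXT);
"""


def patient_year_sql(cohort_where, fact_columns):
    """Build the per-patient, per-year visit-count query.

    cohort_where -- (A) condition over observation_fact (alias f) that
                    selects the patient-set.
    fact_columns -- (B) maps output column name -> condition over f that
                    flags a visit as having that fact.
    Fragments are spliced in verbatim, so they must be trusted SQL written
    by the study team, never user-supplied input.
    """
    columns = ["COUNT(DISTINCT v.encounter_num) AS total_visits"] + [
        f"COUNT(DISTINCT CASE WHEN {cond} THEN v.encounter_num END) AS {name}"
        for name, cond in fact_columns.items()]
    select_list = ",\n       ".join(columns)
    return f"""
WITH cohort AS (
  SELECT DISTINCT f.patient_num FROM observation_fact f WHERE {cohort_where})
SELECT p.patient_num, p.sex_cd, substr(v.start_date, 1, 4) AS yr,
       {select_list}
FROM cohort c
JOIN patient_dimension p ON p.patient_num = c.patient_num
JOIN visit_dimension v ON v.patient_num = c.patient_num
LEFT JOIN observation_fact f ON f.encounter_num = v.encounter_num
GROUP BY p.patient_num, p.sex_cd, substr(v.start_date, 1, 4)
ORDER BY p.patient_num, substr(v.start_date, 1, 4)"""


# (A) A wide-net criterion and a tightened one (at least two BMI results).
WIDE = "f.concept_cd = 'DEMO:BMI'"
TIGHTER = ("f.patient_num IN (SELECT patient_num FROM observation_fact "
           "WHERE concept_cd = 'DEMO:BMI' GROUP BY patient_num "
           "HAVING COUNT(*) >= 2)")
# (B) One visit-count column per fact of interest.
FACTS = {"bmi_visits": "f.concept_cd = 'DEMO:BMI'",
         "obesity_dx_visits": "f.concept_cd = 'DEMO:OBESITY_DX'"}


def build_demo_db():
    """Tiny in-memory dataset for trying out the builder."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO patient_dimension VALUES (?, ?)",
                    [(1, "F"), (2, "M"), (3, "F")])
    con.executemany("INSERT INTO visit_dimension VALUES (?, ?, ?)",
                    [(10, 1, "2012-03-01"), (11, 1, "2012-07-15"),
                     (12, 1, "2013-01-20"), (20, 2, "2012-05-05"),
                     (30, 3, "2013-09-09")])
    con.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?)",
                    [(10, 1, "DEMO:BMI", "@"), (11, 1, "DEMO:BMI", "@"),
                     (12, 1, "DEMO:OBESITY_DX", "@"), (20, 2, "DEMO:BMI", "@"),
                     (30, 3, "DEMO:DIABETES_DX", "@")])
    return con


if __name__ == "__main__":
    con = build_demo_db()
    for criterion in (WIDE, TIGHTER):
        print(con.execute(patient_year_sql(criterion, FACTS)).fetchall())
```

Because the fragments are raw SQL, this design trades safety for flexibility; a production version would more likely keep each study's (A) and (B) clauses under review in version control than accept them at run time.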