I had to make this a separate discussion about data synthesis, to keep it distinct from the Oracle XML DB problems.

If you have access to primary data, e.g. the usual Pat_ID, Timestamp, Code, Value, you can create very simple generative models by training a neural network to generate sequences with similar statistics. In this case, the time parameter, code covariance and code value dynamics (for those codes that have a value attached to them) are all lumped together.

You can set inclusion/exclusion criteria that specify a sub-population, then train the neural network on that specific population and have it generate data for it. As a practical example, in a pilot study of a few thousand elderly patients, we found that, along with dementia, significant cardiovascular problems inevitably turned up and also a lot of... flu.

It's a crude way of training a network to produce similar sets of codes and values, but the dream of specifying patient data as a linear mixture of the profiles of different conditions is still a bit far off :)

Of course, when this is done, we are still left with mapping a given clinical encoding scheme to the suitable openEHR archetype (if we are looking for a general solution, that is).

All the best
Athanasios

On 16/04/2015 10:22, Bert Verhees wrote:
> On 16-04-15 11:13, Thomas Beale wrote:
>>
>> Indeed, it would be a great thing. The reason it doesn't exist so far,
>> is that to be useful we need synthesised data sets that have some
>> realistic statistical spread of values. Since we are talking at
>> multiple levels - not just vital signs measurements, but covariance of
>> all kinds of measurements with assessments (diagnosis etc), plans and
>> orders and actions, the complexity is not trivial.
>>
>> A data synthesiser to do this for openEHR would be a fantastic
>> Master's project (hint :).
>
> I use Oxygen, it can generate XML instances to XML Schema's, but first
> we need to change the data-element of version to have type Locatable, or
> Composition.
> If wanted, I can generate them too, it is only one minute work.
>
> Bert
>>
>> - thomas
>>
>> On 16/04/2015 10:02, Dmitry Baranov wrote:
>>> Diego,
>>> that'll be great.
>>> Hope that OpenEHR github owners will provide us with an instance
>>> samples repository some day or other :)
>>>
>>>> I can generate random sample instances from current archetypes for you
>>>> if you need them. Generated data may not make much sense as it only
>>>> tries to follow the archetype constraints, but it should be enough for
>>>> application testing and benchmark
>>
>> _______________________________________________
>> openEHR-technical mailing list
>> openEHR-technical at lists.openehr.org
>> http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
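
P.S. To make "generate sequences with similar statistics" a bit more concrete, here is a minimal sketch. Please note that it is not the neural network from the pilot study: as a deliberately simpler stand-in it uses a first-order Markov chain over codes, resampling the observed time gaps and values per code. Everything named here is an assumption for illustration only: the functions fit and generate, the include predicate (a placeholder for inclusion/exclusion criteria), the toy codes, and timestamps expressed as plain numbers of days.

```python
import random
from collections import defaultdict

START, END = "<start>", "<end>"  # hypothetical sentinel codes

def fit(rows, include=lambda pat_id: True):
    """Collect per-code statistics from (pat_id, timestamp, code, value) rows.

    include is a hypothetical predicate standing in for the
    inclusion/exclusion criteria that select a sub-population.
    """
    by_patient = defaultdict(list)
    for pat_id, ts, code, value in rows:
        if include(pat_id):
            by_patient[pat_id].append((ts, code, value))

    transitions = defaultdict(list)  # previous code -> observed next codes
    gaps = defaultdict(list)         # code -> time gaps since the previous event
    values = defaultdict(list)       # code -> observed values (where present)
    for events in by_patient.values():
        events.sort(key=lambda e: e[0])
        prev_code, prev_ts = START, events[0][0]
        for ts, code, value in events:
            transitions[prev_code].append(code)
            gaps[code].append(ts - prev_ts)
            if value is not None:
                values[code].append(value)
            prev_code, prev_ts = code, ts
        transitions[prev_code].append(END)
    return transitions, gaps, values

def generate(model, rng, max_events=50):
    """Sample one synthetic patient as a list of (time, code, value)."""
    transitions, gaps, values = model
    t, code, out = 0.0, START, []
    while len(out) < max_events:
        code = rng.choice(transitions[code])
        if code == END:
            break
        t += rng.choice(gaps[code])
        value = rng.choice(values[code]) if values[code] else None
        out.append((t, code, value))
    return out

# Toy rows with made-up codes and values, timestamps in days.
rows = [
    (1, 0, "DEMENTIA", None), (1, 30, "BP_SYS", 150.0), (1, 45, "FLU", None),
    (2, 0, "BP_SYS", 142.0), (2, 10, "DEMENTIA", None), (2, 60, "FLU", None),
    (3, 5, "FLU", None), (3, 20, "BP_SYS", 160.0),
]
synthetic = generate(fit(rows), random.Random(0))
```

Unlike the network, this keeps time, code order and values as three separate tables rather than lumping them together, so it only reproduces pairwise code statistics. It is meant to show the shape of the input and output, nothing more.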