We have what appears to be a simple (and very pragmatic) requirement: to
incorporate probability distribution information, rather than simple point
estimates, into a Bayesian network.  At first glance we thought that
Gaussian nodes (e.g. the "continuous chance" tool in Hugin) would allow us
to address this need.  However, we have had no success, and we wonder
whether this is a general limitation of BBN software or whether we need to
rethink our implementation model.

For those who have an interest in this area, we provide a simple example
below to illustrate the problem (together with an attached Hugin net for
the point estimate version of the example).

We have data from a number of experts on a range of disease/sign
interactions, prevalences, etc.  However, to keep the example simple we
will look at a trivial case with just 2 diseases, 5 signs and assume equal
prevalence (i.e. in the absence of any evidence the 2 diseases are equally
likely).

The experts are each asked to estimate how often animals diagnosed as
having a disease presented with various signs.  Using a standard
statistical technique (in this case maximum likelihood) we fit a normal
distribution to approximate the combined expert responses.  For our simple
example these are as follows:
                Anaplasmosis            Trypanosomosis  
signs           P(sign|Anapl)  stdev    P(sign|Tryps)  stdev
Anaemia         0.82           0.03     0.77           0.42
Constipation    0.51           0.42     0.05           0.07
Diarrhoea       0.02           0.21     0.14           0.20
Dyspnoea        0.41           0.22     0.22           0.34
Fever           0.73           0.37     0.94           0.21
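
The fitting step described before the table can be sketched as follows.
This is only an illustration: the response values are hypothetical (not
our experts' data) and the function name fit_normal is made up for the
example.  For a normal distribution the maximum likelihood estimates are
the sample mean and the population (divide-by-n) standard deviation.

```python
import statistics

def fit_normal(responses):
    """Maximum likelihood normal fit: returns (mean, stdev).

    The ML estimate of the standard deviation divides by n, not n - 1,
    which is what statistics.pstdev computes.
    """
    mu = statistics.fmean(responses)
    sigma = statistics.pstdev(responses, mu)
    return mu, sigma

# HYPOTHETICAL expert answers for P(sign | disease), for illustration only.
hypothetical_responses = [0.80, 0.85, 0.80, 0.83]
mu, sigma = fit_normal(hypothetical_responses)
print(f"mean={mu:.2f} stdev={sigma:.3f}")
```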

We can take the point estimates of P and enter these into a 'traditional'
BBN to get predictions of disease diagnosis.  See the attached file
(example_to_uai.hkb) for the Hugin implementation; if you need a demo
version of the software to run this net, visit the Hugin site at
http://www.hugin.com.
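
For readers without Hugin, the point estimate calculation can be sketched
in a few lines.  This assumes the net is a single disease node with the
five signs as children that are conditionally independent given the
disease; that structure is an assumption made for the sketch, not
something read from the attached file.  The names P_SIGN, PRIOR and
posterior are invented for the example.

```python
# Point estimates P(sign | disease) taken from the table above.
P_SIGN = {
    "Anaemia":      {"Anapl": 0.82, "Tryps": 0.77},
    "Constipation": {"Anapl": 0.51, "Tryps": 0.05},
    "Diarrhoea":    {"Anapl": 0.02, "Tryps": 0.14},
    "Dyspnoea":     {"Anapl": 0.41, "Tryps": 0.22},
    "Fever":        {"Anapl": 0.73, "Tryps": 0.94},
}

# Equal prevalence, as stated in the example.
PRIOR = {"Anapl": 0.5, "Tryps": 0.5}

def posterior(evidence, p_sign=P_SIGN, prior=PRIOR):
    """Return P(disease | evidence) under the independence assumption.

    evidence maps a sign name to True (present) or False (absent);
    signs that are not mentioned are treated as unobserved.
    """
    scores = {}
    for disease, p in prior.items():
        for sign, present in evidence.items():
            q = p_sign[sign][disease]
            p *= q if present else 1.0 - q
        scores[disease] = p
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}

print(posterior({"Anaemia": True}))
print(posterior({"Anaemia": False}))
```

Under that assumption, entering Anaemia alone as present or absent
reproduces the Hugin figures quoted in this posting.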

We would now like to incorporate the variability of the experts' opinions
in some way, and we naively assumed that Gaussian nodes, which carry
distribution information, would be the solution.  The practical difference
we assume this would make to diagnosis can be illustrated by looking at the
first row of our sample table (Anaemia).  With Anaemia present as the only
evidence, it is not surprising that the probabilities of Anapl and Tryps
change little from their priors of 0.5 and 0.5: in the Hugin example they
become 0.516 and 0.484, i.e. effectively still equal.  However, what this
hides is the 'thin normal'/'fat normal' issue: judging by how closely our
experts agree (as reflected in the stdev of the normal approximation), we
can be much more confident in the Anapl value ('thin normal', stdev 0.03)
than in the Tryps value ('fat normal', stdev 0.42).  Providing the negative
evidence (i.e. Anaemia is not present) is even more interesting: it leads
to Anapl=0.44 and Tryps=0.56 in the Hugin example.  The implication is that
Tryps is a (marginally) more likely diagnosis than Anapl.  Taking the point
estimate values this is strictly correct, but given the level of expert
divergence about the Anaemia|Tryps combination we feel it is
counter-intuitive.
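
One way to make the 'thin normal'/'fat normal' concern concrete, outside
any BBN tool, is a small Monte Carlo sketch.  This is not what Hugin's
Gaussian nodes do, and it is not a method we have adopted.  It simply
draws P(Anaemia|disease) from each fitted normal, rejects draws outside
(0, 1) (an assumption needed because a normal with stdev 0.42 puts a lot
of mass outside that range), and looks at the spread of the resulting
posterior.  All function names are hypothetical.

```python
import random

# (mean, stdev) of P(Anaemia | disease), first row of the table above.
ANAEMIA = {"Anapl": (0.82, 0.03), "Tryps": (0.77, 0.42)}

def sample_prob(mean, stdev, rng):
    """Draw from Normal(mean, stdev), rejecting values outside (0, 1)."""
    while True:
        x = rng.gauss(mean, stdev)
        if 0.0 < x < 1.0:
            return x

def posterior_anapl_samples(present, n=10000, seed=0):
    """Sample P(Anapl | Anaemia present/absent) with equal priors."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        like = {}
        for disease, (mean, stdev) in ANAEMIA.items():
            q = sample_prob(mean, stdev, rng)
            like[disease] = q if present else 1.0 - q
        # Equal priors cancel, so the posterior is a likelihood ratio.
        samples.append(like["Anapl"] / (like["Anapl"] + like["Tryps"]))
    return samples

def interval_95(samples):
    """Central 95% interval of the sampled posterior values."""
    ordered = sorted(samples)
    n = len(ordered)
    return ordered[int(0.025 * n)], ordered[int(0.975 * n)]

low, high = interval_95(posterior_anapl_samples(present=False))
print(f"P(Anapl | no Anaemia): 95% of samples in [{low:.2f}, {high:.2f}]")
```

A wide interval here would express the intuition above: the point
estimate posterior favours Tryps only marginally, while the expert
disagreement behind the Tryps figure leaves the answer poorly determined.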

I will leave this rather long posting at that and would appreciate any
feedback (however harsh!).  Should anyone wish for more detail, I am more
than happy to continue the discussion 'off-line'.





Crawford Revie
Department of Information Science
University of Strathclyde, Scotland

------- End of Forwarded Message
