METHODS FOR PEER REVIEW UPDATING OF BAYESIAN BELIEF NETWORK SPECIES MODELS
 

Bruce G. Marcot
Research Wildlife Ecologist
USDA Forest Service, Portland, Oregon

April 14, 1999

for: Terrestrial Staff, Science Advisory Group, Interior Columbia Basin Ecosystem Management Project
 
 

The Terrestrial Staff, Science Advisory Group, Interior Columbia Basin Ecosystem Management Project, is devising Bayesian belief network (BBN) models of plant and wildlife species for use in viability assessments. One vital step in use of such models is peer review. Peer review for these models will entail consultation with species experts and ecologists to ensure that the models correctly depict species' ecologies.

This document summarizes specific methods and steps that can be used in such a peer review of BBN species models.
 

POTENTIAL STEPS IN PEER REVIEW OF BBN MODELS

I. Introduction to BBN Models and the Species Model

1. Introduce the concepts and general structure of BBN models in general.

- Explain how BBN models can be used to depict logical and causal influences of key environmental correlates on species. Show the generalized species influence diagram (Netica model General Sp ID 1.dne).

- Explain general concepts of probability structures: marginal (unconditional) probabilities of input (parentless) nodes, and conditional probabilities of child nodes. Refer to a simple example in a handout or a package sent in advance (e.g., the Hugin apple orchard BBN model example); a minimal worked sketch also follows this list.

- Explain that BBN modeling has a rich literature and a growing use and following in ecological modeling (we can supply literature citations if needed).
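For reviewers who want to see the arithmetic behind these probability structures, here is a minimal sketch in Python (the node names and numbers are hypothetical, not taken from the species models). It shows a marginal distribution on a parentless node, a conditional probability table (CPT) on a child node, and how the child's marginal probabilities follow by the law of total probability:

# Minimal two-node BBN example (hypothetical node names and values).
p_canopy = {"closed": 0.6, "open": 0.4}   # marginal (unconditional) probabilities

# Conditional probabilities of the child node, given each parent state:
p_presence_given_canopy = {
    "closed": {"present": 0.8, "absent": 0.2},
    "open":   {"present": 0.3, "absent": 0.7},
}

# Propagation: the child's marginal, by the law of total probability.
p_present = sum(p_canopy[c] * p_presence_given_canopy[c]["present"]
                for c in p_canopy)
print(p_present)   # 0.6*0.8 + 0.4*0.3 = 0.60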
 

2. Introduce, display, and briefly explain the specific species BBN model to review.

- Explain the goals, objectives, and purpose of the species BBN models: to depict and assess potential effects of broad-scale planning alternatives on coarse-grained environmental factors, and thence on plant and wildlife species' potential response. Potential response is intended to be represented as effects on habitat capacity or habitat density, and relative effects on actual populations in general distributional terms.

- Explain the nodes -- their definitions and general linkages. Explain the "shells" or levels of nodes: the key environmental correlates (KECs; the outer shell of parentless nodes), the summary nodes, and the final population outcome node.

- Compile and run the model -- demonstrate how varying the inputs affects the outputs.

- Clarify the scale at which the model is to be run (within-LAL) and later summarized (across-LAL).
 

3. Show the full range of model behavior.

- Show the outermost ranges of input variables and the resultant output.
 
 
 

II. Review of the BBN Species Model Structure in Depth
 

4. Discuss the overall model structure:

- Are all significant KEC nodes included? Clarify that they are to be correlates that can be indexed at the coarse-grain, broad scale; perhaps show an example of a fine-scale model, such as one of the bat models.

- Are the KEC nodes correctly linked?

- Are states for each node correctly denoted?

- Clarify that changing nodes, connections, or states will entail redoing the conditional probability tables (CPTs), so this should be done only if really necessary.
 

5. Discuss the probability structure of the model.

Review the specific CPT structures of the model. This entails focusing on the child nodes, not the outermost, parentless, input KEC nodes.

- Clarify that the CPTs for each node, and the overall model, are intended to be, in some sense, "normalized." That is, the probability structure of each node was "stretched" to fully cover the probability space, and the model as a whole was intended to operate across the full probability space (that is, up to or near to 100% probabilities for different outcomes).
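As a purely hypothetical illustration of this "stretching" (the actual procedure used to construct the species models' CPTs may have differed), the Python sketch below builds a small CPT whose extreme rows are pegged at certain outcomes and whose interior rows are interpolated between them, so the node spans the full probability space:

# Hypothetical sketch: peg the corner rows of a CPT at (near-)certain
# outcomes and interpolate the interior rows between them.
from itertools import product

parent_states = [0.0, 0.5, 1.0]   # e.g., poor / moderate / good, scored 0-1
for combo in product(parent_states, repeat=2):   # two parent nodes
    q = sum(combo) / len(combo)                  # mean habitat quality
    row = [q, 1.0 - q]                           # [P(good outcome), P(poor outcome)]
    print(combo, row)
# The all-good corner yields [1.0, 0.0] and the all-poor corner [0.0, 1.0],
# so the CPT covers the full 0-100% probability space.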
 

Several methods and tools can be of use in reviewing the probability structure of the model:
 

5a. Play with the input nodes' states and observe how the model behaves.

Watch how both the final output node and any interim summary nodes change. Note where specific combinations do not make ecological sense.
 

5b. Open each node and evaluate its probability table.

- Show the extremes of the probability distributions -- "peg the corners." (This is the "normalization" process, by the way, by which the CPT was constructed.)

- Discuss the interior probability values within the "pegged corners." If the CPT is simple enough, and if the peer reviewer(s) can readily think in terms of likelihood functions, this may be a sufficient way to do the review. (Still, cross-check results by using simple sensitivity tests in Netica, as described below.)

NOTE that reading across each row in a CPT gives the probabilities (or frequencies, in a Monte Carlo sense) of outcomes for a specific set of input conditions. Reading down each column in a CPT gives the likelihood distribution of input conditions, given a particular outcome. Reading down columns is also a handy way to cross-check the probability structure: scan for the highest- and lowest-likelihood conditions for a particular outcome, and look for any anomalies to fix.
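A small worked example (with hypothetical values) shows the two readings:

# Row-wise vs. column-wise readings of a CPT (hypothetical values).
import numpy as np

conditions = ["good habitat", "fair habitat", "poor habitat"]
outcomes   = ["high pop.", "low pop."]
cpt = np.array([[0.9, 0.1],    # P(outcome | good habitat)
                [0.5, 0.5],    # P(outcome | fair habitat)
                [0.1, 0.9]])   # P(outcome | poor habitat)

# Across a row: probabilities of each outcome, given one input condition
# (each row sums to 1).
print(dict(zip(outcomes, cpt[0])))        # given good habitat

# Down a column: the (unnormalized) likelihood of each input condition,
# given one outcome -- scan it for anomalously high or low entries.
print(dict(zip(conditions, cpt[:, 0])))   # likelihood profile for high pop.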
 

5c. Do sensitivity analyses to determine which nodes most influence the outcome.

In Netica, the steps are: [1] compile the model; [2] set all input KEC nodes to uniform probabilities; [3] highlight (click on the title bar of) the final outcome node; [4] additionally highlight (hold down the Control key and left-click on) the input KEC nodes, or whatever nodes you want to test sensitivity for; [5] go to Network / Sensitivity to Findings. A window will open showing sensitivity analyses for each node, with a summary table at the bottom. The simplest results to focus on are in the final table, which shows "entropy reduction" (the categorical-variable analogue of variance reduction metrics). The affector nodes will be listed in order of decreasing influence on the output (resultant) node. Check whether this relative sequence makes ecological sense.
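For reference, "entropy reduction" is, in effect, the mutual information between the two nodes: I(Output; Input) = H(Output) - H(Output | Input). The Python sketch below, with hypothetical numbers, shows the computation for one binary input node and one binary output node:

# Entropy reduction (mutual information) for one input-output pair.
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist if p > 0)

p_input = [0.5, 0.5]          # uniform input, as in step [2] above
cpt = [[0.9, 0.1],            # P(output | input state 1)
       [0.2, 0.8]]            # P(output | input state 2)

# Marginal distribution of the output node.
p_out = [sum(p_input[i] * cpt[i][j] for i in range(2)) for j in range(2)]

reduction = entropy(p_out) - sum(p_input[i] * entropy(cpt[i]) for i in range(2))
print("entropy reduction:", reduction)   # about 0.40 bits
# Rank the input nodes by this value and check that the ordering makes
# ecological sense.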
 

5d. To update CPTs, you can modify individual combinations of conditions by using case files.

- Create a case file (an ASCII file named *.cas) of the general form:
 

IDnum A B ... C

1 state_for_A state_for_B ... state_for_C
 

where A, B, ..., C are the node names of interest, and "state_for_x" is the specific state condition for the corresponding node. The case file should include the resultant outcome (here shown as node C), which can be any child node in the network (it does not have to be the final species outcome node). You can then update a single row, or a few rows, in the CPT by incorporating such a case file: [1] compile the model, then [2] go to Relation / Incorp Case File and specify the .cas file. Accept a weight of "1." (The weight option refers to the influence this case file should have on updating the probabilities, in the context of empirical updating of Bayesian probabilities.) There will be no obvious change on screen, but opening the CPT of the child (outcome) node should show a change in the probabilities for those specific state combinations and outcomes.

This is updating of conditional probabilities based on new case information; the sketch below illustrates the underlying arithmetic.
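The Python sketch below shows, under the assumption of a simple pseudo-count (Dirichlet-style) update, roughly what incorporating one weighted case does to a single CPT row. Netica's internal algorithm, which also tracks node "experience," may differ in detail; the function and numbers here are hypothetical.

# Hypothetical pseudo-count update of one CPT row from one weighted case.
def update_row(row, observed_outcome, weight=1.0, experience=10.0):
    counts = [p * experience for p in row]   # probabilities -> pseudo-counts
    counts[observed_outcome] += weight       # add the new case
    total = sum(counts)
    return [c / total for c in counts]       # renormalize to probabilities

row = [0.6, 0.3, 0.1]   # P(outcome | one specific combination of inputs)
print(update_row(row, observed_outcome=0))   # -> [0.636..., 0.272..., 0.090...]
# Only the row matching the case's input states changes; the other rows of
# the CPT are untouched, consistent with the before/after comparison below.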

You can easily compare before-and-after probabilities in a CPT as follows. Before updating a CPT, store its probability structure by going to Report / Relation CPT. Then change the node, and again go to Report / Relation CPT and compare the two tables. If you are updating with only one or a few cases, many of the probability rows will remain unchanged; do not expect all probabilities to change.

One can also provide a case file with many cases and individually select which to include by going to Relation / Incorporate Case (but this option does not function in Netica vers. 1.17, for some reason).
 

5e. Review the most likely conditions for each outcome (of a summary node or the final species outcome node).

Do this by going to Network / Most Probable Expl (most probable explanation). Then click on any node's state twice. The network will equilibrate to new probability bars; these are now not probabilities per se. Focus on the bars that reach the 100% level. These are the most likely states for each node, particularly if you click on each state of the outcome node of interest. Check whether these most probable states make sense for each outcome. Note which do not and need to be changed, and modify the CPTs for those nodes accordingly (by case file updating or manual updating of the CPTs).
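Conceptually, MPE is an argmax over joint state assignments, given the finding. The brute-force Python sketch below, with hypothetical nodes and numbers, shows the computation for a two-node network:

# Brute-force most probable explanation (MPE) for a tiny, hypothetical network.
p_a = {"low": 0.3, "high": 0.7}                  # input node A
cpt_b = {"low":  {"poor": 0.8, "good": 0.2},     # P(B | A)
         "high": {"poor": 0.1, "good": 0.9}}

def mpe(finding_b):
    # Return the state of A that maximizes the joint probability with the finding.
    return max(((a, p_a[a] * cpt_b[a][finding_b]) for a in p_a),
               key=lambda pair: pair[1])

print(mpe("good"))   # -> ('high', 0.63): the most likely explanation
# Clicking a state of the outcome node and reading the 100% bars in Netica
# is the GUI counterpart of this argmax.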
 

5f. Check the current CPT structure for selected nodes, against a case file listing states and outcomes.

This determines the "error rate" (a confusion table showing classification errors) of the existing CPT, based on new (presumably more correct) examples of outcomes.

Do this by [1] creating a .cas case file of the new, correct example outcomes; [2] compiling a fully specified BBN model and highlighting the nodes included in the case file; and [3] going to Network / Test Using Cases and specifying the .cas case file. A window will pop up showing, for each node, a confusion (error rate) table, plus other information on scoring, calibration, and sensitivity (other measures of error). This does NOT update the CPTs, but it is helpful for diagnosing when CPTs may need to be updated. To update the CPTs from the case file, see step 5d above.
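The Python sketch below, with hypothetical case data, shows the kind of confusion table and error rate this test reports for a single node:

# Confusion (error-rate) table for one node: predicted vs. actual states.
from collections import Counter

actual    = ["high", "high", "low", "low", "low"]    # states from the case file
predicted = ["high", "low",  "low", "low", "high"]   # states from the current model

table = Counter(zip(actual, predicted))
states = sorted(set(actual) | set(predicted))
print("rows = actual, columns = predicted:", states)
for a in states:
    print(a, [table[(a, p)] for p in states])

errors = sum(n for (a, p), n in table.items() if a != p)
print("error rate:", errors / len(actual))   # 2 of 5 cases misclassified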
 
 

III. Save the Results!

Remember to save the new, updated species BBN model under a new model version number; and update the information stored in the Window / Description box (when updated, who updated, new version number, etc.).



 

Some references on validating Bayesian belief network and knowledge-based models:
 

Denning, P. J. 1989. Bayesian learning. American Scientist 77:216-218.

Haas, T. C. 1991. Partial validation of Bayesian belief network advisory systems. AI Applications 5(4):59-71.

Kort, B. 1990. Networks and learning. AI Magazine 11(3):16-19.

Pazzani, M., and D. Kibler. 1992. The role of prior knowledge in inductive learning. Machine Learning 9:54-97.

Preece, A. D. 1994. Validation and verification of knowledge-based systems. AI Magazine 15(1):65-66.

Reinhardt, E. D., A. H. Wright, and D. H. Jackson. 1992. Development and validation of a knowledge-based system to design fire prescriptions. AI Applications 6(4):3-14.

Sequeira, R. A., J. L. Willers, and R. L. Olson. 1996. Validation of a deterministic model-based decision support system. AI Applications 10(1):25-40.
 
 

The peer review procedures listed in this document are generally consistent with these references, in that updating (Bayesian "learning") of probability structures is based on new information (ideally from empirical studies, but it can also come from expert review).