A PROCESS FOR CREATING BAYESIAN BELIEF NETWORK MODELS OF SPECIES-ENVIRONMENT RELATIONS
 

Bruce G. Marcot
Research Wildlife Ecologist
USDA Forest Service, Portland, Oregon

February 16, 1999
For: Terrestrial Staff of the Science Advisory Group ("TSAG"), Interior Columbia Basin Ecosystem Management Project
 

Following is a potential process for creating Bayesian belief network models of species-environment relations. I tiered this off of our analysis steps document so it's consistent. It presumes we have already screened species for modeling, identified all key environmental correlates for each species, and have access to necessary ancillary data on correlates.


PHASE I. CREATE THE ALPHA-LEVEL MODELS (done internal to TSAG):

A. Diagram the "causal web" of key environmental correlates, at broad and mid scales.
- Use info on KECs from SER database, scribed panel notes, source habitats documents, Croft et al. plant document, Suther et al. documents, EIS viability mitigation documents, and any other sources esp. for plants.

- Use info on species' life history attributes relevant to analysis, from SER database, scribed panel notes, Lehmkuhl's home range size classes for vertebrates, and other lit sources.

- Be consistent with the generalized species influence diagrams (remember, there are separate ones for plants and animals), although specific correlates and linkages may vary in depictions.
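Before a model is entered into Netica, the causal web can be drafted as a simple directed graph. The following is a minimal sketch in Python, for illustration only; all node names are invented placeholders, not actual KECs from the SER database. It also checks the three-parents-or-fewer constraint listed under B below.

```python
# Hypothetical causal web: each key maps a node to the nodes it influences.
# Node names are invented illustrations, not actual KECs.
causal_web = {
    "stand_age": ["canopy_closure", "snag_density"],
    "canopy_closure": ["nest_site_quality"],
    "snag_density": ["nest_site_quality", "forage_availability"],
    "nest_site_quality": ["population_response"],
    "forage_availability": ["population_response"],
}

def parents(node):
    """Return the parent nodes (direct influences) of a given node."""
    return [p for p, children in causal_web.items() if node in children]

# Check the modeling rule: no node should have more than three parents.
all_children = {c for children in causal_web.values() for c in children}
for node in all_children:
    assert len(parents(node)) <= 3, f"{node} has too many parents"
```

Sketching the web this way makes it easy to verify structural rules before committing the diagram to Netica.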

B. Develop the initial Netica BBN model.
BBN modeling rules to follow:
1. Keep the number of parent nodes of any given node to three or fewer.

2. Parentless nodes should be those items that can be preprocessed or evaluated from GIS data.

3. Intermediate nodes summarize the major themes denoted in the generalized species influence diagrams.

4. To the extent possible, all nodes should be observable and quantifiable or testable entities. Intermediate nodes may not be so, however, and should be given careful documentation and explanation.

5. Use as few discrete states within any given node as needed, unless a calculation requires continuous values (in which case, discretize those nodes to a fine degree). If SER or other sources provide specific cutoff values for some continuous-variable KECs, use those values to define states.

6. Keep the depth of the model - the number of layers of nodes - to four or fewer, if possible. If not, consider breaking the model into two networks.

7. Develop a broad-scale model and a mid-scale model simultaneously. CPTs for the mid-scale model can be developed later; at least show the KECs and main factors at both scales.

8. Document the model by using the overall BBN file documentation procedure, and documentation for each node.
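Rule 5's discretization of a continuous correlate can be done mechanically. Below is a minimal Python sketch; the cutoff values and state names are hypothetical placeholders, and SER-derived cutoffs should be substituted wherever they exist.

```python
# Discretize a continuous KEC (e.g., percent canopy closure) into node
# states using upper-cutoff values. All cutoffs and state names below are
# hypothetical placeholders; use SER-specified values where available.
CUTOFFS = [(40.0, "low"), (70.0, "moderate"), (float("inf"), "high")]

def discretize(value, cutoffs=CUTOFFS):
    """Map a continuous value to the first state whose upper cutoff exceeds it."""
    for upper, state in cutoffs:
        if value < upper:
            return state
    raise ValueError("value exceeds all cutoffs")
```

For example, discretize(35.0) returns "low" and discretize(85.0) returns "high" under these placeholder cutoffs.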

Representing proxy nodes:
1. For some parentless nodes, the correlates cannot be evaluated from the GIS or external ICBEMP data. For those nodes, add a parent node serving as a "broad-scale proxy." Proxies should be data that can be derived from the GIS or external ICBEMP data.

2. Showing a proxy node as a separate parent of the correlate node it represents affords the opportunity to adjust the CPT for the correlate node to represent the degree to which each state of the proxy node represents each state of the correlate node. Such probabilities can be developed by consulting with the GIS/spatial or landscape SAG experts.
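The proxy-to-correlate relation is itself just a small CPT. A hypothetical Python sketch follows; the proxy states, correlate states, and probabilities are all invented for illustration, since real values would come from consultation with the GIS/spatial or landscape SAG experts. A row-sum check is included.

```python
# CPT for a correlate node given its broad-scale proxy parent.
# Each row gives P(correlate state | proxy state) and must sum to 1.0.
# All state names and probabilities are hypothetical placeholders.
proxy_cpt = {
    "mapped_old_forest":   {"snags_present": 0.80, "snags_absent": 0.20},
    "mapped_young_forest": {"snags_present": 0.30, "snags_absent": 0.70},
}

# Verify that every row of the CPT totals 100%.
for proxy_state, row in proxy_cpt.items():
    total = sum(row.values())
    assert abs(total - 1.0) < 1e-9, f"row {proxy_state} sums to {total}"
```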

Identifying the conditional probability tables (cpts) for each node:
1. Can begin in one of three ways:
a. Set all probabilities to uniform, and then "tilt" the probabilities by going to the extreme cases, then the middle or most moderate case, and then back-interpolating all other entries. OR

b. Begin by setting the table to "discrete" and filling in the single, most likely outcome for each combination of parent node states. Then, reconvert the node to "continuous" so those most likely outcomes automatically get set to 100% likelihood and all others to 0%. Next, adjust the likelihoods for each row so as to represent either extreme and mean outcomes, or a reasonable spread of likelihoods across other non-100% cells.

c. Use more advanced techniques for populating CPTs (see notes from Haas and others), such as the EM algorithm. However, this may prove undoable without empirical data from which to induce at least initial probability values.

2. Remember that not every cell in a CPT has to have a nonzero entry. Some can be 0%.

3. Rows must total 100%.

4. Cross-check your work by scanning down each column and asking whether the entries with the highest values (the "likelihood function" for each state) really represent the most likely conditions for that state. If not, readjust the entries.

5. Use all available empirical information to support this work, and document it.

6. Validate that the model "does" what it "should" do by compiling the network at the end and trying different combinations of input values. If it exhibits unrealistic behavior, consider readjusting CPTs until it responds reasonably. Consider documenting the rationale for this step, particularly with personal experience, publications, etc.
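The "tilt" approach in step 1a, together with the row-sum check in step 3, can be sketched in Python for illustration only (the real work is done in Netica). The extreme rows are fixed by expert judgment and the intermediate rows are back-interpolated; all state names and probabilities below are hypothetical placeholders.

```python
# Hypothetical CPT construction by back-interpolation between extremes.
child_states = ["low", "moderate", "high"]  # column order of the CPT
worst = [0.85, 0.10, 0.05]  # P(child state | worst parent combination)
best  = [0.05, 0.15, 0.80]  # P(child state | best parent combination)

def interpolate_rows(worst, best, n_rows):
    """Linearly interpolate CPT rows between the two extreme rows."""
    rows = []
    for i in range(n_rows):
        w = i / (n_rows - 1)  # 0.0 at the worst row, 1.0 at the best row
        rows.append([(1 - w) * a + w * b for a, b in zip(worst, best)])
    return rows

cpt = interpolate_rows(worst, best, n_rows=5)

# Step 3's check: every row must total 100% (1.0 here).
for row in cpt:
    assert abs(sum(row) - 1.0) < 1e-6
```

Because each interpolated row is a weighted average of two rows that each sum to 1.0, the rows automatically satisfy the row-total rule; the columns should still be scanned by eye per step 4.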


This completes the initial, alpha-level modeling. Label each model individually as Vers. 0.10a, where the "a" refers to alpha-level (for internal use and review only) versions. If these undergo some further tinkering before the next step is done with expert review, increase the version number accordingly, such as Vers. 0.11a if only a minor adjustment to CPTs or states, or Vers. 0.20a if a more major adjustment to overall model structure and nodes.
 

BACKUP all model files on a REGULAR basis under separate .zip files and/or directories, and document those backups.

PHASE II. REVISE THE MODELS FROM EXPERT REVIEW (done with consultation with expert reviewers)

Consult with at least one species expert to review and potentially revise the structure and CPTs for each species BBN model. Creating the alpha-level model and doing the prework in Phase I above will save a great deal of meeting time, although it will be necessary to thoroughly review the base data, model structure, and CPTs for each species with the expert(s).

- Revise the model structure and/or cpt values according to expert review.

- Document the changes. This can be done in Netica under each node, or overall for the model as a whole.

This completes the major beta-level modeling. Label each model individually as Vers. 1.00. If these undergo some further minor tinkering before they are actually used in the analysis, increase the version number accordingly, such as Vers. 1.10, 1.11, etc., if only a minor adjustment to CPTs or states, or Vers. 2.00, etc., if a more major adjustment to overall model structure and nodes.

It is not important that we track and document each and every change to every probability, cpt table, node, and model structure, but that we do have checkpoints and plateaus of model development to save if needed for later reference.

BACKUP all model files on a REGULAR basis under separate .zip files and/or directories, and document those backups.

PHASE III. COORDINATION NEEDS AND WATCH FOR SHORTCUTS

1. Remember to closely coordinate with other staffs of your project.

2. Watch for ways to shortcut the BBN modeling process, in particular, whether some species can be grouped into sets of BBN models sharing common model structures. Alternatively, some BBN nodes can be saved as libraries for quick copying and pasting into new models.


[Note: the above process document was favorably reviewed by Tim Haas 2/99.]