Because mutations in gene regulatory networks underlie many human congenital anomalies (Bard, 2007), it is plausible that developmental toxicants also produce their adverse effects by perturbing these same developmental networks. The mouse is the most widely used mammalian model for understanding the connections between genes and human disease, a role reflected in the Human Genome Project’s inclusion of a goal to construct genetic and physical maps of the mouse genome. Online resources support this knowledge exchange. The Mouse Genome Informatics (MGI) database, for example, provides integrated access to data on the genetics, genomics, and biology of the laboratory mouse. Users can search or browse it for a Mammalian Phenotype Ontology (MPO) term to view the term’s details and its relationships to other terms, including links to genotypes annotated with that term or any of its sub terms. The MPO is a structured vocabulary that standardises annotations and describes unambiguous clinical phenotypes in mice, using terms derived from ~100 physiological systems, behaviours, developmental phenotypes and survival/aging conditions (Smith et al., 2005). For example, searching the MPO browser for the term ‘eye’ returned 79 MPO terms, including abnormal eye development, abnormal anterior segment morphology, microphthalmia and anophthalmia.
An important use of text-mining will be to build conceptual network models of interacting genes involved in the morphogenesis and differentiation of specific structures. Resources such as EMAGE, a curated database of gene expression patterns in mouse embryos, and The Jackson Laboratory’s Gene Expression Database (GXD) can help identify the relevant genes.
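Retrieving the genotypes annotated to an MPO term “or any sub term” amounts to walking the ontology graph downwards from that term. The Python sketch below illustrates this with a breadth-first traversal. The term names, parent-child edges and genotype labels are hypothetical placeholders, not the real MPO hierarchy or MGI annotations, and the sketch does not call any MGI service.

```python
from collections import deque

# Toy ontology: term -> direct sub terms. These names and edges are
# hypothetical placeholders, NOT the actual MPO hierarchy.
CHILDREN: dict[str, list[str]] = {
    "eye phenotype": ["abnormal eye development", "abnormal eye size"],
    "abnormal eye development": ["abnormal lens development"],
    "abnormal eye size": ["small eye", "absent eye"],
}

# Hypothetical genotype annotations (not MGI data).
ANNOTATIONS: dict[str, set[str]] = {
    "abnormal lens development": {"genotype-1"},
    "small eye": {"genotype-2", "genotype-3"},
    "absent eye": {"genotype-3"},
}


def descendants(term: str) -> set[str]:
    """Return the term plus every sub term below it, via breadth-first search."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in CHILDREN.get(current, []):
            if child not in seen:  # in a DAG a term can be reached via several parents
                seen.add(child)
                queue.append(child)
    return seen


def genotypes_for(term: str) -> set[str]:
    """Collect genotypes annotated to the term or to any of its sub terms."""
    found: set[str] = set()
    for t in descendants(term):
        found |= ANNOTATIONS.get(t, set())
    return found


if __name__ == "__main__":
    print(sorted(genotypes_for("abnormal eye size")))
```

Because an ontology is a directed acyclic graph rather than a strict tree, a term can have more than one parent; the `seen` set prevents such terms from being visited twice.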
To retain only biologically meaningful linkages, we could set minimum co-occurrence thresholds or use search strings that reliably capture developmentally relevant language. For example, CoPub (Frijters et al., 2007, 2008) calculates keyword over-representation by text-mining the literature for co-occurrences of genes and keywords (which may themselves be genes). This approach assumes that co-citation of a gene and a keyword in the same abstract indicates the strength of their relationship. The CoPub ‘relevance score’ (R-score) describes the strength of the association between two keywords, given their individual frequencies of occurrence and the number of co-occurrences for every pair in the set. The formula for the raw score is:
$$S = \frac{P_{AB}}{P_A \, P_B}$$
where $P_A$ is the number of PubMed identifiers (PMIDs) retrieved for item A divided by the total number of PMIDs, $P_B$ is the corresponding fraction for item B, and $P_{AB}$ is the number of PMIDs in which A and B co-occur divided by the total number of PMIDs. S therefore compares the observed co-occurrence rate with the rate expected if A and B occurred independently. The R-score scales these raw values and transforms them to a log10 scale for ranking.
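The raw score can be checked by hand. The sketch below, assuming per-item hit counts and pairwise co-occurrence counts have already been extracted from PubMed, computes S, drops pairs below a minimum co-occurrence threshold and ranks the rest by log10(S). The function names, the threshold value and all counts are hypothetical, and log10(S) is only a stand-in: it is not CoPub’s scaled R-score, whose exact scaling is not reproduced here.

```python
import math


def raw_score(n_a: int, n_b: int, n_ab: int, n_total: int) -> float:
    """Raw score S = P_AB / (P_A * P_B).

    With P_A = n_a / n_total, P_B = n_b / n_total and P_AB = n_ab / n_total,
    S simplifies to n_ab * n_total / (n_a * n_b). S > 1 means A and B are
    co-cited more often than expected if they occurred independently.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each item needs at least one hit")
    return (n_ab * n_total) / (n_a * n_b)


def rank_pairs(counts: dict[tuple[str, str], int],
               hits: dict[str, int],
               n_total: int,
               min_cooccurrences: int = 3) -> list[tuple[str, str, float]]:
    """Drop rarely co-cited pairs, then rank the rest by log10(S), highest first.

    min_cooccurrences is a hypothetical threshold. log10(S) is only the raw
    score on a log scale, NOT CoPub's scaled R-score.
    """
    threshold = max(min_cooccurrences, 1)  # n_ab = 0 gives S = 0, whose log is undefined
    ranked = []
    for (a, b), n_ab in counts.items():
        if n_ab < threshold:
            continue
        s = raw_score(hits[a], hits[b], n_ab, n_total)
        ranked.append((a, b, math.log10(s)))
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Invented counts for illustration only (not real PubMed data).
HITS = {"geneX": 100, "geneY": 40, "eye development": 50}
COUNTS = {
    ("geneX", "eye development"): 20,
    ("geneY", "eye development"): 2,
    ("geneX", "geneY"): 8,
}

if __name__ == "__main__":
    for a, b, log_s in rank_pairs(COUNTS, HITS, n_total=1000):
        print(f"{a} -- {b}: log10(S) = {log_s:.3f}")
```

With these invented counts, geneX and ‘eye development’ co-occur four times more often than independence would predict (S = 20 × 1000 / (100 × 50) = 4), so that pair ranks first, while the geneY pair is discarded by the threshold. Pairs that survive filtering could serve as edges in the kind of conceptual gene network described above.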
Example: Ontology for early eye development in the mouse
Eye development can be perturbed by genetic mutations and environmental exposures, leading to malformations such as anophthalmia, microphthalmia, coloboma and cataract. These defects affect more than a million children worldwide (6.8 per 10,000 live births; ~28,000 annually in the USA). An OVID search of the Medline database found specific references to ocular malformations in 2% of the teratology literature overall and in 25% of the mouse teratology literature. This suggests that the eye is broadly susceptible to diverse agents. In the mouse, gestational days 7 to 11 span the window of vulnerability to eye reduction malformations such as microphthalmia/anophthalmia, aphakia/aniridia and coloboma.
In modelling the formation of a system such as the eye, the first step is to lay out its normal pattern graphically. Much ontology information is already available for this purpose (Baldock et al., 1992; Bard, 2007). The Edinburgh Mouse Atlas Project (EMAP) (http://genex.hgu.mrc.ac.uk/intro.html) is mapping successive stages of mouse embryonic development to catalogue gene expression domains. Consider, for example, the EMAP ontology for early eye development over Theiler stages (TS) 12 to 18 (see Figure 3).
Figure 3: Ontology for eye development in the mouse
Annotations are based on the EMAP system (Bard, 2007) over TS12 to TS18. Specification of the optic field begins earlier, in the anterior neural plate, through interactions among ectoderm, mesoderm and endoderm during gastrulation that give the ectoderm its lens-forming ‘competence’. Eye development is subsequently induced from the neural ectoderm and the surface ectoderm.
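An ontology like the one in Figure 3 can be treated as a part-of hierarchy in which each term exists over a range of Theiler stages. The Python sketch below, assuming that simple representation, lists the terms present at a chosen TS as an indented tree. The class and function names, term IDs, names and stage ranges are all invented for illustration and are not EMAP annotations.

```python
from dataclasses import dataclass, field


@dataclass
class AnatomyNode:
    """An anatomy term that exists from first_ts to last_ts (Theiler stages, inclusive)."""
    term_id: str   # hypothetical ID, not a real EMAP identifier
    name: str
    first_ts: int
    last_ts: int
    parts: list["AnatomyNode"] = field(default_factory=list)  # part-of children

    def present_at(self, ts: int) -> bool:
        return self.first_ts <= ts <= self.last_ts


def tree_at_stage(node: AnatomyNode, ts: int, depth: int = 0) -> list[str]:
    """Indented lines for the terms present at stage ts.

    Descent stops at any term absent at ts, on the assumption that in a
    part-of hierarchy a part does not exist outside the lifetime of its whole.
    """
    if not node.present_at(ts):
        return []
    lines = ["  " * depth + f"{node.name} ({node.term_id})"]
    for part in node.parts:
        lines.extend(tree_at_stage(part, ts, depth + 1))
    return lines


# Invented terms and stage ranges, for illustration only.
EXAMPLE_EYE = AnatomyNode("T:1", "eye", 12, 18, [
    AnatomyNode("T:2", "component-1", 12, 14),
    AnatomyNode("T:3", "component-2", 14, 18, [
        AnatomyNode("T:4", "component-2a", 16, 18),
    ]),
])

if __name__ == "__main__":
    print("\n".join(tree_at_stage(EXAMPLE_EYE, 15)))
```

Keeping stage ranges on each node lets the same structure answer stage-specific questions, such as which parts of the developing eye exist during a given exposure window.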
Descriptive embryology has shown the importance of reciprocal tissue inductions over this period. Although standard ontology IDs carry no molecular data themselves, the gene-expression information currently available for a particular developing mouse tissue at a given TS is computationally accessible from the mouse Gene Expression Database (GXD; http://www.informatics.jax.org/mgihome/GXD/aboutGXD.shtml) through ID interoperability (Bard, 2007).
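ID interoperability means that an expression record and an anatomy term can be linked simply because they share the same identifier. The sketch below performs that join locally on a handful of hypothetical records. It does not query GXD, and the record fields, IDs, gene names and the `genes_by_term` helper are assumptions for illustration rather than GXD’s actual data model or interface.

```python
from collections import defaultdict
from typing import NamedTuple


class ExpressionRecord(NamedTuple):
    """One hypothetical expression result, keyed by an anatomy term ID."""
    gene: str
    anatomy_id: str      # shares the ID scheme of the anatomy ontology
    theiler_stage: int
    detected: bool


def genes_by_term(records: list[ExpressionRecord],
                  anatomy_ids: list[str],
                  ts: int) -> dict[str, set[str]]:
    """Join records to the requested anatomy IDs at stage ts, keeping detected genes only."""
    wanted = set(anatomy_ids)
    genes: defaultdict[str, set[str]] = defaultdict(set)
    for rec in records:
        if rec.detected and rec.theiler_stage == ts and rec.anatomy_id in wanted:
            genes[rec.anatomy_id].add(rec.gene)
    return dict(genes)


# Invented records for illustration only; this is not GXD data or its schema.
RECORDS = [
    ExpressionRecord("geneA", "T:2", 13, True),
    ExpressionRecord("geneB", "T:2", 13, False),
    ExpressionRecord("geneC", "T:3", 13, True),
    ExpressionRecord("geneA", "T:3", 15, True),
    ExpressionRecord("geneD", "T:9", 13, True),
]

if __name__ == "__main__":
    print(genes_by_term(RECORDS, ["T:2", "T:3"], ts=13))
```

In practice the list of anatomy IDs would come from the ontology itself (for example, a term and its parts at a given TS), so expression data and anatomical structure can be combined even though the ontology stores no molecular information.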