Special Report 19


Ontologies are used in biology to classify terms and to define how they relate to one another and to broader concepts. Once these concepts and their relationships have been formally defined, new relationships between concepts may emerge, and classifying one concept as a type or subclass of another becomes possible. Formally, concepts are generally called “classes”, while relationships are called “relations” or, in some ontology languages, “properties”. Generally, ontologies operate as a system of triples. A triple consists of a subject, a predicate and an object. The subject and object are classes, while the predicate is the relationship that connects them.

For example, consider a pizza ontology. Within this ontology, there is a class called ‘pizza’, defined as a thing with a crust and toppings (note that sauce is optional, as some pizzas, such as white pizzas, lack sauce). ‘Toppings’ has three subclasses: (1) meat, (2) vegetable, and (3) cheese. There is also a subclass of pizza called ‘vegetarian_pizza’, which is defined as a pizza with vegetable toppings and no meat toppings, with or without cheese toppings. Thus, we could define a specific instance of vegetarian_pizza from Joe’s Pizza Shack called “Veggie Supreme”. In subject, predicate, object form, we would have “veggie_supreme is_a vegetarian_pizza”. Here “veggie_supreme” is the subject, “is_a” is the predicate, and “vegetarian_pizza” is the object. A developmental biology example of a triple would be: an increase in retinoic acid level (subject) enhances (predicate) cell differentiation (object); or, in general AOP terms, KE_x (subject) leads to (predicate) a change in KE_(x+1) (object).
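The pizza example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how an ontology would be encoded in practice: triples are held as plain tuples, the predicate name `subclass_of` and the helper functions `superclasses` and `classes_of` are hypothetical, and the inference shown is limited to following subclass links.

```python
# Each triple is a (subject, predicate, object) tuple.
# All identifiers are illustrative; "subclass_of" is an assumed predicate name.
triples = {
    ("vegetarian_pizza", "subclass_of", "pizza"),
    ("meat", "subclass_of", "topping"),
    ("vegetable", "subclass_of", "topping"),
    ("cheese", "subclass_of", "topping"),
    ("veggie_supreme", "is_a", "vegetarian_pizza"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following subclass_of links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subclass_of" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def classes_of(instance, triples):
    """Return the asserted classes of an instance plus their superclasses."""
    asserted = {o for s, p, o in triples if s == instance and p == "is_a"}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return asserted | inferred


print(sorted(classes_of("veggie_supreme", triples)))
# ['pizza', 'vegetarian_pizza']
```

Only the triple “veggie_supreme is_a vegetarian_pizza” was stated explicitly; the fact that Veggie Supreme is also a pizza emerges from the subclass relationship.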

An ontology allows scientists to pose structured questions of the encoded knowledge. For instance, we could identify the assays associated with the minimal suite of KEs within an AOP that are sufficient to infer an adverse outcome with high confidence. We could also consider a set of parameters, such as the gestational age at exposure and a series of high-throughput screening data, and query the ontology to identify potential adverse outcomes for chemical screening decisions.

Encoding the data in an ontology also makes them easier to store and manage. Data can be obtained from various sources, including data already encoded in other ontologies, and encoded into the developmental toxicity ontology. In some instances this may require parsing the data and re-encoding them. In other instances it may be as easy as a simple import. Once the data are encoded, they can be queried and analysed using a number of freely available or commercial, off-the-shelf tools. Established standards exist for this purpose, such as the Web Ontology Language (OWL) for encoding the ontology and its associated data, and SPARQL (SPARQL Protocol and RDF Query Language) for querying the data within the ontology.

The ontology can be stored in an RDF (Resource Description Framework) database. The same RDF database can be populated with data from biological and chemical assays such as ToxCast or Connectivity Map. If the data are entered following prescribed ontologies, the relationship between chemical activity and perturbation of development can be captured or predicted. To continue with the pizza metaphor: if a chemical has the effect of disrupting meat production, a pizza normally covered with meat might become a vegetarian pizza. Ideally, the reduction in meat and its relationship to the phenotype of the pizza could be expressed in quantitative terms. When the ontology and the chemical perturbation data are stored appropriately, SPARQL queries should be able to reveal phenotypic outcomes like this one.

To move the discussion into more relevant space, let us suppose the developmental ontology links palate growth to retinoic acid (RA) signalling. The RDF triple store will contain the connections between palate growth and RA, and between the RA receptor and levels of retinoic acid. The RDF database may also hold assay information showing that an environmental chemical binds and blocks the RA receptor with a given affinity. A SPARQL query should then be able to reveal that this chemical activity disrupts palate growth.
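The retinoic acid example can be mimicked with a small pattern matcher over tuples. The sketch below is a stand-in for a SPARQL basic graph pattern, not SPARQL itself and not a real triple store; the function `match`, the identifier `chemical_X` and the predicate names `blocks`, `mediates` and `regulates` are all assumptions made for illustration.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every pattern in turn.

    Terms beginning with "?" are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


# Hypothetical ontology triples linking the receptor to palate growth.
ontology = {
    ("RA_receptor", "mediates", "RA_signalling"),
    ("RA_signalling", "regulates", "palate_growth"),
}
# Hypothetical assay result, kept separate from the ontology.
assay_data = {("chemical_X", "blocks", "RA_receptor")}

store = ontology | assay_data

query = [
    ("?chem", "blocks", "?target"),
    ("?target", "mediates", "?pathway"),
    ("?pathway", "regulates", "?process"),
]
results = [(b["?chem"], b["?process"]) for b in match(query, store)]
print(results)
# [('chemical_X', 'palate_growth')]
```

The link between the chemical and palate growth is not stored anywhere as a single triple; it is recovered by chaining the assay observation through the relationships defined in the ontology.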

The RDF format facilitates the merging and integration of data and concepts. The RDF database, for instance, could integrate chemical structural information from a chemistry source. By employing a chemical structure ontology, a query could be constructed that reveals that many chemicals sharing a particular structural feature are linked to the same developmental perturbation.
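Because every source is expressed as triples, merging amounts to taking the union of the statements. The following sketch, again using plain tuples rather than an RDF library, combines hypothetical assay triples with hypothetical structure triples and groups chemicals by shared feature and outcome; all chemical, feature and predicate names, and the function `chemicals_by_feature_and_outcome`, are invented for illustration.

```python
from collections import defaultdict

# Hypothetical triples from an assay source.
assay_triples = {
    ("chemical_A", "perturbs", "palate_growth"),
    ("chemical_B", "perturbs", "palate_growth"),
    ("chemical_C", "perturbs", "limb_development"),
}
# Hypothetical triples from a chemical structure source.
structure_triples = {
    ("chemical_A", "has_feature", "feature_1"),
    ("chemical_B", "has_feature", "feature_1"),
    ("chemical_C", "has_feature", "feature_2"),
}

# Merging two triple sets is a simple set union.
merged = assay_triples | structure_triples


def chemicals_by_feature_and_outcome(triples):
    """Group chemicals by each (structural feature, perturbed process) pair."""
    features = defaultdict(set)
    outcomes = defaultdict(set)
    for s, p, o in triples:
        if p == "has_feature":
            features[s].add(o)
        elif p == "perturbs":
            outcomes[s].add(o)
    grouped = defaultdict(set)
    for chem in features.keys() & outcomes.keys():
        for feature in features[chem]:
            for outcome in outcomes[chem]:
                grouped[(feature, outcome)].add(chem)
    return dict(grouped)


groups = chemicals_by_feature_and_outcome(merged)
# Keep only feature/outcome pairs shared by more than one chemical.
shared = {key: sorted(chems) for key, chems in groups.items() if len(chems) > 1}
print(shared)
# {('feature_1', 'palate_growth'): ['chemical_A', 'chemical_B']}
```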

It is important to note that the developmental ontology and the data are separate, even though both are stored in the same RDF database. The developmental observations are organised by the developmental ontology, while the assay data, for instance, will be organised by an assay ontology. Reasoning over the data with the ontologies is carried out through SPARQL queries. The potential contribution of AOPs to the building of developmental ontologies and the identification of appropriate high-throughput assays and in silico models for prediction of developmental toxicants is shown in Figure 2.

Figure 2: Interrelationships between the building of AOPs, developmental ontologies and potential screening assays