Workshop Report 28

Group 2D

The syndicate discussed the questions in the order they were presented and spent 50% of the allotted time on the first question.

1. Review current tools and key (statistical) methodology, including assumptions about distributions of sensitivity, use of hierarchical models, interspecies correlations. Identify where there are important differences and what the implications of these could be.

Assumptions about frequency distributions:

In practice a number of different distributions are fitted to SSDs.
In practice, even for large numbers of data values, it is difficult to distinguish (e.g. via a statistical test) two similar distributions (such as a log-normal and a log-logistic), but they may give different estimates of HC5 because of differences in the fit to the data in the tail.
Different subsets of the data will give rise to different estimates of the HC5. This raises the question of whether the objective is to protect a certain group of sensitive species, in which case testing may need to be targeted, or whether the objective is more general.
The distribution that matters for risk management is the distribution of sensitivities in the community to be protected, not that of the tested species.
Tools: Aldenberg and Jaworska (2000) and Aldenberg and Slob (1993): These and other methods are statistically rigorous but not necessarily ecologically rigorous. Some of the methods use a Bayesian approach. They involve fitting a log-normal, log-logistic or similar distribution to measures of toxicity. The species measured are assumed to be a random selection of species in the community of interest (or exchangeable, using Bayesian terminology). Under these assumptions, estimates of HC5 and confidence intervals are statistically sound. These methods were the first to be proposed and have been extensively used. Software such as ETX (van Vlaardingen et al, 2004) and SSD Master (CCME, 2013) is readily available and easy to set up from an Excel spreadsheet.
WebICE [1] (Raimondo et al, 2013): This method makes use of the historic database of toxicity values in ICE [1]. First a community of relevant species is identified (e.g. aquatic or wildlife species), and then toxicity values for species without measured data (predicted values) are estimated from measured toxicities (surrogate observations) using interspecies correlations (or regressions). A complex set of filters can be used to exclude predicted data both prior to and during the fitting process. Different surrogates give different predicted values for the same species (where models are available), but WebICE includes only one value for each species in the SSD, so values predicted from multiple surrogates are evaluated to ensure that the most robust prediction is included. This process results in a set of toxicities, some of which are measured and some of which are predicted. Finally an HC5 is computed from the mixed set of toxicities using a log-logistic distribution. The sample of species is again assumed to be a random selection (exchangeable) from a community or population. Confidence intervals for HC5 are not computed according to sound statistical principles, so the application can give odd results for intervals. The method was developed by US EPA, who have built an easy-to-use online tool that is backed up by an extensive historic database. The method has been described in several published peer-reviewed papers and a user manual [2]. US colleagues have considerable experience of using WebICE but it has not been widely used in Europe. The historic database is regularly updated, which can change the set of models available for predicting toxicity to a given species. HC5 estimates obtained today may be somewhat different from those obtained in future if an updated suite of models yields additional species for the SSD.
The hSSD concept: This method is based on a Bayesian hierarchical model. It is statistically rigorous and does not assume that measured species are a random sample from a community or population. It is currently in the prototype stage, very few people have experience using it, and it needs to be evaluated more widely and more thoroughly. Effective evaluation requires knowledge of communities of species actually found in the field so it cannot be evaluated from a purely statistical point of view. It makes use of an historic database of toxicity values provided by RIVM and, whilst there might be overlaps, the data set is not the same as that used in WebICE.
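
The point made above, that a log-normal and a log-logistic distribution can be hard to tell apart yet give different HC5 estimates, can be illustrated with a minimal sketch. It fits both distributions to the same toxicity values by matching the mean and standard deviation of the log10 values and reads off the 5th percentile of each. The moment matching is an assumption made for simplicity; the tools cited above use their own estimation and extrapolation procedures. The toxicity values and function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def log_moments(toxicity_values):
    """Mean and sample standard deviation of the log10 toxicity values."""
    logs = [math.log10(v) for v in toxicity_values]
    return mean(logs), stdev(logs)


def hc5_lognormal(toxicity_values):
    """5th percentile of a log-normal SSD fitted by moment matching."""
    mu, sd = log_moments(toxicity_values)
    z05 = NormalDist().inv_cdf(0.05)  # about -1.645
    return 10 ** (mu + z05 * sd)


def hc5_loglogistic(toxicity_values):
    """5th percentile of a log-logistic SSD fitted by moment matching."""
    mu, sd = log_moments(toxicity_values)
    # A logistic distribution with scale beta has standard deviation
    # beta * pi / sqrt(3), so choose beta to match the sample value.
    beta = sd * math.sqrt(3) / math.pi
    return 10 ** (mu + beta * math.log(0.05 / 0.95))


# Hypothetical EC50 values (e.g. in ug/L) for eight species.
example_values = [1.2, 3.5, 8.0, 15.0, 40.0, 95.0, 210.0, 520.0]

if __name__ == "__main__":
    print("log-normal HC5:  ", hc5_lognormal(example_values))
    print("log-logistic HC5:", hc5_loglogistic(example_values))
```

In this sketch the two fitted distributions share the same mean and standard deviation on the log scale, yet the log-logistic HC5 comes out slightly higher than the log-normal HC5 because the fitted lower tails differ.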

[1] http://www.epa.gov/ceampubl/fchain/webice/

[2] http://www.epa.gov/ceampubl/fchain/webice/iceManual.html
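
The interspecies-correlation step described for WebICE above can be sketched as a simple regression on the log scale. This is an illustration of the general idea only, not the WebICE implementation: the function names and data are hypothetical, and the filtering of models and choice between surrogates described above are not reproduced.

```python
import math


def fit_ice_model(surrogate_tox, predicted_tox):
    """Least-squares fit of log10(predicted) on log10(surrogate).

    Each pair of values is the measured toxicity of one chemical to the
    surrogate species and to the species to be predicted.
    Returns (intercept, slope) on the log10 scale.
    """
    x = [math.log10(v) for v in surrogate_tox]
    y = [math.log10(v) for v in predicted_tox]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_toxicity(model, surrogate_value):
    """Predict toxicity to the untested species from a surrogate value."""
    intercept, slope = model
    return 10 ** (intercept + slope * math.log10(surrogate_value))


if __name__ == "__main__":
    # Hypothetical historic data: toxicity of five chemicals to a
    # surrogate species and to the species we want to predict.
    surrogate = [0.8, 5.0, 32.0, 150.0, 900.0]
    target = [1.5, 7.0, 60.0, 210.0, 1600.0]
    model = fit_ice_model(surrogate, target)
    # Measured surrogate toxicity of a new chemical (hypothetical).
    print(predict_toxicity(model, 20.0))
```

The predicted value would then join the measured values in the set of toxicities to which the SSD is fitted.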

2. As sensitivity to chemical stress seems to be related to taxonomic closeness, how could this be used in the construction and interpretation of SSDs?

It is important to first consider context and scoping.
We can have more diagnostic settings for screening resources for developing SSDs.
Taxonomic closeness can say something about communities, but not ecosystems.
Goal: need a protective HC5 with as little testing as possible.

The existence of taxonomic patterns (consistent sensitivity relationships) means that we must be cautious when extrapolating over large taxonomic distances, and also means that we will get better estimates of what we are interested in if we take patterns of sensitivity into consideration. The taxonomic structure of the community we are trying to protect needs to be taken into account in risk assessment, and consideration of related differences in sensitivity can be useful in setting guidelines for protection of structure and function. Where sufficient data are available, separate SSDs should be constructed for taxonomic or sensitivity groups to allow more ecological information to be incorporated into the assessment process.

3. Do models based on prior knowledge provide advantages over other methods?

Yes. The more that is known, the better the prediction will be. Methods that use prior knowledge will be better than methods that do not.

Prior knowledge can include which taxonomic groups might be more sensitive to a chemical class (e.g. molluscs to metals). Having this knowledge prior to developing SSDs can guide assessors to ensure that representatives of the sensitive taxonomic group are included in the SSD.

4. Are current modelling success criteria, such as those identified in the REACH TGD, sufficient, overly prescriptive or insufficient?

The guidelines and criteria are fine, but it is important to define context. It is also important to distinguish between populations, communities, and ecosystems.
Question: Can SSDs be used when there are fewer than 10 tested species?
Question: Is it better to prescribe a criterion based on acceptable confidence intervals rather than a prescribed number of tested species?
- If we can show that confidence intervals and HC5 estimates obtained from 5 species are not materially different from those obtained from 10 species, can we assume that fewer species would be reliable enough for regulatory purposes? Confidence intervals should indicate how well a method performs. This should be caveated with the discussion point above regarding a priori knowledge of sensitive taxa: if the 5 data points do not include the most sensitive taxa but have robust confidence intervals, is the result protective, even if it is statistically sound?
- Can/should existing criteria be replaced by confidence interval criteria? Should the criteria require either a confidence interval of a given size or a specific list of taxa?
- Can uncertainty factors be applied depending on amount of data used?
It is better to have more information than less. But if datasets have a large number of common taxa (e.g. fish), then it might be difficult to characterise impacts to less represented species such as amphibians.
We need to be sure to capture taxonomic diversity.
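
The comparison raised above, between HC5 estimates and intervals from 5 and from 10 species, can be explored with a small resampling sketch. It uses a percentile bootstrap around a moment-matched log-normal HC5, which is a stand-in chosen for brevity and not the interval method of any tool discussed here; bootstrap intervals from so few data points are themselves crude. The toxicity values and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

Z05 = NormalDist().inv_cdf(0.05)  # 5th percentile of the standard normal


def hc5_lognormal(toxicity_values):
    """HC5 of a log-normal SSD fitted by moment matching on log10 values."""
    logs = [math.log10(v) for v in toxicity_values]
    return 10 ** (mean(logs) + Z05 * stdev(logs))


def bootstrap_hc5_interval(toxicity_values, n_boot=2000, seed=0):
    """Approximate 95% percentile-bootstrap interval for the HC5."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(toxicity_values)
    estimates = sorted(
        hc5_lognormal(rng.choices(toxicity_values, k=n)) for _ in range(n_boot)
    )
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lower, upper


# Hypothetical EC50 values (e.g. in ug/L) for ten species, and a
# five-species subset taken as every second value.
ten_species = [1.3, 3.2, 7.9, 16.0, 40.0, 100.0, 200.0, 500.0, 1000.0, 2500.0]
five_species = ten_species[::2]

if __name__ == "__main__":
    for label, data in (("10 species", ten_species), ("5 species", five_species)):
        low, high = bootstrap_hc5_interval(data)
        print(label, "HC5:", hc5_lognormal(data), "interval:", (low, high))
```

Comparing the two printed intervals addresses the statistical half of the question only. It says nothing about whether the 5-species subset happens to omit the most sensitive taxa, which is the caveat raised above.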

5. What are the research needs?

There is a need to:

determine whether traits are meaningful in the development of SSDs.
evaluate SSDs against high quality mesocosm studies.
develop criteria for an acceptable confidence interval for SSDs and HC5s.
develop a model that takes account of the number and type of species in a community and that shows you the consequences/reliability of what you get. Validity criteria – what do we practically need?
be able to extrapolate better to all ecosystems. There is no strong science-based evidence that an SSD based on example criteria is protective for ecosystems; however, this argument also applies to the simplistic use of the toxicity value for the most sensitive species tested.
agree how confident we want to be, and back-calculate how confident assessments are given current criteria.