Technical Report 130

Step II: Evaluation of quality, reliability, reproducibility and consistency of the individual studies

Consistent with the title of the second step presented in Section V of ECHA and EFSA (2016), Step II of the ECETOC 7SI-ED describes how the quality (reliability), reproducibility and consistency of individual studies may be assessed. The main purpose of this step is to select the studies that are of sufficient quality (reliability) for use in the WoE evaluation and to exclude those of insufficient quality. The criteria evaluated to determine study quality also allow the reproducibility of a study to be determined. Step II may further provide information that supports the determination of within-study consistency. However, within-study consistency and consistency among different studies are factors that are applied in the WoE evaluations conducted in Steps III and IV.

The JRC ToxR Tool (available at: https://eurl-ecvam.jrc.ec.europa.eu/about-ecvam/archive-publications/toxrtool, accessed March 2017) is used to assess the reliability of the data presented in each study. This Excel-based tool provides comprehensive criteria and guidance for evaluating the inherent quality of toxicological data, making the decision process of assigning reliability categories transparent. It comprises 21 criteria for in vivo studies and 18 criteria for in vitro studies. Each criterion is scored either ‘1’ (one point, i.e. ‘criterion met’) or ‘0’ (no point, i.e. ‘criterion not met’). The JRC ToxR Tool specifies indispensable criteria (e.g. substance identification, specification of the test species and of the route of administration). The tool assigns a study to Reliability-Category 1 or 2 only if all indispensable criteria are scored ‘1’; if any indispensable criterion is scored ‘0’, the study cannot reach Category 1 or 2, irrespective of its total score.
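The scoring logic described above can be sketched as follows. This is a minimal Python illustration, not the ToxR Tool itself: the class and function names are hypothetical, which criteria are flagged indispensable is left to the caller, and the numerical score bands separating Categories 1, 2 and 3 are assumptions (our understanding of the tool's defaults) that should be checked against the spreadsheet. Only the gating rule, that an unmet indispensable criterion prevents assignment to Category 1 or 2 whatever the total score, is taken from the text.

```python
from dataclasses import dataclass

# ASSUMPTION: these score bands are our reading of the ToxR Tool's default
# scheme; check them against the current spreadsheet before use.
SCORE_BANDS = {
    "in_vivo": {"n_criteria": 21, "cat1_min": 18, "cat2_min": 13},
    "in_vitro": {"n_criteria": 18, "cat1_min": 15, "cat2_min": 11},
}


@dataclass
class Criterion:
    name: str
    score: int                  # 1 = criterion met, 0 = criterion not met
    indispensable: bool = False


def reliability_category(study_type: str, criteria: list[Criterion]) -> int:
    """Return Reliability-Category 1, 2 or 3 for one study (sketch)."""
    band = SCORE_BANDS[study_type]
    if len(criteria) != band["n_criteria"]:
        raise ValueError(f"expected {band['n_criteria']} criteria, got {len(criteria)}")
    if any(c.score not in (0, 1) for c in criteria):
        raise ValueError("each criterion must be scored 0 or 1")
    # Gate: one unmet indispensable criterion rules out Categories 1 and 2,
    # irrespective of the total score (Category 3 is ASSUMED as the fallback).
    if any(c.indispensable and c.score == 0 for c in criteria):
        return 3
    total = sum(c.score for c in criteria)
    if total >= band["cat1_min"]:
        return 1
    if total >= band["cat2_min"]:
        return 2
    return 3


# Hypothetical in vivo study: all 21 criteria met, the first three indispensable.
study = [Criterion(f"criterion {i + 1}", 1, indispensable=(i < 3)) for i in range(21)]
print(reliability_category("in_vivo", study))  # 1
study[0].score = 0  # an indispensable criterion is not met
print(reliability_category("in_vivo", study))  # 3, although the total is 20
```

Keeping the gate as a separate check before the score bands mirrors the text: the total score is only consulted once every indispensable criterion has been met.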

The ToxR Tool spreadsheet includes explanations and guidance for most of the criteria. Further, the ToxR Tool contains free text fields to justify individual scores. Comprehensive use of these free text fields helps to ensure the transparency of the assigned Reliability-Categories.

The criteria for evaluating the reliability of in vivo and in vitro studies were made as congruent as possible and, for both study types, are organised into five groups:

  1. Test substance identification;
  2. Test system characterisation;
  3. Study design description;
  4. Study results documentation;
  5. Plausibility of study design and results.

Groups 1-4 contain criteria that relate mainly to documentation and are relevant to determining the reliability of a study. In addition, test system characterisation (Group 2) and study design description (Group 3) are relevant to assessing the reproducibility of a study. Group 5 goes beyond documentation and requires an assessment of the internal plausibility of the experimental approach used in the study. Such information is relevant to determining the power of a study to inform on causality between substance exposure and outcomes, or on toxicological significance. By contrast, study consistency is a factor applied during the Step III and IV WoE evaluations.
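The division of roles among the five groups can be captured in a small lookup table. The Python sketch below (all names hypothetical) encodes only what the preceding paragraph states: Groups 1-4 inform reliability, Groups 2 and 3 additionally inform reproducibility, and Group 5 informs internal plausibility. It does not describe how the ToxR Tool sums individual criterion scores.

```python
from enum import Enum


class Aspect(Enum):
    RELIABILITY = "reliability"
    REPRODUCIBILITY = "reproducibility"
    PLAUSIBILITY = "internal plausibility"


# Roles of the five criteria groups as stated in the text (not a scoring rule).
CRITERIA_GROUPS = {
    1: ("Test substance identification", {Aspect.RELIABILITY}),
    2: ("Test system characterisation", {Aspect.RELIABILITY, Aspect.REPRODUCIBILITY}),
    3: ("Study design description", {Aspect.RELIABILITY, Aspect.REPRODUCIBILITY}),
    4: ("Study results documentation", {Aspect.RELIABILITY}),
    5: ("Plausibility of study design and results", {Aspect.PLAUSIBILITY}),
}


def groups_informing(aspect: Aspect) -> list[int]:
    """Return the numbers of the criteria groups that inform a given aspect."""
    return [number for number, (_, aspects) in CRITERIA_GROUPS.items() if aspect in aspects]


for aspect in Aspect:
    print(f"{aspect.value}: groups {groups_informing(aspect)}")
```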

Whilst the JRC ToxR Tool criteria and guiding explanations for in vivo studies were primarily designed for the assessment of toxicological studies, the general principles of the criteria are also applicable to the evaluation of ecotoxicological studies (Roberts and Leopold, 2016). Specific guidance for the evaluation of ecotoxicological tests is being developed and should be consulted where additional justification for the individual criteria is required (e.g. Moermond et al., 2016a). Alternatively, the reliability and relevance of ecotoxicology studies may be assessed using the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) evaluation method (Moermond et al., 2016b).

Each study is assigned to one of the Reliability-Categories of the JRC ToxR Tool as described above, taking into account specific guidance for the evaluation of ecotoxicological tests where appropriate. Thereafter, all assays and studies that are to be used in the WoE evaluation are placed at the appropriate Level of the OECD CF:

  • Level 2: In vitro assays providing data about selected endocrine mechanism(s) / pathway(s);
  • Level 3: In vivo assays providing data about selected endocrine mechanism(s) / pathway(s);
  • Level 4: In vivo studies providing data on adverse effects on endocrine-relevant endpoints;
  • Level 5: In vivo studies providing more comprehensive data on adverse effects on endocrine-relevant endpoints over more extensive parts of the life cycle of the organism.

By analogy to the OECD CF Levels 2 and 3, in vitro and in vivo assays providing data about non-endocrine mechanism(s) / pathway(s) are also collected and sorted for subsequent use in Step IV.
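The placement rules above, together with the separate collection of non-endocrine mechanistic data, amount to a simple lookup. The Python sketch below is illustrative only: the `Assay` record, the `provides` labels and the example assays marked "hypothetical" are assumptions introduced here, and in practice the placement of a given study should be checked against the OECD CF itself.

```python
from dataclasses import dataclass

# Placement rules from the text, keyed by (in vivo?, what the study is designed to provide).
CF_LEVELS = {
    (False, "mechanism"): 2,        # in vitro, endocrine mechanism(s)/pathway(s)
    (True, "mechanism"): 3,         # in vivo, endocrine mechanism(s)/pathway(s)
    (True, "adverse"): 4,           # in vivo, adverse effects on endocrine-relevant endpoints
    (True, "adverse_extended"): 5,  # as Level 4, over more extensive parts of the life cycle
}


@dataclass
class Assay:
    name: str
    in_vivo: bool
    provides: str           # HYPOTHETICAL labels: "mechanism", "adverse" or "adverse_extended"
    endocrine: bool = True  # False for assays on non-endocrine mechanism(s)/pathway(s)


def cf_level(assay: Assay) -> int:
    """Return the OECD CF Level (2-5) for an assay or study."""
    try:
        return CF_LEVELS[(assay.in_vivo, assay.provides)]
    except KeyError:
        raise ValueError(f"no OECD CF Level defined for {assay.name!r}") from None


def sort_assays(assays: list[Assay]) -> dict[str, list[str]]:
    """Group assay names by CF Level; non-endocrine mechanistic assays are kept
    in separate buckets, by analogy to Levels 2 and 3, for use in Step IV."""
    buckets: dict[str, list[str]] = {}
    for assay in assays:
        level = cf_level(assay)
        if assay.endocrine:
            label = f"Level {level}"
        elif assay.provides == "mechanism":
            label = f"Non-endocrine, by analogy to Level {level}"
        else:
            raise ValueError(f"{assay.name!r}: only mechanistic assays are sorted as non-endocrine")
        buckets.setdefault(label, []).append(assay.name)
    return buckets


studies = [
    Assay("Receptor assay (hypothetical)", in_vivo=False, provides="mechanism"),
    Assay("Fish short-term reproduction assay (OECD TG 229)", in_vivo=True, provides="mechanism"),
    Assay("Fish life-cycle study (hypothetical)", in_vivo=True, provides="adverse_extended"),
    Assay("Cytotoxicity assay (hypothetical)", in_vivo=False, provides="mechanism", endocrine=False),
]
for label, names in sort_assays(studies).items():
    print(f"{label}: {names}")
```

Raising an error for combinations the text does not define (e.g. an in vitro assay labelled as providing adverse-effect data) keeps gaps in the placement rules visible rather than silently assigning a Level.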

Data from Level 2 and 3 assays, together with diagnostic endpoints included in Level 4 and 5 studies (e.g. hormone levels, vitellogenin), provide mechanistic information that is used to evaluate endocrine activity in Step IV. Other Level 4 and 5 study parameters (e.g. organ weight and histopathological parameters) may also provide information that is relevant to determining the consistency and coherence of endocrine activity.

Apical endpoints from the in vivo studies at Levels 4 and 5 provide information on adverse effects, which is evaluated in Step III. Apical effects may also be identified in specific Level 3 in vivo assays (e.g. fecundity in the fish short-term reproduction assay; OECD TG 229). However, screening assays are designed to indicate whether further testing is needed and therefore, for example, cover only limited portions of the life cycle and test fewer concentrations than the more extensive Level 4 and 5 studies. For this reason, decisions on adverse effects should not be based on apical effects from Level 3 in vivo assays alone (Wheeler et al., 2014). Such effects should be used only within a WoE approach, to support apical data derived from Level 4 and 5 studies.
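The routing of data described in the two preceding paragraphs can be sketched as a function. In this hypothetical Python illustration, each endpoint carries the CF Level of the study that produced it and one of three kinds (our own labels, not terms from the text): "mechanistic" (e.g. hormone levels, vitellogenin), "supporting" (e.g. organ weight, histopathology from Level 4 and 5 studies) and "apical". The second function encodes the rule that apical effects from Level 3 assays alone do not support a decision on adverse effects.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    cf_level: int  # OECD CF Level (2-5) of the assay or study that measured it
    kind: str      # HYPOTHETICAL labels: "mechanistic", "supporting" or "apical"


def route(endpoint: Endpoint) -> str:
    """Return the step in which an endpoint is used, following the text."""
    if endpoint.kind == "mechanistic":  # e.g. hormone levels, vitellogenin
        return "Step IV: endocrine activity"
    if endpoint.kind == "supporting" and endpoint.cf_level in (4, 5):
        # e.g. organ weight, histopathology
        return "Step IV: consistency and coherence of endocrine activity"
    if endpoint.kind == "apical" and endpoint.cf_level in (4, 5):
        return "Step III: adverse effects"
    if endpoint.kind == "apical" and endpoint.cf_level == 3:
        return "Step III: supporting evidence only (WoE)"
    raise ValueError(f"no routing rule for {endpoint!r}")


def adversity_decision_possible(endpoints: list[Endpoint]) -> bool:
    """Level 3 apical effects alone cannot support a decision on adverse effects."""
    return any(e.kind == "apical" and e.cf_level in (4, 5) for e in endpoints)


screen_only = [
    Endpoint("vitellogenin", 3, "mechanistic"),
    Endpoint("fecundity (OECD TG 229)", 3, "apical"),
]
print([route(e) for e in screen_only])
print(adversity_decision_possible(screen_only))  # False
with_lifecycle = screen_only + [Endpoint("fecundity, life-cycle study (hypothetical)", 5, "apical")]
print(adversity_decision_possible(with_lifecycle))  # True
```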

While the ECHA and EFSA (2016) outline has excluded studies with invertebrates from its scope and OECD GD 150 does not provide specific guidance on invertebrate tests, the OECD CF (OECD, 2012a) does include Level 4 and Level 5 invertebrate partial and life cycle tests, which measure apical endpoints that are population-relevant. However, the majority of invertebrate test designs do not provide mechanistic insight, and no specific TGs exist yet to characterise endocrine activity in invertebrates at Levels 2 or 3 of the OECD CF (Coady et al., 2017). This has been identified as a research need and new test methods are in development (OECD, 2016).