Moderator: T. Gouin
Rapporteur: J. Armitage
Uncertainty in key physicochemical property data
The main objective of this work group (WG3) was to address issues of uncertainty and applicability domain with respect to physicochemical properties of organic chemicals, including miscible and ionisable organic chemicals, in applying the chemical activity concept to environmental risk assessment. The following sections are formatted around a set of questions (listed below) addressing critical challenges and potential limitations in using chemical activity in environmental and biological systems:
1. Uncertainty in key physicochemical property data
How reliable are available water solubility data? How reliable are current approaches for estimating water solubility in the absence of empirical data?
How reliable are available melting point data? How reliable are current approaches for estimating melting points in the absence of empirical data?
How reliable is Walden’s Rule given the wide range of chemical structures to which the chemical activity concept may be applied?
How reliable are available methods to estimate the entropy of melting (ΔSM) from chemical structure?
Taken together, what is the expected uncertainty in chemical activity calculations for ‘data poor’ chemicals?
2. Calculation of chemical activity in non-aqueous phases (biota)
Is the octanol-water (KOW) paradigm sufficiently accurate for estimating KBW (i.e. biota-water partition coefficient)? When is it necessary to consider more sophisticated approaches (e.g., ppLFERs) for estimating KBW?
3. Application of the chemical activity concept to miscible organic chemicals (MOCs)
To what extent are empirically-based chemical activity coefficients available for miscible chemicals and how reliable are these data?
How reliable are computational approaches (e.g., UNIFAC, COSMOTherm, SPARC) for estimating chemical activity coefficients for miscible chemicals?
Case Study: Are the chemical activities corresponding to LC50s for ‘narcotic miscibles’ calculated using Equation 7 consistent with expectations (i.e., Ea50s ~ 0.01)?
Given the (relatively) low affinity for lipids and other non-lipid organic matter, what modifications to the approach for estimating KBW (see above) are necessary?
4. Application of the chemical activity concept to ionisable organic chemicals (IOCs)
To what extent can approaches to calculate chemical activity for neutral organic chemicals be expanded/modified to IOCs? Are methods for estimating the activity coefficients of electrolytes (e.g., Debye–Hückel approach) (Trapp et al., 2010) compatible with methods for neutral organic chemicals?
Case Study: Can the intrinsic water solubility (i.e., water solubility of the neutral form) and fraction of chemical in neutral form in solution be used to calculate chemical activity from LC50s?
In addressing each of the questions listed above, this section summarises the discussions that took place within the workgroup during the workshop. The major topics are a review of the uncertainties in critical physicochemical properties, the data needed to calculate chemical activity, the calculation of chemical activity in water, the calculation of chemical activity in biota, and the application of the chemical activity concept to miscible organic chemicals (MOCs) and ionisable organic chemicals (IOCs). It should be noted that the topics covered in this section reflect the expertise within the workgroup, which may be stronger in some areas than others. Nonetheless, the participants have made every effort to articulate key data gaps and, where possible, to make recommendations on how those gaps might best be addressed.
1. Uncertainty in key physicochemical property data for neutral organic chemicals
It can be argued that one of the key physicochemical properties influencing the overall behaviour of hydrophobic organic chemicals in the environment and biological systems is the chemical activity coefficient (γ) in aqueous solution. This is because water in the environment and biological systems provides an important phase through which chemicals are transported. A measure of the hydrophobicity of a chemical can be attained by quantification of γ, which can provide an understanding of how a chemical partitions between water and other environmental phases (Sandler, 1996).
The γ of an organic chemical describes the relative degree of deviation from ideality, as described by Raoult’s law, in which under ideal conditions the activity of a chemical (a) is equal to the mole fraction (χ). In environmental and biological systems, where organic chemicals are dissolved in water, a is proportional to χ, and γ is the proportionality constant that describes this relationship. Thermodynamically, γ describes the excess Gibbs free energy (G) associated with non-ideal solutions.
G = RT lnγ (10)
where R is the universal gas constant and T is the absolute temperature. Equation 10 thus quantifies the deviation from ideal behaviour.
At equilibrium, the chemical activities of chemicals in the water and organic or lipid-like phases in the environmental and biological systems are equal, or alternatively:
χw γw = χo γo (11)
where the subscripts ‘w’ and ‘o’ refer to water and an organic or lipid-like phase in the environment or biological system.
For many organic chemicals, γo does not show considerable variation. For example, γo in octanol for a range of neutral organic chemicals is relatively constant at about 2.5 (Mackay et al., 2014; Sandler, 1996). Consequently, the partitioning process between water and various organic phases will be largely influenced by γw, which varies several orders of magnitude between organic chemicals (Schwarzenbach et al., 2003). The octanol-water partition coefficient (KOW), for instance, is the ratio of solubility in octanol and water, and is typically used as a metric of hydrophobicity. It can be shown, however, that the activity coefficient in water is the key parameter that influences the magnitude of KOW (Andren et al., 1987; Chiou, 1981; Llinàs et al., 2008; Sandler, 1996; Schwarzenbach et al., 2003), as well as solubility in water (Sw), whereby:
Sw = 1 / (γw vw) (12)
where vw is the molar volume of water. Both KOW and Sw are important input parameters for a wide range of models aimed at assessing the fate and transport of organic chemicals.
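As a minimal sketch of Equation 12, the snippet below converts between an aqueous activity coefficient and a liquid-state water solubility. The function names and the molar volume of water (about 0.018 L/mol near 25 °C) are assumptions introduced for illustration and are not taken from the workshop material.

```python
# Illustrative sketch of Eq. 12: Sw = 1 / (gamma_w * v_w).
# Assumption: molar volume of water of about 0.018 L/mol near 25 degC.
V_WATER_L_PER_MOL = 0.018


def solubility_from_gamma(gamma_w, v_w=V_WATER_L_PER_MOL):
    """Return liquid-state water solubility (mol/L) for a given gamma_w."""
    if gamma_w <= 0 or v_w <= 0:
        raise ValueError("gamma_w and v_w must be positive")
    return 1.0 / (gamma_w * v_w)


def gamma_from_solubility(s_w, v_w=V_WATER_L_PER_MOL):
    """Invert Eq. 12: aqueous activity coefficient from solubility (mol/L)."""
    if s_w <= 0 or v_w <= 0:
        raise ValueError("s_w and v_w must be positive")
    return 1.0 / (s_w * v_w)


if __name__ == "__main__":
    # Hypothetical hydrophobic liquid with gamma_w = 1e6.
    print(f"Sw = {solubility_from_gamma(1.0e6):.2e} mol/L")
```

Because Sw and γw are inversely related, any relative error in one carries over unchanged in magnitude to the other.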
Whereas KOW and Sw strongly influence the behaviour of chemicals in environmental and biological systems, there are considerable challenges associated with quantifying each of these properties. A key challenge is the limited availability of high-quality empirical data sets of reliable and reproducible measurements for many organic chemicals (Llinàs et al., 2008). For instance, in their review of solubility and KOW data for the relatively well-studied organochlorine pesticide DDT, Pontolillo and Eganhouse (2001) observed that the data reported in the literature are affected by multi-level referencing, citation errors, and data errors, with reported property values spanning several orders of magnitude. Consequently, given the large variance in the reported data, combined with the lack of information needed to fully evaluate the quality of the original data, defining a true solubility value for DDT represents a substantial challenge (Pontolillo and Eganhouse, 2001). If this is the situation for a well-studied chemical such as DDT, then for less well-studied chemicals, where there may be only a single empirically derived Sw value, it may not be practically possible to assign an estimate of uncertainty to that value.
The current situation is thus problematic, particularly given the relative importance of KOW and SW in estimating chemical behaviour, whereby the use of erroneous data as input to environmental fate and physiologically based pharmacokinetic (PBPK) models can result in high uncertainty in assessing chemical risk and efficacy (Mackay et al., 2009; Pontolillo & Eganhouse, 2001; Tesconi & Landis, 2013). While various efforts have been initiated to improve the reliability of empirical measurements towards the development of more robust in silico tools (Hewitt et al., 2009; Llinàs et al., 2008), for the vast number of chemicals used in commerce the reproducibility of physicochemical property data between laboratories and analytical methods is rarely assessed. Thus, current practice continues to rely heavily on KOW and SW measurements obtained from a single laboratory study and/or output obtained from a single estimation method. Where the use of a single value is the only option, quantifying the uncertainty represents a substantial challenge. In addition, empirical solubility data are typically measured for one chemical at a time, whereas toxicity tests can use co-solvents, which may enhance the solubility of individual chemicals, particularly higher molecular weight/low solubility chemicals. Limitations arising from this co-solvent effect may be relevant both in laboratory settings (e.g., when measuring the solubility of large, poorly soluble chemicals) and in environmental monitoring scenarios (e.g., organic materials or other chemicals in environmental samples acting as co-solvents for the target chemical).
Key objectives for participants within this workgroup were thus to consider the influence of uncertainty in physicochemical properties in relation to estimating a thermodynamic chemical activity which might be used within environmental risk assessments, and to propose approaches that might be adapted for applying the chemical activity concept to nonpolar, miscible, and ionisable organic compounds.
1) How reliable are available water solubility data? How reliable are current approaches for estimating water solubility in the absence of empirical data?
In an attempt to provide preliminary insight into the variance that might exist in water solubility measurements, the solubility data available for 233 neutral organic chemicals reported by Mackay et al. (2006) were assessed. The results are illustrated in Figure 4.3.1, which summarises 2440 solubility measurements for the 233 chemicals included in the assessment. The dataset reported by Mackay et al. (2006) is believed to provide a relatively good indication of the variance that might exist in empirically derived solubility data for neutral organics, with the majority of chemicals having more than ten separate solubility measurements. A general observation from Figure 4.3.1 is that as solubility decreases the relative magnitude of the uncertainty increases, which calls for caution when relying on a limited number of solubility measurements for relatively insoluble organic chemicals (i.e., <0.01 mg/L).
In addition to the challenges of assessing the variance and uncertainty associated with measured physicochemical properties, there are challenges in assessing the applicability domain and uncertainties of property data obtained from estimation methods. In the absence of empirical water solubility data, various estimation methods, such as the WATERNT v1.01 and WSKOWWIN models within U.S. EPA’s EPISUITE, are heavily relied upon, particularly in estimating exposure concentrations. The empirical database underlying the WATERNT v1.01 submodule, one of the most widely used estimation methods, contains water solubility data for 5764 chemicals (1128 in training set, 4636 in validation set). The reported water solubilities in the training set range from 4·10⁻⁷ to 1·10⁶ mg/L (9·10⁻¹³ to 22 mol/L), while the reported water solubilities in the validation set range from 4·10⁻⁸ to 6·10⁶ mg/L (7·10⁻¹⁴ to 50 mol/L). Chemicals with reported water solubilities equal to 1·10⁶ mg/L are likely to be miscible organic chemicals (MOCs) (e.g., some organic solvents), and values greater than this should be considered suspect. A subset of these data was used to train and validate the EPISUITE WSKOWWIN submodule (1450 in training set, 902 in validation set).
The average deviations of the WATERNT v1.01 predictions for the training and validation sets are 0.355 and 0.796 log units, respectively, which correspond to factors of approximately 2.3 and 6.3, respectively. Similar performance was found for the WSKOWWIN v1.42 submodule. However, substantially larger errors can be obtained for some chemicals (i.e., greater than two orders of magnitude).
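Deviations quoted in log units convert to multiplicative factors as 10 raised to the deviation. The short sketch below performs that conversion for the two WATERNT deviations and for a two-log-unit outlier; the function name is hypothetical.

```python
def log_units_to_factor(log_error):
    """Convert an error expressed in log10 units into a multiplicative factor."""
    return 10.0 ** abs(log_error)


if __name__ == "__main__":
    cases = [("training set", 0.355), ("validation set", 0.796), ("outlier", 2.0)]
    for label, err in cases:
        factor = log_units_to_factor(err)
        print(f"{label}: {err} log units -> factor of {factor:.1f}")
```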
It is notable that while the various estimation packages within EPISUITE tend to be widely used, largely because they are easily and freely accessible, many other methods exist for estimating water solubility. For example, water solubility can be estimated based on correlation with a variety of descriptors. Dearden (2006) summarises a large number of quantitative structure property relationships that have been derived since 1990, and categorises the descriptors used in estimating water solubility as follows: log KOW with or without melting point; atom/group contributions; physicochemical and quantum chemical descriptors; and topological indices. Statistical techniques prior to 1990 were based on linear regression, but artificial neural networks began to be used after 1990, with partial least squares statistics and descriptor selection by genetic algorithm also being used.
Dearden (2006) also notes that the development of estimation methods currently relies on the use of diverse compound libraries, which is particularly important in the development of new active pharmaceutical ingredients (APIs), where good understanding of aqueous solubility is a critical component in estimating oral absorption. While the performance of the various estimation methods reviewed by Dearden (2006) is highly variable, a general observation is that the degree of uncertainty with using an estimation method depends largely on whether or not the test chemicals being assessed have structural similarities to the chemicals used in the training set.
It is notable that the relationship between KOW and SW has been widely used in the development of methods for estimating SW. For instance, Hansch et al. (1968) demonstrated for a heterogeneous data set of organic liquids that:
log SW = –1.339 log KOW + 0.978 (n = 156, r2 = 0.874, s = 0.472) (13)
For organic chemicals that are solids at ambient temperature, it is necessary to consider the energies associated with the dissolution process. When a solid dissolves in water, the first step is to envision a melting step to a sub-cooled liquid, where the enthalpic and entropic changes cause the solubility of the solid to be less than that of its sub-cooled liquid. The difference between the two is proportional to the melting point Tm of the solid. In an effort to better estimate the solubility of solids, Yalkowsky and Valvani (1980) proposed the following:
log SW = 0.8 – log KOW – 0.01 (Tm – 25) (14)
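Equations 13 and 14 can be sketched as two small functions. The function names are hypothetical, and setting the melting point term to zero for chemicals that melt at or below 25 °C (i.e., liquids) is an assumption made here rather than something stated in the text.

```python
def log_sw_liquid(log_kow):
    """Eq. 13 (Hansch et al., 1968): log SW for organic liquids."""
    return -1.339 * log_kow + 0.978


def log_sw_solid(log_kow, tm_celsius):
    """Eq. 14 (Yalkowsky and Valvani, 1980): log SW with a melting point term.

    Assumption: the melting term is zero when tm_celsius <= 25,
    i.e. the chemical is treated as a liquid at 25 degC.
    """
    melting_term = 0.01 * max(tm_celsius - 25.0, 0.0)
    return 0.8 - log_kow - melting_term


if __name__ == "__main__":
    # Hypothetical chemical: log KOW = 5, melting point 125 degC.
    print(f"Eq. 13: log SW = {log_sw_liquid(5.0):.2f}")
    print(f"Eq. 14: log SW = {log_sw_solid(5.0, 125.0):.2f}")
```

In Equation 14, every 100 °C of melting point above 25 °C lowers the estimated solubility by one log unit.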
Additional challenges associated with organic chemicals that are solids at ambient temperatures are further explored below, but here we attempt to capture how various estimation methods have evolved from these early observations. For instance, efforts to improve the performance of estimating SW for nonpolar organic chemicals include the use of an additional descriptor of molecular size to account for the energy required for cavity formation between water molecules. For polar solutes, in addition to molecular size, descriptors accounting for hydrogen bonding, atomic charge, polarisability, and polar surface area have all been utilised. Nonetheless, a general trend can be observed, whereby estimation methods tend to perform better for soluble chemicals than for insoluble chemicals. This may be related to the quality of the measured data for poorly soluble chemicals, as discussed above.
Another key challenge in estimating the SW of organic chemicals relates to the availability and performance of methods for ionisable organic chemicals (IOCs). Active pharmaceutical ingredients (APIs), for instance, include a large number of IOCs, which prompted Hewitt et al. (2009) to address a solubility challenge aimed at estimating the solubility of 32 APIs using a high-quality training set of 97 chemicals. Based on a rigorous process assessing the quality of measured data and ensuring an appropriate applicability domain for a large number of estimation methods, Hewitt et al. (2009) report on the performance of the following models: ChemSilico (CSLogWS), Optibrium (StarDrop), Pharma Algorithms, SPARC, and seven different modules used within Simulation Plus (YINAN; UIQBB; LGGAV; A69EM; NSLIC; AM108; OLASM). Also included in their evaluation are the results of a new in silico consensus tool, which gave relatively good performance (R2 = 0.60; s = 0.68; RMSE = 0.90) (Hewitt et al., 2009). Consequently, methods that combine the output of a large number of estimation methods that have an applicability domain relevant to the chemicals being assessed appear to lead to lower uncertainty regarding the SW of a chemical under investigation. Indeed, more recently Cappelli et al. (2013) also assessed the performance of five different estimation methods (ACD, T.E.S.T. 4.0.1, ADMET Predictor 6.0, and the two EPI Suite 4.1 solubility estimation modules) using 400 chemicals with experimental values, and found that the consensus method reported by the T.E.S.T. 4.0.1 estimation method performed better than other methods (R2 = 0.658).
A key message from this section is that the uncertainty/error in measured or estimated water solubilities translates directly into uncertainty/error in the calculated chemical activity. In the absence of empirical data for chemicals of interest, it is strongly recommended that multiple estimation software packages and/or other techniques (e.g., polyparameter linear free energy relationships; ppLFERs) be utilised to assess the uncertainty in the solubility estimates. Note that while a high level of agreement between various estimates increases the level of confidence, and confers precision in the predicted values, it does not necessarily guarantee accuracy.
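One simple way to act on this recommendation is to collect log SW estimates from several tools and report their spread. The sketch below is illustrative only: the function name, the dictionary keys, and the example values are hypothetical, and the spread describes agreement between methods (precision), not accuracy.

```python
import statistics


def summarise_log_estimates(log_sw_estimates):
    """Summarise several log10 SW estimates obtained for one chemical."""
    if len(log_sw_estimates) < 2:
        raise ValueError("need at least two estimates to describe a spread")
    low, high = min(log_sw_estimates), max(log_sw_estimates)
    return {
        "mean_log_sw": statistics.mean(log_sw_estimates),
        "stdev_log_sw": statistics.stdev(log_sw_estimates),
        "range_log_units": high - low,
        # Activity scales as 1/SW, so the same spread carries over to activity.
        "activity_spread_factor": 10.0 ** (high - low),
    }


if __name__ == "__main__":
    # Hypothetical log SW estimates from three different methods.
    summary = summarise_log_estimates([-5.2, -4.6, -5.9])
    for key, value in summary.items():
        print(f"{key}: {value:.2f}")
```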
2) How reliable are available melting point data? How reliable are current approaches for estimating melting points in the absence of empirical data?
For chemicals that are a solid at the ambient temperature in the system of interest, it is important to use the chemical’s sub-cooled liquid properties, and not properties of the solid-state chemical, when calculating activity. If sub-cooled liquid property data are not available for a given chemical, then melting point data can be used to convert solid-state properties to sub-cooled liquid properties (e.g., sub-cooled liquid solubility) via the Fugacity Ratio (F). Assuming Walden’s Rule (1908) applies (see following section), the Fugacity Ratio can be calculated using the following expression:
log F = –0.01 (TM – T)
where TM (°C or K) is a given chemical’s melting point and T is the ambient temperature in the system of interest, expressed in the same units as TM.
The EPISUITE empirical melting point test database (MPBPVP v1.43 submodule) contains reported values for 10051 chemicals, with the reported melting points ranging from –205 to +492 °C. The largest publicly available database was recently compiled and described by Tetko et al. (2014). The database is divided into four non-overlapping datasets: i) OCHEM (n = 21883), ii) Enamine (n = 22404), iii) Bradley (n = 2866), and iv) Bergström (n = 277). Note that the OCHEM dataset includes the EPISUITE database mentioned above. In general, the smaller the dataset, the more highly curated (and hence reliable) are the data.
The average prediction error of the MPBPVP v1.43 submodule for the test set (n = 10051) is approximately ± 50 °C and the root mean square error (RMSE) is approximately 64 °C. As with water solubility estimates, larger errors in estimated melting point (e.g., > 200 °C) can occur for some chemicals. Physical state misclassification (i.e., solids predicted to be liquids at 25 °C and vice versa) can also occur but is relatively uncommon; the MPBPVP v1.43 submodule correctly predicts the physical state for 86% of the data points used to evaluate the model.
The performance of the melting point estimation models developed and described in Tetko et al. (2014) is modestly improved in comparison to the MPBPVP v1.43 submodule. As discussed in Tetko et al. (2014), the RMSEs of estimated melting points are smallest for chemicals with reported melting points between 100 and 200 °C (RMSE ~ 30–40 °C) and largest for chemicals with melting points greater than 250 °C (RMSE ~ 50–90 °C). Because there is generally a positive relationship between melting point and molecular weight, these results suggest that uncertainty/error in predicted melting points will be greater for larger molecules.
It is worth reiterating that uncertainty/error in melting points for liquids is not relevant for chemical activity calculations and that the main concerns are uncertainty/error in predictions for solids and misclassification errors. For example, the average deviation of 50 °C and RMSE of 64 °C correspond to a potential error in the Fugacity Ratio (F) of a factor of about 3.0 and 4.5, respectively. This factor translates directly into the uncertainty/error of the sub-cooled liquid water solubility estimates. Much larger errors (i.e., two orders of magnitude) are also possible for some chemicals. However, as documented above, the amount of publicly available melting point data is relatively large (n = 47430 data points). While it may seem disappointing that all prediction methods still exhibit relatively large error, it is likely that empirical melting point data will be available for many chemicals of interest.
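The link between melting point error and Fugacity Ratio error can be sketched as follows. The sketch assumes the simplified Walden's Rule form log F = –0.01 (TM – T), which is consistent with the melting point term in Equation 14, and the usual definition of F as the ratio of solid to sub-cooled liquid solubility. The function names are hypothetical, and the error conversion only applies when both the true and the estimated melting points lie above ambient temperature.

```python
def log_fugacity_ratio_walden(tm_celsius, t_celsius=25.0):
    """log10 F under Walden's Rule; log F = 0 (F = 1) for liquids."""
    return -0.01 * max(tm_celsius - t_celsius, 0.0)


def subcooled_liquid_solubility(solid_solubility, tm_celsius, t_celsius=25.0):
    """Convert a solid-state solubility to the sub-cooled liquid value (S_L = S_S / F)."""
    fugacity_ratio = 10.0 ** log_fugacity_ratio_walden(tm_celsius, t_celsius)
    return solid_solubility / fugacity_ratio


def f_error_factor(tm_error_celsius):
    """Factor by which F is off when a solid's melting point is off by tm_error_celsius."""
    return 10.0 ** (0.01 * abs(tm_error_celsius))


if __name__ == "__main__":
    for error in (50.0, 64.0, 200.0):
        print(f"melting point error of {error:.0f} degC -> factor of {f_error_factor(error):.1f} in F")
```

Under this assumed form, a 200 °C melting point error corresponds to two orders of magnitude in F, in line with the larger errors mentioned above.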
3) How reliable is Walden’s Rule given the wide range of chemical structures for which the chemical activity concept may be applied?
Implicit in the simplified equation for calculating the Fugacity Ratio (F) is the applicability of Walden’s Rule (1908), which states that the entropy of melting (ΔSM) is 56.5 J/K·mol. This entropy value is based on coal tar derivatives and is most applicable to rigid aromatics (e.g., polycyclic aromatic hydrocarbons). For small spherical compounds (e.g., methane, neon), ΔSM is on the order of 10 J/K·mol (Richards’ Rule) (Jain et al., 2004b). The more rigorous expression for calculating F is shown below.
log F = –ΔSM (TM – T) / (2.303 R T)
where TM and T are expressed in K and R is the universal gas constant.
The largest empirical database of ΔSM values we are aware of was compiled by Jain et al. (2004b). This database contains 1799 reported ΔSM values, which range from 0.6 (2,2-paracyclophane) to 232.6 J/K·mol (2-heneicosanone). The average reported ΔSM value and standard deviation (σ) are 67 and 32 J/K·mol, respectively; the median ΔSM value is 60 J/K·mol. The empirical ΔSM database compiled by Dannenfelser and Yalkowsky (1996) (n = 1311) exhibits a larger range (ΔSM = 0.7–588 J/K·mol) but similar central tendencies (average ΔSM = 70 J/K·mol, σ = 57 J/K·mol, median ΔSM = 54 J/K·mol). The largest discrepancies between the reported ΔSM and Walden’s Rule are for long chain alkyl substances (e.g., tristearin, C57H110O6).
The uncertainty/error in the Fugacity Ratio (F) associated with uncertainty/error in ΔSM depends on the melting point of the chemical of interest. The dependence of the error on melting point is illustrated in Figure 4.3.2 as a function of assumed ΔSM for hypothetical chemicals/property value combinations.
Figure 4.3.2. Absolute error in estimated Fugacity Ratio (log units) as a function of ΔSM for hypothetical chemicals with melting points of 30, 75, 150, 300 and 500 °C.
As illustrated in Figure 4.3.2, the uncertainty/error in F (and hence the subcooled liquid state property value) when Walden’s Rule does not apply can be much greater for chemicals with larger melting points. For example, the uncertainty/error in F is greater than 6 orders of magnitude for a chemical with ΔSM = 125 J/K·mol and TM = 500 °C, but less than 1 order of magnitude for any chemical with TM = 30 °C. While the assessment based on hypothetical chemicals/property value combinations exaggerates the potential for error, in general, it is clear that errors in calculated F values greater than 3 to 5 orders of magnitude could occur, but only for chemicals with high melting points. The central tendency of the ΔSM values in the databases cited above suggests that large deviations/errors may be uncommon.
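The dependence of the error on ΔSM and melting point can be explored with a short sketch. It assumes the commonly used form log F = –ΔSM (TM – T) / (2.303 R T), which neglects heat capacity terms, and an ambient temperature of 25 °C. These assumptions may differ from those used to generate Figure 4.3.2, so the numbers produced here need not match the figure exactly. All names are hypothetical.

```python
import math

R = 8.314          # universal gas constant, J/(K*mol)
WALDEN_DSM = 56.5  # entropy of melting under Walden's Rule, J/(K*mol)


def log_f(delta_sm, tm_celsius, t_celsius=25.0):
    """log10 of the Fugacity Ratio for a given entropy of melting (assumed form)."""
    tm_k = tm_celsius + 273.15
    t_k = t_celsius + 273.15
    if tm_k <= t_k:
        return 0.0  # liquids: F = 1
    return -delta_sm * (tm_k - t_k) / (math.log(10) * R * t_k)


def walden_error_log_units(delta_sm, tm_celsius, t_celsius=25.0):
    """Absolute error in log F when Walden's Rule is assumed but delta_sm applies."""
    exact = log_f(delta_sm, tm_celsius, t_celsius)
    walden = log_f(WALDEN_DSM, tm_celsius, t_celsius)
    return abs(exact - walden)


if __name__ == "__main__":
    # Hypothetical combinations of entropy of melting and melting point.
    for delta_sm in (20.0, 100.0, 150.0):
        for tm in (30.0, 150.0, 300.0):
            err = walden_error_log_units(delta_sm, tm)
            print(f"dSM = {delta_sm:5.1f} J/K/mol, TM = {tm:5.1f} degC: {err:.2f} log units")
```

The error scales with the product of (ΔSM – 56.5) and (TM – T), which is why chemicals melting close to ambient temperature are largely insensitive to the choice of ΔSM.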
4) How reliable are available methods to estimate the entropy of melting (ΔSM) from chemical structure?
Dannenfelser and Yalkowsky (1996) introduced a semi-empirical approach for estimating the entropy of fusion (ΔSM) using structural features. The equation they proposed is shown below:
ΔSM = C – R ln λ + R ln ϕ
where C is an entropy of melting constant, λ is a molecular rotational symmetry number, and ϕ is a molecular flexibility number. The rotational symmetry number is defined as “the number of positions into which a molecule can be rotated that are identical to a reference position”, whereas the flexibility number is a function of chain length and a “flexibility count” or number of torsional angles.
When applied to a subset of chemicals in the Dannenfelser and Yalkowsky ΔSM database (n = 933), the average absolute error was 12.5 J/K·mol. However, errors greater than or equal to 20 J/K·mol occurred for approximately 20% of the chemicals in the database. Similar performance was reported when the same equation was applied to the larger Jain et al. (2004b) database (n = 1799). While these results are promising, we are not aware of any automated methods to generate the required input parameters for the ΔSM equation above.
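A minimal sketch of such an estimate is given below, assuming the relationship takes the form ΔSM = C – R ln λ + R ln ϕ. The value C = 50 J/K·mol is an assumption based on values commonly quoted for this approach, and the symmetry and flexibility numbers must be supplied by the user, since no automated way of deriving them is implied here. The function name is hypothetical.

```python
import math

R = 8.314         # universal gas constant, J/(K*mol)
C_ASSUMED = 50.0  # assumed entropy of melting constant, J/(K*mol)


def entropy_of_melting(symmetry_number, flexibility_number, c=C_ASSUMED):
    """Estimate the entropy of melting from symmetry and flexibility numbers."""
    if symmetry_number <= 0 or flexibility_number <= 0:
        raise ValueError("symmetry and flexibility numbers must be positive")
    return c - R * math.log(symmetry_number) + R * math.log(flexibility_number)


if __name__ == "__main__":
    # Hypothetical rigid molecule with a rotational symmetry number of 12.
    print(f"{entropy_of_melting(12, 1):.1f} J/K/mol")
    # Hypothetical flexible, non-symmetric molecule.
    print(f"{entropy_of_melting(1, 20):.1f} J/K/mol")
```

Higher symmetry lowers the estimated entropy of melting, while greater flexibility raises it, which is consistent with the large ΔSM values reported for long chain alkyl substances.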
Brown et al. (2015) explored the possibility of applying an Iterative Fragment Selection (IFS) approach (Brown et al., 2012) to estimate ΔSM from chemical structure (SMILES code). The IFS-QSAR was trained using 1056 chemicals from the Jain et al. (2004b) database and validated using 529 chemicals. Despite neglecting molecular symmetry effects, the performance of the IFS-QSAR was comparable to the original method. As the IFS-QSAR approach can readily be automated, it is hoped that the capability to generate estimates of ΔSM can soon be disseminated to expert and non-expert users.
Brown et al. (2015) also generated an IFS-based QSAR for melting point. The two IFS-QSARs were used to generate estimates of TM and ΔSM for the 199 solid chemicals in the ΔSM validation set with empirical melting point data. Fugacity Ratios calculated using the empirical TM and ΔSM were then compared to Fugacity Ratios calculated using the IFS-QSAR estimates. In this case study, 77% of the IFS-based Fugacity Ratios were within a factor of three of the empirically-based values; the maximum error was a factor of 34.
5) Taken together, what is the expected uncertainty in chemical activity calculations for ‘data poor’ chemicals?
The influence of uncertainty in the physicochemical properties of an organic chemical on a chemical activity calculation is directly related to the quality of the input data. It is thus anticipated that calculations based on a limited assessment of SW, whether from measured data, estimated data, or a combination of both, will propagate a larger uncertainty into the calculated chemical activity. Nonetheless, limited or poor-quality SW data may still prove useful for screening and prioritisation purposes, or where the chemical activity concept is used at low tiers of assessment. If information based on chemical activity calculations is to be used within higher tiers of assessment, it is suggested that efforts be targeted towards reducing the uncertainty in SW by ensuring the use of high-quality measured data for which SW can be shown to be relatively consistent between different laboratories and analytical methods. Figure 4.3.3 qualitatively illustrates our perception of the expected uncertainty in chemical activity calculations in relation to how the calculation might be used in risk assessment. Figure 4.3.3 implies that a higher level of uncertainty can be accepted at lower tiers of assessment, where it is anticipated that input data would be subject to relatively little scrutiny. At higher tiers of assessment it is expected that input data would be of higher quality and receive greater scrutiny, which we believe would help to reduce the relative magnitude of uncertainty in calculations of chemical activity. Figure 4.3.4 illustrates how uncertainties in the various input properties discussed above might influence the relative magnitude of uncertainty in a calculation of chemical activity.
Figure 4.3.3: Qualitative illustration relating the relative magnitude of uncertainty to different tiers of risk assessment.
It is anticipated that at lower tiers of assessment input data to calculations of chemical activity will be of lower quality and receive less scrutiny than input data used at higher tiers of assessment. This will directly influence the relative magnitude of uncertainty, whereby higher uncertainty may be acceptable in screening and prioritisation, but that higher quality data are necessary at higher tiers of evaluation to effectively reduce the inherent uncertainty that may propagate through the chemical activity calculation.
Figure 4.3.4: Illustration of how uncertainties in various physicochemical properties discussed in this section can propagate to influence the relative magnitude of uncertainty in a chemical activity calculation. Use of high quality measured data can be seen to reduce the propagation of error that might be associated with data based on estimates and various assumptions.
2. Calculation of chemical activity in non-aqueous phases (biota)
1) Is the octanol-water (KOW) paradigm sufficiently accurate for estimating KBW? When is it necessary to consider more sophisticated approaches (e.g., ppLFERs) for estimating KBW?
The two main reasons for estimating chemical activity in non-aqueous phases are to i) convert biomonitoring data in non-aqueous phases (e.g., lipids) to chemical activities and ii) calculate chemical activities in situations where non-equilibrium conditions (i.e., chemical activity in biota ≠ chemical activity in water) are suspected or known to exist. Non-equilibrium conditions between water and biota may exist because of various processes, including i) rapid biotransformation in vivo or other kinetic limitations on chemical uptake (e.g., for superhydrophobic chemicals) and ii) biomagnification in food webs (i.e., step-wise increase in chemical activity from prey to predator). Note that rapid biotransformation or other kinetic limitations on chemical uptake can also lead to biodilution in food webs (i.e., trophic dilution).
Chemical activity in biota (aB) can be calculated analogously to water:
aB = CB / SB
where CB is the concentration of the chemical in biota and SB is the solubility of the chemical (liquid or sub-cooled liquid) in the organism. Although the concept of “solubility in biota” is not an intuitive one, it is believed that it can be approximated for neutral non-polar organics as the product of the water solubility and a biota-water partition coefficient (KBW) (Mackay et al., 2011) as shown below:
SB = SW · KBW
As a first approximation, the biota-water partition coefficient for neutral organic chemicals can be estimated as the product of the total lipid content of the organism (fL) and the octanol-water partition coefficient (KOW), that is:
KBW = fL · KOW
The main assumptions underlying this expression are that i) lipids represent the dominant storage reservoir in the organisms and ii) octanol is a sufficiently accurate surrogate for lipids.
Broadly speaking, lipids can be divided into two classes, i) storage lipids (e.g., adipose) and ii) membrane lipids (i.e., phospholipids). Polyparameter linear free energy relationships (ppLFERs) are available for both types of lipids (Endo et al., 2011; Geisler et al., 2012; Geisler et al., 2015). In combination with solute descriptors for chemicals of interest (e.g., the UFZ LSER Database) (Endo et al., 2015), it is now possible to estimate partition coefficients for storage lipids and membrane lipids for any neutral organic chemical of interest. ppLFERs for structural proteins and plasma proteins are also available (Endo et al., 2011, 2012), meaning that sorption to some non-lipid organic matter (NLOM) can also be captured. In other words, the biota-water partition coefficient can be expanded to address partitioning in greater detail, as deemed necessary:
where fSL, fML and fNLOM are the fractions of storage lipids, membrane lipids and non-lipid organic matter in the organism, and KSLW, KMLW and KNLOMW are the corresponding partition coefficients to water.
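The expanded expression is a simple weighted sum, sketched below in Python. The function name and every input value are hypothetical and serve only to show how the individual phases contribute; in practice the partition coefficients would come from the ppLFERs cited above.

```python
def k_bw_expanded(f_sl, f_ml, f_nlom, k_slw, k_mlw, k_nlomw):
    """Biota-water partition coefficient from three sorbing phases.

    f_*  : fractions of storage lipid, membrane lipid and NLOM (0-1)
    k_*w : corresponding phase-water partition coefficients
    Returns the total KBW and the contribution of each phase.
    """
    terms = {
        "storage lipid": f_sl * k_slw,
        "membrane lipid": f_ml * k_mlw,
        "NLOM": f_nlom * k_nlomw,
    }
    return sum(terms.values()), terms


# Hypothetical composition and partition coefficients (illustration only)
k_bw, terms = k_bw_expanded(0.04, 0.01, 0.15, 1.0e4, 2.0e4, 500.0)
for phase, value in terms.items():
    print(f"{phase}: {value:.0f} ({100 * value / k_bw:.0f}% of KBW)")
print(f"KBW = {k_bw:.0f}")
```

Reporting the per-phase contributions makes it easy to see when the single-term fL × KOW approximation would miss a substantial share of the sorption capacity.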
Based on current knowledge of the performance of the available ppLFERs, it is recommended that more sophisticated approaches for estimating partitioning to biological macromolecules be considered for polar neutral organic compounds (i.e., compounds capable of engaging in hydrogen bonding), whereas the simplified approach is likely sufficient for apolar neutral organic compounds. Chemicals can be screened as ‘polar’ or ‘apolar’ by examining the values of the solute descriptors for the H-bond donor (α) and H-bond acceptor (β) parameters. The recommended source of these solute descriptors is the UFZ LSER Database (Endo et al., 2015). In the absence of reported data, estimates can be obtained using the ABSOLV estimation software from ACD Labs (www.acdlabs.com/products/percepta/predictors/absolv/).
More sophisticated approaches for estimating biota-water partitioning should also be considered for organisms (or tissues) with very low lipid content (e.g., fL < 0.01), as partitioning to proteins is likely to be more important (deBruyn and Gobas, 2007; Endo et al., 2012). Finally, neutral chemicals with large water solubilities and/or small log KOW values (i.e., log KOW < 2) should also be treated differently (see section on Miscible Organic Chemicals, MOCs).
3. Application of the chemical activity concept to miscible organic chemicals (MOCs)
As introduced previously, chemical activity in water (aW) can be calculated from the concentration of the chemical in the aqueous phase (CW) and the water solubility (SW) (liquid or subcooled liquid):
$$a_W = \frac{C_W}{S_W}$$
This expression is problematic for miscible organic chemicals because truly miscible chemicals can be mixed into water up to any mole fraction (χ), i.e., from 0 (not present) to 1 (pure chemical). In other words, a constant water solubility does not exist. However, as discussed earlier, chemical activity can also be estimated as the product of the activity coefficient of the chemical in water (γW) and its mole fraction:
$$a_W = \gamma_W \cdot \chi$$
Sherman et al. (1996) compiled a database of 336 empirically-based activity coefficients derived from various experimental techniques (e.g., gas chromatography, differential static cell equilibrium). These activity coefficients are intended to be representative of the chemical in water at infinite dilution (i.e., $\gamma_W^{\infty}$). However, many of the data points were estimated from inverse solubility using the expression below and are therefore activity coefficients at saturation ($\gamma_W^{sat}$):
$$\gamma_W^{sat} = \frac{1}{\bar{V}_W \cdot S_W^{sat}}$$
where $\bar{V}_W$ is the molar volume of water (0.018 L/mol) and $S_W^{sat}$ is the solubility of the chemical in water (liquid or subcooled liquid, mol/L) at saturation. As shown in Table 4.3.1, activity coefficients at saturation tend to be similar to activity coefficients at infinite dilution for more sparingly soluble chemicals (i.e., limited concentration dependence is exhibited).
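For a sparingly soluble chemical, the inverse-solubility relationship (activity coefficient at saturation = 1 / (molar volume of water × solubility)) and the activity-coefficient route (activity = activity coefficient × mole fraction) should return the same activity as CW/SW. The Python sketch below checks this under a dilute-solution assumption, in which the mole fraction is approximated as CW times the molar volume of water. The function names and input values are hypothetical.

```python
V_W = 0.018  # molar volume of water, L/mol (value given in the text)


def gamma_sat(s_w):
    """Activity coefficient at saturation from solubility s_w (mol/L)."""
    return 1.0 / (V_W * s_w)


def activity_from_gamma(c_w, gamma):
    """Activity = gamma * mole fraction, with chi ~ c_w * V_W (dilute only)."""
    chi = c_w * V_W
    return gamma * chi


# Hypothetical sparingly soluble chemical (illustration only)
s_w = 1.0e-3  # mol/L
c_w = 1.0e-5  # mol/L
gamma = gamma_sat(s_w)
print(f"gamma_sat = {gamma:.4g}")
print(f"a (gamma route) = {activity_from_gamma(c_w, gamma):.3g}")
print(f"a (CW/SW route) = {c_w / s_w:.3g}")
```

The two routes coincide only because the same solubility is used in both and the solution is dilute; for miscible chemicals no such solubility exists and the activity coefficient has to come from another source.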
Table 4.3.1. Reported activity coefficients at saturation and infinite dilution for a set of neutral organic chemicals (Schwarzenbach et al., 2003).
The assumption of limited concentration dependence is not valid for more water soluble chemicals (i.e., γW < 100), as can be seen in Figure 4.3.5. Moreover, the inverse solubility approach cannot be applied to miscible chemicals.
Figure 4.3.5. Estimated activity coefficient of ethanol and water at 25 °C as a function of the mole fraction of ethanol in the solution (generated using the DDBST-UNIFAC online calculator; http://ddbonline.ddbst.com/UNIFACCalculation/UNIFACCalculationCGI.exe).
1) To what extent are empirically-based chemical activity coefficients available for miscible organic chemicals and how reliable are these data?
For the purposes of this assessment, miscible chemicals are assumed to be those listed in the Sherman et al. (1996) database that exhibit $\gamma_W^{\infty}$ values less than 20 (Mackay, 2001). Based on this criterion, approximately 15% (n = 49) of the empirically-based activity coefficients are for MOCs. The reliability of these data is unclear because very few chemicals have activity coefficients estimated using different techniques. Literature values of activity coefficients for methanol, ethanol, and 1-propanol compiled by Sherman et al. (1996) are within a factor of two of each other, whereas the literature values for acetone are within a factor of 10.
As we are unaware of any other publicly-available databases, no further evaluation is possible.
2) How reliable are computational approaches (e.g., UNIFAC, COSMOTherm, SPARC) for estimating chemical activity coefficients for miscible organic chemicals?
Activity coefficients can be estimated using ppLFERs (Sherman et al., 1996, Schwarzenbach et al., 2003) and various software estimation programs (e.g., UNIFAC, SPARC, COSMOtherm). An example of a ppLFER is shown below (Schwarzenbach et al., 2003):
where $p_L^{*}$ is the vapour pressure of the chemical (liquid or subcooled liquid) (Pa), VX is the molar volume of the chemical (cm3/mol), $n_D$ is the refractive index of the chemical, and π, α and β are solute descriptors for dipolarity, H-bond donor and H-bond acceptor properties.
Activity coefficients estimated using the ppLFER described above (n = 266) were found to be within a factor of two to three of empirical values and it is suggested that the general performance of this method can be expected to be similar (Schwarzenbach et al., 2003).
Sherman et al. (1996) evaluated the performance of two ppLFERs using a subset of the values in their database. Note that one of the ppLFERs evaluated in Sherman et al. (1996) was trained using the remaining data points. Estimated activity coefficients were within a factor of two or less on average of the empirically-based values (average absolute deviation = 0.3–0.5 ln units). Furthermore, there was no large distinction in ppLFER performance for empirically-based values from inverse solubility data versus other techniques.
Sherman et al. (1996) also evaluated the performance of UNIFAC using the same subset of data from their compilation. The average absolute deviation was 0.6 ln units (i.e., estimated values were again within a factor of two on average).
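Deviations reported in natural-log units translate into multiplicative factors through the exponential function. The short Python sketch below makes that conversion explicit for the average absolute deviations quoted above; the function name is a hypothetical helper.

```python
import math


def ln_deviation_to_factor(dev_ln):
    """Convert a deviation in ln units into a multiplicative factor."""
    return math.exp(dev_ln)


# Average absolute deviations quoted in the text (ln units)
for dev in (0.3, 0.5, 0.6):
    factor = ln_deviation_to_factor(dev)
    print(f"{dev:.1f} ln units -> factor of {factor:.2f}")
```

A deviation of 0.6 ln units corresponds to a factor of about 1.8, consistent with the statement that estimates are within a factor of two on average. Because these are averages, individual chemicals can deviate by considerably more.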
A comparison between reported activity coefficients (n = 326) and those estimated by SPARC is shown in Figure 4.3.6. On average, the estimated values are within a factor of 2.5 of the reported values. However, some large discrepancies can be seen for chemicals with relatively large water solubilities (i.e., small activity coefficients). Accordingly, estimates for other miscible organic chemicals may be biased more than indicated by the average model performance.
Figure 4.3.6. Comparison of reported activity coefficients at infinite dilution and activity coefficients calculated by the SPARC estimation software.
COSMO-RS theory, as developed and distributed in the COSMOtherm program suite by COSMOlogic GmbH in Germany, is a relatively new and powerful theoretical method for estimation of infinite-dilution activity coefficients in aqueous and non-aqueous solvation environments. Invented by Andreas Klamt in the early 1990s, COSMO-RS is based on a “first-principles” approach, which uses the results of quantum-mechanical density-functional theory (DFT) calculations to estimate the relative Gibbs energy of a molecule in a solvation environment. The technique is “universal” in that it relies on a relatively small set of global parameters, which are optimised in the initial development by simultaneously fitting a wide range of chemical properties over a large and varied chemical set. Once optimised, these are not changed, irrespective of the solute molecule in question or the physico-chemical property sought. This gives a distinct advantage over UNIFAC, which is parameterised for all relevant types of functional group-group interactions.
In a recent evaluation, COSMOlogic compared the ability of COSMOtherm (v 15) to estimate activity coefficients with published results for two UNIFAC variants, using aqueous and organic solvent data sets (Gerber and Soares, 2010). UNIFAC outperformed COSMOtherm for simple (single functional group) organic molecules (RMS error in ln(γ) of 0.39 and 0.17 vs 0.53). However, for the aqueous chemical set, for which UNIFAC is relatively poorly parameterised, COSMOtherm significantly outperformed UNIFAC, with an RMS error in ln(γ) of 0.79 vs 1.84 and 2.31. Overall, with both aqueous and non-aqueous solvation environments, and with a noted bias toward the more favourable organic solvent data for UNIFAC, COSMOtherm yielded an RMS error of 0.67 in ln(γ), whereas UNIFAC gave nearly double this value (1.26 and 1.11). We can therefore anticipate an RMS error of about 0.8 in ln(γ) for aqueous infinite dilution activity coefficients with COSMO-RS. Note that the non-water set contains only 50 different chemical substances, of which 16 are alkanes. The chemical diversity is extremely low and, as a result, this set has little relevance for substances such as pharmaceuticals, pesticides, fertilisers and fragrances. The chemical diversity of the water set is somewhat larger, but all compounds are still essentially mono-functional, so its relevance to complex molecules is again low. As COSMO-RS handles more complex molecules in the same manner as simple molecules, its performance is not expected to change significantly for such environmentally relevant compounds, whereas UNIFAC is expected to perform even less well.
Figure 4.3.7. Comparison of reported activity coefficients at infinite dilution and activity coefficients calculated by the COSMOtherm estimation software.
Of the 5764 reported water solubilities in the WATERNT database, only ~5% are ≥ 1 × 10⁶ mg/L (which indicates that they are likely to be MOCs). The general expectation is that activity coefficients for miscible organic chemicals can be estimated within a factor of three or less. Estimated activity coefficients greater than 20 for MOCs imply that a water solubility limit exists and should be considered unreliable. When possible (i.e., if input data are available and users have access to proprietary software), different estimation approaches should be applied in order to assess the level of agreement between model outputs. As discussed above in relation to estimates of SW, a high level of agreement between various estimation methods increases the confidence in the predictions; however, this does not necessarily guarantee accuracy.
Finally, the relatively good performance of the estimation methods for activity coefficients compared to water solubility may be somewhat misleading. As discussed above, activity coefficients and water solubilities are inversely related and therefore the accuracy of estimation methods for the two properties is expected to be similar. Evaluations of activity coefficient estimation methods using the much larger water solubility databases would likely give a better indication of true model performance.
It is thus suggested that chemical activities for MOCs can be calculated using activity coefficients, as opposed to SW. However, unlike non-polar neutral organic chemicals, where it might be assumed that activity coefficients in lipids do not show much variability between different chemicals, and that at equilibrium, the chemical activities of chemicals in the water and organic or lipid-like phases are equal, it is less well understood if these assumptions are valid for MOCs. In the next section we attempt to address this challenge by assessing relationships between chemical activities and LC50 data for MOCs exerting baseline toxicity.
3) Case Study: Are the chemical activities corresponding to LC50s for ‘narcotic miscibles’ calculated using Equation 7 consistent with expectations (i.e., La50s ~ 0.01)?
Acute 96-h toxicity data (LC50s) for three miscible baseline toxicants (Verhaar Class 1) and one miscible non-baseline toxicant (Verhaar Class 3) along with the corresponding lethal chemical activities (La50s) are presented in Table 4.3.2. As shown, the La50s for the baseline toxicants fall within the expected range (0.01–0.1) whereas the La50 for the Verhaar Class 3 chemical is orders of magnitude lower.
Table 4.3.2. Preliminary assessment of the chemical activity hypothesis for miscible organic chemicals.
|Compound||Verhaar Class||LC50 (96 h)||Activity Coefficient (γW)||La50|
Verhaar Class taken from ToxTree (http://toxtree.sourceforge.net/predict/); Activity coefficients are from Sherman et al. (1996). Toxicity data obtained from the Duluth database of acute toxicities to fathead minnow.
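The conversion from an LC50 to a lethal chemical activity can be sketched as follows, assuming the activity is calculated as the activity coefficient times the mole fraction and that the mole fraction can be approximated as the molar concentration times the molar volume of water (a dilute-solution assumption that weakens at very high LC50s). The function name and all inputs are hypothetical placeholders, not entries from Table 4.3.2.

```python
V_W = 0.018  # molar volume of water, L/mol


def lethal_activity(lc50_mg_per_l, molar_mass, gamma_w):
    """Estimate La50 from an LC50 (mg/L), molar mass (g/mol) and gamma_w."""
    c_mol_per_l = lc50_mg_per_l / 1000.0 / molar_mass  # mg/L -> mol/L
    chi = c_mol_per_l * V_W                            # approximate mole fraction
    return gamma_w * chi                               # La50 = gamma * chi


# Hypothetical miscible baseline toxicant (illustration only)
la50 = lethal_activity(lc50_mg_per_l=10000.0, molar_mass=50.0, gamma_w=4.0)
print(f"La50 = {la50:.4f}")
print("within 0.01-0.1:", 0.01 <= la50 <= 0.1)
```

With these placeholder inputs the result falls inside the 0.01–0.1 range expected for baseline toxicants; an La50 orders of magnitude below that range would suggest a more specific mode of action.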
Although further case studies would be useful, it appears that the chemical activity approach can be applied to MOCs categorised as baseline toxicants. However, given the low frequency with which MOCs occur in the WATERNT database and the limited environmental relevance of exposure to these compounds, research priorities should focus on reducing uncertainties for sparingly soluble compounds (i.e., apolar and polar neutral organic chemicals), particularly those which are solids at ambient temperatures. Nevertheless, with respect to demonstrating the utility and viability of the chemical activity approach, it is deemed worthwhile to include MOCs in case studies aiming to demonstrate ‘proof of concept’.
4) Given the (relatively) low affinity of MOCs for lipids and other non-lipid organic matter, what modifications to the approach for estimating KBW (see above) are necessary?
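To estimate biota-water partitioning more accurately for miscible organic chemicals, the chemical freely dissolved in the water present in the organism can simply be accounted for by adding the water fraction of the organism (fW) to the expression for KBW:
$$K_{BW} = f_{SL} K_{SLW} + f_{ML} K_{MLW} + f_{NLOM} K_{NLOMW} + f_W$$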
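The effect of the water term can be illustrated with a short Python sketch that adds the organism's water fraction (partition coefficient of 1 relative to external water) to the sum of the storage lipid, membrane lipid and NLOM terms. The function name, the composition and the partition coefficients are all hypothetical.

```python
def k_bw_with_water(f_sl, f_ml, f_nlom, k_slw, k_mlw, k_nlomw, f_w):
    """KBW including the organism's water fraction f_w.

    Returns the total KBW and the share contributed by the water term.
    """
    organic = f_sl * k_slw + f_ml * k_mlw + f_nlom * k_nlomw
    k_bw = organic + f_w
    return k_bw, f_w / k_bw


# Hypothetical miscible chemical: all phase-water coefficients below 1
k_moc, share_moc = k_bw_with_water(0.04, 0.01, 0.15, 0.5, 0.5, 0.5, 0.80)
# Hypothetical hydrophobic chemical: same organism, large coefficients
k_hoc, share_hoc = k_bw_with_water(0.04, 0.01, 0.15, 1e4, 2e4, 500.0, 0.80)

print(f"MOC: KBW = {k_moc:.2f}, water share = {share_moc:.0%}")
print(f"HOC: KBW = {k_hoc:.1f}, water share = {share_hoc:.2%}")
```

For the miscible example most of the body burden resides in the water compartment, whereas for the hydrophobic example the water term is negligible, which is why it can be omitted in the simpler expressions.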
4. Application of the chemical activity concept to ionisable organic chemicals (IOCs)
Of the various questions discussed within this workgroup, it was widely acknowledged that the application of the chemical activity concept to IOCs represents one of the most difficult areas to address, largely due to the limited availability of data and models for this group of chemicals. The first major issue is that the total water solubility of IOCs is a function of pH in addition to the properties of the chemical. The presence and identity of counterions in solution can also be an important consideration. These dependencies are illustrated in Figures 4.3.8 and 4.3.9.
Figure 4.3.8 is a generic illustration of the pH-solubility profile of an organic acid; Figure 4.3.9 is the reported pH-solubility profile of naproxen and its various salts (Chowhan 1978, Serajuddin 2007), which clearly demonstrates the sensitivity of water solubility to the type of counterion present.
The pHmax is a function of the solubility product (KSP) and hence the counterion(s) present in solution. Below the pHmax, the total solubility of an acidic IOC is simply a function of pH and pKa (Figure 4.3.8) and can be estimated from the intrinsic solubility (i.e., solubility of the neutral form of the chemical) and the extent of dissociation (He and Yalkowsky, 2004; Serajuddin 2007), i.e.,
$$S_T = S_0 \left(1 + 10^{\,pH - pK_a}\right)$$
where $S_T$ is the total solubility and $S_0$ is the intrinsic solubility.
Above the pHmax, the total solubility is determined by the solubility of the salt complex (i.e., A⁻S⁺) and is independent of pH (Figures 4.3.8 and 4.3.9). As seen in the naproxen example (Figure 4.3.9), the maximum solubility can vary by roughly two orders of magnitude.
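The two regimes described above can be combined into a simple piecewise profile for an organic acid: total solubility rises with pH as intrinsic solubility × (1 + 10^(pH − pKa)) until it reaches the salt solubility, and is constant thereafter. The Python sketch below treats the salt solubility as a fixed plateau, which is a simplification (in practice it depends on KSP and the counterion concentration). The function names and all numerical values are hypothetical and are not the naproxen data.

```python
import math


def total_solubility_acid(ph, pka, s_0, s_salt):
    """Total solubility (mol/L) of an organic acid at a given pH.

    s_0    : intrinsic solubility of the neutral form (mol/L)
    s_salt : solubility plateau set by the salt (mol/L; assumed constant)
    """
    s_ionisation = s_0 * (1.0 + 10.0 ** (ph - pka))
    return min(s_ionisation, s_salt)


def ph_max(pka, s_0, s_salt):
    """pH at which the ionisation-limited curve meets the salt plateau."""
    return pka + math.log10(s_salt / s_0 - 1.0)


# Hypothetical acid (illustration only)
pka, s_0, s_salt = 4.0, 1.0e-4, 1.0e-1
print(f"pHmax = {ph_max(pka, s_0, s_salt):.2f}")
for ph in (3.0, 5.0, 7.0, 9.0):
    print(f"pH {ph:.0f}: S_T = {total_solubility_acid(ph, pka, s_0, s_salt):.3g} mol/L")
```

Changing only `s_salt` shifts both the plateau and pHmax, which mirrors the counterion sensitivity shown for the naproxen salts.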
Figure 4.3.8. Generic representation of the pH-solubility profile of an organic acid. The pH-solubility profile for an organic base is essentially the mirror image.
Figure 4.3.9. pH-solubility profile of naproxen (an organic acid with a pKa ~ 4) and its salts (calcium, magnesium, sodium, potassium) in aqueous solution at 25 °C. (Based on Chowhan (1978); figure reproduced with permission from Flynn & Roberts 2015.)
As aqueous systems in the laboratory and environment can vary greatly in terms of type and concentration of counterions, it is clear that implementing the chemical activity concept is far more challenging for IOCs than for neutral organic chemicals.
1. To what extent can approaches to calculate chemical activity for neutral organic chemicals be expanded/modified to IOCs? Are methods for estimating the activity coefficients of electrolytes (e.g., Debye–Hückel approach) (Trapp et al., 2010) compatible with methods for neutral organic chemicals?
In an effort to initiate discussion and debate, it is proposed that, as a first approximation, the total chemical activity (aT) could potentially be calculated using a modified version of the equation for neutral organics, formulated here for an organic acid, i.e.,
where aN, CN and SN are the chemical activity, concentration and solubility (sub-cooled liquid) of the neutral form of the chemical, respectively; aC, CC and SC are the chemical activity, concentration and solubility of the charged form (below pHmax), respectively; and SS is the solubility of the dominant salt complex (i.e., A⁻S⁺) in the system. Melting points for the neutral chemical (i.e., HA) and the dominant salt complex must be known. It is assumed that i) the IOC is a solid at system temperature, ii) ‘solubility addition’ applies (Banerjee 1984, Smith et al., 2013), and iii) solute-solute interactions are negligible. Assuming that SS will be greater than SN, chemical activities below pHmax are always greater than chemical activities above pHmax (but within a factor of two). However, since F will be lower, the maximum activity that can be attained, by correcting for F, will be lower (see WG1).
Case Study: Can the intrinsic water solubility (i.e., water solubility of the neutral form) and fraction of chemical in neutral form in solution be used to calculate chemical activity from LC50s?
More simplistically, an initial estimate of chemical activity in water for IOCs could be obtained by ignoring the contribution of the charged form and considering only the neutral species, i.e.,
$$a_N = \frac{C_N}{S_N}$$
Note that the concentration of the neutral form of the chemical (CN) can be calculated from the reported total concentration (CT, or LC50) at any pH using the Henderson-Hasselbalch equation, as shown for an organic acid below:
$$C_N = \frac{C_T}{1 + 10^{\,pH - pK_a}}$$
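The neutral-species approximation amounts to two steps: compute the neutral fraction of an organic acid from pH and pKa, then divide the neutral concentration by the solubility of the neutral form. A minimal Python sketch follows; the function names and input values are hypothetical.

```python
def neutral_fraction_acid(ph, pka):
    """Fraction of an organic acid present in neutral form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def neutral_activity(c_total, ph, pka, s_n):
    """Activity of the neutral species: aN = CN / SN.

    c_total : total dissolved concentration, e.g. an LC50 (mol/L)
    s_n     : solubility of the neutral form (sub-cooled liquid, mol/L)
    """
    c_n = c_total * neutral_fraction_acid(ph, pka)
    return c_n / s_n


# Hypothetical acid at a fixed total concentration (illustration only)
for ph in (6.0, 7.0, 8.0):
    a_n = neutral_activity(c_total=1.0e-3, ph=ph, pka=4.0, s_n=1.0e-2)
    print(f"pH {ph:.0f}: aN = {a_n:.3g}")
```

For an organic base the exponent changes sign, i.e., the neutral fraction is 1 / (1 + 10^(pKa − pH)). At a fixed total concentration the estimated activity of an acid falls by roughly a factor of ten per pH unit once pH is well above pKa.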
It is important to recognise that even if the approximations presented above are reliable, pH differences between the bulk water phase and biological fluids (e.g., blood, cytoplasm) further complicate the analysis of IOCs due to the ‘ion trapping’ effect (e.g., Neuwoehner and Escher, 2011). The ‘ion trapping’ effect refers to the differential accumulation of the charged species in external water vs. internal fluids, as determined by the concentration of the neutral form (assumed equal) and pH-dependent speciation (i.e., ratio of neutral to charged form) in both phases.
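The ion trapping effect can be quantified for an organic acid by assuming, as stated above, that the neutral concentration is equal on both sides and that speciation in each phase follows the Henderson-Hasselbalch equation. The ratio of total internal to total external concentration then depends only on the two pH values and the pKa. The Python sketch below uses a hypothetical function name and hypothetical pH and pKa values.

```python
def ion_trapping_ratio_acid(ph_internal, ph_external, pka):
    """Ratio of total internal to total external concentration for an acid.

    Assumes equal neutral concentrations in both phases, so that
    C_total = C_neutral * (1 + 10**(pH - pKa)) on each side.
    """
    internal = 1.0 + 10.0 ** (ph_internal - pka)
    external = 1.0 + 10.0 ** (ph_external - pka)
    return internal / external


# Hypothetical acid (pKa 4), internal fluid at pH 7.4 (illustration only)
for ph_ext in (6.0, 7.0, 8.0):
    ratio = ion_trapping_ratio_acid(7.4, ph_ext, 4.0)
    print(f"external pH {ph_ext:.0f}: C_internal / C_external = {ratio:.2f}")
```

When the external water is more acidic than the internal fluid, the acid accumulates internally (ratio above 1); the reverse holds for more alkaline water. This sketch ignores sorption and any membrane permeation of the charged species.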
Aquatic toxicity data for six pharmaceuticals at three different bulk water pHs are summarised in Table 4.3.3 (Boström and Berglund 2015). These data illustrate the commonly reported pH-dependence of aquatic toxicity data based on external water concentration. Irrespective of the equation used, it is clear that chemical activities estimated following Equation 3 will not collapse towards a consistent value but rather will be a function of bulk pH. Such results are in contrast to analyses based on measured or estimated Critical Body Residue (CBR) or membrane concentrations, which tend to cluster around a common value (e.g., Nakamura et al., 2008; Neuwoehner and Escher, 2011). To demonstrate this, a CBR-based analysis of aquatic toxicity data for fluoxetine is presented in Table 4.3.4.
Table 4.3.3. Aquatic toxicity data for six pharmaceuticals at three different bulk water pHs. EC50s are reported in terms of total water concentration and concentration of neutral species only (estimated using the Henderson-Hasselbalch equation).
Table 4.3.4. Aquatic toxicity data for fluoxetine in fish analysed using the CBR approach based on experimental LC50s and bioconcentration factors (BCFs) (Nakamura et al., 2008)
|pH||LC50 (mM)||Reported BCF (L/kg)||CBR (mM) = BCF × LC50|
The fact that the CBRs for fluoxetine shown in Table 4.3.4 collapse to a common value implies a common chemical activity in the organism, which could be calculated either from the CBR or from an estimate of the internal water concentration. Regardless, while it may be possible to address the pH-dependence of toxicity data for IOCs (e.g., by accounting for ion trapping or using a BCF model to convert external concentrations to internal burdens) (Neuwoehner and Escher, 2011; Armitage et al., 2013), the application of the chemical activity approach to such chemicals is clearly hindered by the additional assumptions and calculations that are likely to be required.
In summary, the simplified approaches described above for calculating chemical activity for IOCs in water suggest that it is possible to develop estimation methods for further evaluating the chemical activity hypothesis for IOCs. However, additional research is needed to support and validate the observations reported here.