# Model fitting¶

A lot of chemical analysis relies on our ability to fit a “line of best fit” to some data. For example, considering the determination of the activation energy, $$E_a$$, of some reaction from an Arrhenius relationship,

$k(T) = A \exp{\frac{-E_a}{RT}},$

where, $$k(T)$$ is the rate constant at time $$T$$, $$A$$ is the pre-exponential factor, $$R$$ is the ideal gas constant and $$T$$ is the temperature. Typically when we are performing our analysis, we will rearrange the above equation to give a straight line,

$\ln{k(T)} = \ln{A} - \frac{E_a}{RT}.$

Let’s looks at some experimental data for the decomposition of 2 moles of NO2 to 2 moles of NO and one mole of O2. The data for this is available in this file.

import numpy as np
import matplotlib.pyplot as plt

T, k = np.loadtxt('arrhenius.txt', unpack=True)

plt.plot(T, k, 'o--')
plt.xlabel('Temperature/K')
plt.ylabel('Rate constant/M$^{-1}$s$^{-1}$')
plt.show() Alternatively, we plot this as a straight line.

plt.plot(1/T, np.log(k), 'o--')
plt.xlabel('1/Temperature/K')
plt.ylabel('ln(Rate constant/M$^{-1}$s$^{-1}$)')
plt.show() Using Python, we can fit this data in either configuration, either by performing a linear regression or by fitting a line of best fit.

## Linear regression¶

Linear regression is a relatively simple mathematical process, that is implemented in the scipy.stats module.

from scipy.stats import linregress


If we investigate the documentation for the linregress function, we will see that we pass two arrays; x and y. These corresponding exactly to the x and y-axes of the plot above.

result = linregress(1/T, np.log(k))

result

LinregressResult(slope=-13509.355973303444, intercept=15.161037389899018, rvalue=-0.998952302406746, pvalue=4.070235273533486e-05, stderr=357.31270179790863, intercept_stderr=0.5715179164952109)


We can see that the object that is returned contains a couple of pieces of information. The two parts that we are most interested in are the slope and the intercept, which are attributes of the result object and can be obtained as follows.

print(result.slope)

-13509.355973303444

print(result.intercept)

15.161037389899018


Therefore, we can evaluate the activation energy and the pre-exponential factor as follows.

from scipy.constants import R

activation_energy = - R * result.slope
preexponential_factor = np.exp(result.intercept)

print(f"A = {preexponential_factor}; E_a = {activation_energy}")

A = 3840209.1525825486; E_a = 112323.03523328649


Finally, we can plot our linear regression line through the data as shown below.

x = np.linspace(1/660, 1/590, 1000)

plt.plot(1/T, np.log(k), 'o')
plt.plot(x, result.slope * x + result.intercept, '-')
plt.xlabel('1/Temperature/K')
plt.ylabel('ln(Rate constant/M$^{-1}$s$^{-1}$)')
plt.show() ## Non-linear regression¶

Of course, our equation cannot always be modified to offer a straight line formulation or we may not want to as the logarithmic scaling of the y-axis may skew our understanding of the intercept. Therefore, necessary to fit a non-linear relationship. This can also be achieved with Python by minimising the difference between our model (the equation with given parameters) and the data. The value that we will minimize is known as the $$\chi^2$$-value, which has the following definitions.

$\chi^2 = \sum\big\{[y - f(x)] ^ 2\big\} \;\;\;\text{or}\;\;\; \chi^2 = \sum\Bigg\{\frac{[y - f(x)] ^ 2}{y_{\text{err}}^2}\Bigg\},$

depending on whether or not uncertainty in the data is known.

Consider the following data, which describe the decomposition of hydrogen peroxide in the presence of excess cerous ion CeIII,

2H2O2 + 2CeIII → 2H2O + CeIV + O2.

Time/s

2

4

6

8

10

12

14

16

18

20

[H2O2]/moldm-3

6.23

4.84

3.76

3.20

2.60

2.16

1.85

1.49

1.27

1.01

First, we will store these values as NumPy arrays.

c = np.array([6.23, 4.84, 3.76, 3.20, 2.60, 2.16, 1.85, 1.49, 1.27, 1.01])
t = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

plt.plot(t, c, 'o')
plt.xlabel('Time/s')
plt.ylabel('[H$_2$O$_2$]/M')
plt.show() This reaction has a first-order rate equation, this will be our “model” that we will compare to the data, where the $$[A]_0$$ and $$k$$ are the varying parameters. This function is defined below.

def first_order(t, a0, k):
"""
The first-order rate equation.

Args:
t (float): Time (s).
a0 (float): Initial concentration (mol/dm3).
k (float): Rate constant (s-1).

Returns:
(float): Concentration at time t (mol/dm3).
"""
return a0 * np.exp(-k * t)


We want to vary a0 and k to minimise the $$\chi^2$$-value between this model and the data.

def chi_squared(parameters, t, conc):
"""
Determine the chi-squared value for a first-order rate equation.

Args:
parameters (list): The variable parameters of first_order (a0 and k).
t (float): The experimental time (s) data.
conc (float): The experimental concentration data which first_order aims to model.

Returns:
(float): chi^2 value.
"""
a0 = parameters
k = parameters
return np.sum((conc - first_order(t, a0, k)) ** 2)


Just like the minimisation of a potential model. We can use the scipy.optimize.minimize to minimize this chi-squared function.

from scipy.optimize import minimize

result = minimize(chi_squared, [10, 1], args=(t, c))

result

      fun: 0.13373044840242107
hess_inv: array([[7.88767889e-01, 1.26315543e-02],
[1.26315543e-02, 3.02236037e-04]])
jac: array([1.45286322e-07, 2.03587115e-06])
message: 'Optimization terminated successfully.'
nfev: 63
nit: 10
njev: 21
status: 0
success: True
x: array([7.42164414, 0.10308256])

x = np.linspace(2, 20, 1000)

plt.plot(t, c, 'o')
plt.plot(x, first_order(x, result.x, result.x))
plt.xlabel('Time/s')
plt.ylabel('[H$_2$O$_2$]/M')
plt.show() print(f"[A]_0 = {result.x}; k = {result.x}")

[A]_0 = 7.421644140842236; k = 0.10308255574195259


scipy.optimize.curve_fit offers a convenient wrapper around this methodology, and makes it easy to access the estimated parameter uncertainties.

from scipy.optimize import curve_fit

popt, pcov = curve_fit(first_order, t, c)

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/ipykernel_launcher.py:13: RuntimeWarning: overflow encountered in exp
del sys.path


We can ignore the above error, this is just warning that during the optimization an unphysical value was found. However, the curve_fit function is robust enough to ignore this and continue.

uncertainties = np.sqrt(pcov)

print(f"[A]_0 = {popt} +/- {uncertainties}; k = {popt} +/- {uncertainties}")

[A]_0 = 7.42164476326434 +/- 0.15838876657483184; k = 0.10308257141927114 +/- 0.00308836766268339


## Exercise¶

1. Modify the above first-order rate fitting using the minimize method to consider a y-uncertainty of 5 %.

2. By investigating the documentation for the scipy.optimize.curve_fit, determine how this uncertainty would be included in the curve_fit method.

Worked Examples