Exercises#

You have now worked through the complete workflow for fitting linear models to data. The exercises here give you practice applying this workflow to real chemical systems, where the context adds physical interpretation and requires data handling skills like file reading and coordinate transformations.

1. Linear fitting of the Van’t Hoff equation#

The Van’t Hoff equation relates the change in the equilibrium constant, \(K\), for a reaction, to a change in temperature, \(T\), given that the enthalpy change for the reaction, \(\Delta H\), is constant over the temperature range of interest.

\[\frac{\mathrm{d} \ln K}{\mathrm{d} T} = \frac{\Delta H}{RT^2}\]

This equation is derived from two standard expressions for the Gibbs free energy change for a reaction:

\[\Delta G = \Delta H - T \Delta S\]
\[\Delta G = -RT \ln K\]

Combining these two equations gives the linear form of the Van’t Hoff equation:

\[\ln K = -\frac{\Delta H}{RT} + \frac{\Delta S}{R}\]

Differentiating with respect to \(1/T\) (keeping \(\Delta H\) and \(\Delta S\) constant) gives the derivative form of the Van’t Hoff equation.

The linear form of the Van’t Hoff equation has the functional form of the equation for a straight line (hence “linear form”)

\[y = mx + c\]

Plotting experimental data as \(\ln K\) against \(1/T\) should, therefore, give an approximate straight line (as long as the enthalpy and entropy of reaction are approximately constant over the temperature range of interest). By fitting a straight line to these data, we can obtain “best fit” values for the slope and intercept, which correspond to \(\Delta H/R\) and to \(\Delta S/R\) in our model. By fitting to data we can, therefore, estimate \(\Delta H\) and \(\Delta S\) for a given reaction.

(a) Read the data#

The data file exercise_1.dat gives the following experimentally measured equilibrium constants at different temperatures for the reaction

\[2\mathrm{NO}_2 \rightleftharpoons \mathrm{N}_2\mathrm{O}_4\]

temperature (°C)

\(K\)

9

34.3

20

12

25

8.79

33

4.4

40

2.8

52

1.4

60

0.751

70

0.4

Read these data from the file exercise_1.dat into two numpy arrays.

(b) Transform and plot the data#

Convert your raw data into \(1/T\) (in inverse Kelvin) and \(\ln K\).

Plot these transformed data and confirm that they approximately show a linear relationship.

(c) Define model and error functions#

You have already written functions for a linear model and error function in the previous section. Write similar functions here, or adapt your previous code. Test your model by plotting the data alongside the model prediction for \(m\) = 7400 and \(c\) = -22.5 to verify your model approximately describes the data.

(d) Find best-fit parameters using minimize()#

Use scipy.optimize.minimize() to find the “best fit” values of \(m\) and \(c\) for your data. Use the previous values of \(m\) = 7400 and \(c\) = −22.5 as your starting guess.

(e) Plot the fitted model#

Plot the fitted model Replot your data, now including your “line of best fit”

(f) Compare with scipy.stats.linregress()#

One way to perform linear regression in Python without writing your own model function and error function is to use scipy.stats.linregress():

from scipy.stats import linregress

result = linregress(x, y)

You can then access the slope and intercept as result.slope and result.intercept.

Use scipy.stats.linregress() to perform linear regression on your data, and confirm that you get the same best fit parameters, \(m\) and \(c\), as before.

2. Non-linear fitting of kinetic data#

The data in exercise_2.dat describe the decomposition of hydrogen peroxide catalysed by CeIII.

\[2\mathrm{H}_2\mathrm{O}_2 \overset{\mathrm{Ce}^\mathrm{III}}{\longrightarrow} 2\mathrm{H}_2\mathrm{O} + \mathrm{O}_2\]

Under conditions where CeIII is present in excess, the reaction follows first-order kinetics with respect to H2O2.

Time (s)

[H2O2] / mol dm−3

2

6.23

4

4.84

6

3.76

8

3.20

10

2.60

12

2.16

14

1.85

16

1.49

18

1.27

20

1.01

(a) Read and plot the data#

Read these data to numpy arrays, and plot time versus the concentration of H2O2.

(b) Define the kinetic model function#

Under the experimental conditions, the decomposition of H2O2 follows first-order kinetics. This means the concentration of H2O2 decreases exponentially with time according to:

\[[\mathrm{H}_2\mathrm{O}_2](t) = [\mathrm{H}_2\mathrm{O}_2]_0\exp(-kt)\]

where \([\mathrm{H}_2\mathrm{O}_2]_0\) is the initial concentration and \(k\) is the first-order rate constant.

For generality, we can write this using \([\mathrm{A}](t)\) to represent the concentration of any reactant following first-order kinetics:

\[[\mathrm{A}](t) = [\mathrm{A}]_0\exp(-kt)\]

This will be our mathematical model for fitting the experimental data.

Write a function first_order() that calculates \([\mathrm{A}](t)\) given three parameters: time \(t\), initial concentration \([\mathrm{A}]_0\), and rate constant \(k\). Your function should work with both

(b) Define the kinetic model function#

This reaction has a first-order rate equation, which will be our “model” for fitting the data:

\[[\mathrm{A}](t) = [\mathrm{A}]_0\exp(-kt)\]

Write a function first_order() to calculate this \([\mathrm{A}](t)\) as a function of \(t\), \([\mathrm{A}]_0\), and \(k\).

(c) Write an error function#

Write a function error_function() that calculates the least-squares error for your first-order kinetic model compared to the experimental data.

(d) Find best-fit parameters using minimize()#

Use scipy.optimize.minimize() to find the best-fit values of \([\mathrm{A}]_0\), and \(k\). Use starting guesses of \(k\) = 0.1 and \(A\) = 7.

(e) Plot the fitted model#

Replot the raw data, plus the curve for your best-fit parameters.

(f) Use scipy.optimize.curve_fit()#

You can also perform non-linear least-squares fitting, without having to define an error function, using scipy.optimize.curve_fit().

from scipy.optimize import curve_fit

To use curve_fit(), you pass your model function, and the \(x\) and \(y\) data:

popt, pcov = curve_fit(model_function, x_data, y_data)

curve_fit() returns two numpy arrays. The first array contains the optimised model parameters. The second array contains a 2D array that describes the covariance of these parameters. The diagonal elements of this array are estimated uncertainties (standard deviations) for each model parameter, and the off-diagonal elements give information about the degree of correlation between the model parameters.

Use curve_fit() to confirm your best fit model parameters for your first-order kinetic model. You will need to pass in as arguments your model function, the observed \(x\) data (time), and the observed \(y\) data (H2O2 concentration).

(g) Optional: Using linearization to find initial guesses#

Non-linear fitting can sometimes be sensitive to initial parameter guesses, particularly for exponential models. One useful strategy when you encounter convergence difficulties is to linearise the model first.

For a first-order kinetic model, taking the natural logarithm of both sides gives:

\[\ln[\mathrm{A}](t) = \ln[\mathrm{A}]_0 - kt\]

This is linear in \(t\) with slope \(-k\) and intercept \(\ln[\mathrm{A}]_0\).

  1. Transform your data: calculate \(\ln[\mathrm{H}_2\mathrm{O}_2]\)

  2. Use scipy.stats.linregress() to fit this linearised form

  3. Extract approximate values for \(k\) and \([\mathrm{A}]_0\) from the fit

  4. Use these as initial guesses in your non-linear fit from part (d)

Compare the fitted parameters from the linearised fit with those from the non-linear fit. You should find they are slightly different. The non-linear fit (fitting the exponential model directly) is generally preferred and gives more accurate parameter estimates. The linearised fit is more numerically stable with respect to the starting guess though, so is useful primarily as a way to obtain good initial guesses for the non-linear fit.