# File I/O

Now that you are confident in mathematical operations and the plotting of arrays of data, it will be useful to be able to read data (for example from an experimental measurement) into the Python kernel. This will allow you to perform mathematical operations on data from the file and potentially plot the analysed result.

In this section, we will show how to parse two common types of file: .txt (text files) and .csv (comma-separated values files). The same basic function, from the NumPy library, is used to read both types of file, with some changes to the keyword arguments that are passed to the function.

## .txt files

A text file might look like the following:

# A text file with ideal gas information
# The rows are temperature (K), volume (cubic meters), and pressure (Pa)
200 600 1000 1400 1800
0.8 0.2 1.0 0.6 0.1
5020 60370 20110 46940 362160

This file is available from this link; you will need to upload it to the chemistry JupyterHub. We can read the data from this file into three NumPy arrays with the np.loadtxt() function.

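import numpy as np

# The file name here is assumed; use the name of the file you uploaded
temperature, volume, pressure = np.loadtxt('ideal_gas_rows.txt')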
print(temperature)
[ 200.  600. 1000. 1400. 1800.]
print(volume)
[0.8 0.2 1.  0.6 0.1]
print(pressure)
[  5020.  60370.  20110.  46940. 362160.]
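To see what np.loadtxt() returns before unpacking, the same call can be tried on an in-memory copy of the file contents. This is only a sketch: it uses io.StringIO as a stand-in for the uploaded file (np.loadtxt() accepts file-like objects as well as file names), and the variable names rows_text and data are chosen here for illustration.

```python
import io
import numpy as np

# In-memory stand-in for the row-wise text file shown above
rows_text = """# A text file with ideal gas information
# The rows are temperature (K), volume (cubic meters), and pressure (Pa)
200 600 1000 1400 1800
0.8 0.2 1.0 0.6 0.1
5020 60370 20110 46940 362160
"""

# loadtxt returns a 2D array with one row per line of data in the file
data = np.loadtxt(io.StringIO(rows_text))
print(data.shape)

# Unpacking a 2D array splits it along its first axis, one array per row
temperature, volume, pressure = data
print(temperature)
```

Because the file has three lines of data, the array has three rows, which is why it can be unpacked directly into three variables.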

We can then plot the product of pressure and volume against temperature to observe the ideal gas relationship that was introduced in the previous exercise.

import matplotlib.pyplot as plt
plt.plot(temperature, pressure * volume, 'o')
plt.xlabel('T')
plt.ylabel('pV')
plt.show()
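The ideal gas law, pV = nRT, implies that pV / (RT) should be the same for every measurement of a fixed amount of gas. As a rough numerical check to accompany the plot, the sketch below computes this ratio using values copied from the example file. The name n_estimate and the rounded value of the gas constant are choices made here, not part of the original material.

```python
import numpy as np

# Values copied from the example file above
temperature = np.array([200.0, 600.0, 1000.0, 1400.0, 1800.0])      # K
volume = np.array([0.8, 0.2, 1.0, 0.6, 0.1])                        # m^3
pressure = np.array([5020.0, 60370.0, 20110.0, 46940.0, 362160.0])  # Pa

R = 8.314  # gas constant in J / (mol K), rounded

# For an ideal gas pV = nRT, so pV / (RT) estimates the amount of gas n
n_estimate = pressure * volume / (R * temperature)
print(n_estimate)
```

If the data follow the ideal gas law, the printed values should be close to one another, which corresponds to the points in the plot lying on a straight line through the origin.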

Experimental devices often output data as columns rather than rows (this is probably the more common layout for real measurement devices), as in this file, shown below.

# A text file with columnar ideal gas information
# temperature (K) volume (cubic meters) pressure (Pa)
200 0.8 5020
600 0.2 60370
1000 1.0 20110
1400 0.6 46940
1800 0.1 362160

It is therefore necessary to modify the call to np.loadtxt() to account for this. To do so, we use a keyword argument, which is a function argument that is passed by name, in the form name=value. To read in the columnar data we can use the following.

temperature, volume, pressure = np.loadtxt('ideal_gas_cols.txt', unpack=True)
print(temperature)
[ 200.  600. 1000. 1400. 1800.]
print(volume)
[0.8 0.2 1.  0.6 0.1]
print(pressure)
[  5020.  60370.  20110.  46940. 362160.]

To read columnar data with this function, we set the keyword argument unpack to True. By default unpack is False, which is the behaviour used for the row-wise file above.
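The effect of unpack can be seen by comparing the shapes of the arrays returned with and without it. The sketch below is an illustration only: it reads an in-memory copy of the columnar data through io.StringIO instead of the uploaded file, and the names cols_text, by_line and by_column are chosen here.

```python
import io
import numpy as np

# In-memory stand-in for the columnar text file shown above
cols_text = """# temperature (K) volume (cubic meters) pressure (Pa)
200 0.8 5020
600 0.2 60370
1000 1.0 20110
1400 0.6 46940
1800 0.1 362160
"""

# Default (unpack=False): one row of the array per line of the file
by_line = np.loadtxt(io.StringIO(cols_text))
print(by_line.shape)

# unpack=True: the array is transposed, so there is one row per column
by_column = np.loadtxt(io.StringIO(cols_text), unpack=True)
print(by_column.shape)

# The transposed array can now be unpacked into one variable per column
temperature, volume, pressure = by_column
```

In other words, unpack=True gives the same result as transposing the default output, so each variable receives a whole column of the file.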

## .csv files

.csv files are an extremely popular output format for experimental equipment. While the .txt files above use whitespace (spaces or tabs) to separate the values in a row, the .csv format uses commas (hence the name). The .csv equivalents of the previous text files are available as rows and as columns. We can use the Jupyter Notebook to have a look at the files if we wish, using the !head command (this is a feature of the Jupyter Notebook, not standard Python).

!head ideal_gas_cols.csv
# A csv file with columnar ideal gas information
# temperature (K) volume (cubic meters) pressure (Pa)
200, 0.8, 5020
600, 0.2, 60370
1000, 1.0, 20110
1400, 0.6, 46940
1800, 0.1, 362160

This command prints the first ten lines of a file. Since this file is shorter than ten lines, we can see the whole thing.
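Outside a Jupyter Notebook, where !head is not available, something similar can be done in plain Python. The following is a minimal sketch, not a replacement for the shell command: the function name head and the demonstration file demo.csv are hypothetical and are created here only so that the example runs on its own.

```python
import os
import tempfile
from itertools import islice


def head(path, n=10):
    """Return the first n lines of a text file as a list of strings."""
    with open(path) as handle:
        return [line.rstrip('\n') for line in islice(handle, n)]


# Write a small demonstration file so the sketch is self-contained
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with open(demo_path, 'w') as handle:
    handle.write('# temperature (K) volume (cubic meters) pressure (Pa)\n')
    handle.write('200, 0.8, 5020\n')
    handle.write('600, 0.2, 60370\n')

for line in head(demo_path):
    print(line)
```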

Reading .csv files requires another keyword argument, this time called delimiter, which is the string that separates the values. For .csv files the delimiter is the comma.

temperature, volume, pressure = np.loadtxt('ideal_gas_cols.csv', unpack=True, delimiter=',')
print(temperature)
[ 200.  600. 1000. 1400. 1800.]
print(volume)
[0.8 0.2 1.  0.6 0.1]
print(pressure)
[  5020.  60370.  20110.  46940. 362160.]
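To see why the delimiter argument is needed, it can help to try reading comma-separated data without it. The sketch below uses an in-memory copy of the .csv contents (through io.StringIO) instead of the uploaded file, and the name csv_text is chosen here. Without delimiter=',', np.loadtxt() splits on whitespace and cannot convert a value such as "200," to a number, so it raises a ValueError.

```python
import io
import numpy as np

# In-memory stand-in for the columnar .csv file shown above
csv_text = """# temperature (K) volume (cubic meters) pressure (Pa)
200, 0.8, 5020
600, 0.2, 60370
1000, 1.0, 20110
"""

# Without a delimiter, values are split on whitespace and '200,' is not a number
try:
    np.loadtxt(io.StringIO(csv_text), unpack=True)
except ValueError as error:
    print('Reading failed:', error)

# With delimiter=',' the values are split at the commas instead
temperature, volume, pressure = np.loadtxt(
    io.StringIO(csv_text), unpack=True, delimiter=','
)
print(temperature)
```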

Note that, by default, the np.loadtxt() function reads only numerical data and completely ignores any line starting with a hash symbol, #, which is why the header lines in the files above cause no problems.
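Not every file marks its header with a hash symbol. As a sketch of what happens in that case, the example below uses hypothetical file contents with a plain-text header line: np.loadtxt() raises a ValueError when it meets the text, and the skiprows keyword argument can be used to skip a given number of lines at the start of the file.

```python
import io
import numpy as np

# Hypothetical file contents: the first line is a header without a '#'
text = """temperature volume pressure
200 0.8 5020
600 0.2 60370
"""

# The header cannot be converted to numbers, so loadtxt raises an error
try:
    np.loadtxt(io.StringIO(text))
except ValueError as error:
    print('Could not parse the header:', error)

# skiprows sets how many lines to skip at the start of the file
data = np.loadtxt(io.StringIO(text), skiprows=1)
print(data)
```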

## Exercise:

• Investigate the IR spectrum data for toluene, initially using the !head command, and then read in the data with the np.loadtxt() function.

• Plot the data and correctly label the axes, following the information given in the file header (the part with the # symbols that is not parsed by the np.loadtxt() function).
