File input and output#

When working with scientific data, we often have these data saved in a file on our computer. To perform some computational analysis or to plot our results we need to load these data into our code; entering data by hand presents an opportunity for mistyped values, and for datasets with more than just a few entries quickly becomes impractical.

After having run our computational analysis on our input data, we usually end up with some output data, and we may wish to save this output back to disk. This might be because we want to run a second stage of computational analysis, or simply because we want our final output dataset to be available without having to rerun our full computational workflow.

Reading and writing plain text files#

The most common file type used for storing data is plain text. These files contain anything that can be encoded in a string; i.e., alphanumeric characters and typical symbols you can type on a keyboard.

The text below shows the contents of an example plain text file that contains data corresponding to the volume and pressure of an ideal gas over a range of temperatures.

# A text file with ideal gas information
# The rows are temperature (K), volume (cubic meters), and pressure (Pa)
200 600 1000 1400 1800
0.8 0.2 1.0 0.6 0.1
5020 60370 20110 46940 362160

Reading text files#

To read from a text file we first create a file object, which will allow us to interact with the file on disk.

To open a file for reading, we use the open() built-in function, which takes two arguments: the first argument is the name of the file we want to open, given as a string, and the second argument specifies the mode, which defines how we are going to interact with the file (read, write, etc.).

To open the file above, we can use

f = open('ideal_gas_rows.txt', 'r') # 'r' specifies that we are opening this file for reading.

We can now read from this file, storing the data we read as a string.

data = f.read()
print(data)
# A text file with ideal gas information
# The rows are temperature (K), volume (cubic meters), and pressure (Pa)
200 600 1000 1400 1800
0.8 0.2 1.0 0.6 0.1
5020 60370 20110 46940 362160

Calling file_object.read() returns the entire contents of the file as a single string. If we want to pull out separate lines, we can use the string method split() to split this single string into a list of strings:

data.split('\n')
['# A text file with ideal gas information',
 '# The rows are temperature (K), volume (cubic meters), and pressure (Pa)',
 '200 600 1000 1400 1800',
 '0.8 0.2 1.0 0.6 0.1',
 '5020 60370 20110 46940 362160']

So, for example, to read the temperature data (in the third line of the file) we could use the following code

f = open('ideal_gas_rows.txt', 'r')
data = f.read()
lines = data.split('\n')
temperatures_as_strings = lines[2].split() # split the third line (index 2) into separate strings.
temperatures = [float(s) for s in temperatures_as_strings]
temperatures
[200.0, 600.0, 1000.0, 1400.0, 1800.0]

Here, we have split() our input twice: the first time we split into separate lines, and the second time we split the line containing the data we want into separate words. This gives us a list of strings which we then convert to a list of floats using a list comprehension.

We can write similar code to extract the volume and pressure data.

volumes_as_strings = lines[3].split()
pressures_as_strings = lines[4].split()

volumes = [float(s) for s in volumes_as_strings]
pressures = [float(s) for s in pressures_as_strings]

print(volumes)
print(pressures)
[0.8, 0.2, 1.0, 0.6, 0.1]
[5020.0, 60370.0, 20110.0, 46940.0, 362160.0]
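The three conversions above follow the same pattern, so they could be wrapped in a small helper function. The sketch below is one possible way to do this. The function name parse_row is our own choice for illustration, and the file contents are stored in a string so that the example runs without the file being on disk.

```python
# Hypothetical helper (not part of the text above): convert one
# whitespace-separated line of text into a list of floats.
def parse_row(line):
    return [float(s) for s in line.split()]

# The same contents as ideal_gas_rows.txt, stored in a string for illustration.
data = """# A text file with ideal gas information
# The rows are temperature (K), volume (cubic meters), and pressure (Pa)
200 600 1000 1400 1800
0.8 0.2 1.0 0.6 0.1
5020 60370 20110 46940 362160"""

lines = data.split('\n')
temperatures = parse_row(lines[2])
volumes = parse_row(lines[3])
pressures = parse_row(lines[4])

print(temperatures)
print(volumes)
print(pressures)
```

Using a function here means the string-to-float conversion is written once, rather than repeated for each row.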

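Reading a file as separate lines is a common use case, and we can do this in a single step using readlines(), which avoids having to explicitly split the full file string. Each string in the resulting list keeps its trailing newline character, but split() discards this along with any other whitespace.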

f = open('ideal_gas_rows.txt', 'r')
lines = f.readlines()
temperatures_as_strings = lines[2].split() # split the third line (index 2) into separate strings.
temperatures = [float(s) for s in temperatures_as_strings]
temperatures
[200.0, 600.0, 1000.0, 1400.0, 1800.0]

Having read our data, we can now close the file.

f.close()
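It is easy to forget to call close(). A common alternative is to open the file inside a with statement, which closes the file for us when the indented block ends. The sketch below first creates a small example file in a temporary directory (the filename is hypothetical) so that it can be run on its own, and then reads the file back.

```python
import os
import tempfile

# Create a small example file in a temporary directory so that this
# sketch does not depend on any file already being on disk.
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, 'example_rows.txt')  # hypothetical filename

with open(filename, 'w') as f:  # 'w' opens the file for writing
    f.write('# temperature (K)\n')
    f.write('200 600 1000 1400 1800\n')

# The file is closed automatically when the indented block ends,
# even if an error occurs inside it.
with open(filename, 'r') as f:
    lines = f.readlines()

print(f.closed)
temperatures = [float(s) for s in lines[1].split()]
print(temperatures)
```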

Writing text files#

If we now want to write our data to a new file, we again use open(), but this time set the mode to 'w'. We can then use the write() method to write strings to the file. Note that opening an existing file with 'w' overwrites its contents.

# first, convert our list of floats to a list of strings and join them together
data_string = ' '.join([str(t) for t in temperatures])
f = open('new_text_file.txt', 'w')
# write() does not add a newline for us, so we include '\n' at the end of our string
f.write('# this is our new file\n')
f.write(data_string)
f.close()
!head new_text_file.txt
# this is our new file
200.0 600.0 1000.0 1400.0 1800.0
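The same approach extends to writing several rows. The following sketch, which assumes the three lists from the reading examples and uses a hypothetical filename in a temporary directory, writes a header line followed by one line per dataset, and then reads the file back to check the result.

```python
import os
import tempfile

temperatures = [200.0, 600.0, 1000.0, 1400.0, 1800.0]
volumes = [0.8, 0.2, 1.0, 0.6, 0.1]
pressures = [5020.0, 60370.0, 20110.0, 46940.0, 362160.0]

# Write to a temporary directory; the filename is hypothetical.
filename = os.path.join(tempfile.mkdtemp(), 'all_rows.txt')

f = open(filename, 'w')
f.write('# rows are temperature (K), volume (cubic meters), and pressure (Pa)\n')
for row in [temperatures, volumes, pressures]:
    # join the values with spaces and end each row with a newline
    f.write(' '.join([str(x) for x in row]) + '\n')
f.close()

# Read the file back to check what was written.
f = open(filename, 'r')
contents = f.read()
f.close()
print(contents)
```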

Reading and writing files using numpy#

As you can see from the example code above, reading and writing text files containing numerical data using built-in Python commands takes some work. When reading, we need to split strings and convert them to numerical data; when writing, we need to convert numerical data to strings and join these strings together.

Fortunately, numpy provides functions that can be used to read and write files containing numerical data much more easily.

import numpy as np

temperature, volume, pressure = np.loadtxt('ideal_gas_rows.txt')
temperature, volume, pressure
(array([ 200.,  600., 1000., 1400., 1800.]),
 array([0.8, 0.2, 1. , 0.6, 0.1]),
 array([  5020.,  60370.,  20110.,  46940., 362160.]))

Let us look at our text file again:

!cat ideal_gas_rows.txt
# A text file with ideal gas information
# The rows are temperature (K), volume (cubic meters), and pressure (Pa)
200 600 1000 1400 1800
0.8 0.2 1.0 0.6 0.1
5020 60370 20110 46940 362160

When we were using built-in Python to read the file, we needed to remember that our data started on line 3. Using numpy we have been able to automatically skip the first two lines, which both start with #.

loadtxt takes an optional comments argument that specifies the character, or list of characters, that marks the start of a comment. The default value is '#', so lines in our file that start with # are ignored by default.
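As an illustration of changing the comment character, the sketch below reads data whose comment lines start with % instead of #. The file contents are made up for this example and are held in a string, which io.StringIO lets us pass to loadtxt as if it were an open file.

```python
import io
import numpy as np

# Hypothetical file contents where comment lines start with '%' instead of '#'.
# io.StringIO lets us treat a string as if it were an open file.
text = """% first row: temperature (K); second row: volume (cubic meters)
200 600 1000
0.8 0.2 1.0
"""

temperature, volume = np.loadtxt(io.StringIO(text), comments='%')
print(temperature)
print(volume)
```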

Quite often, data are recorded in text files as columns rather than rows, e.g.,

# A text file with columnar ideal gas information
# temperature (K) volume (cubic meters) pressure (Pa)
200 0.8 5020
600 0.2 60370
1000 1.0 20110
1400 0.6 46940
1800 0.1 362160

loadtxt can read columnwise data if we set an additional keyword argument, unpack=True:

temperature, volume, pressure = np.loadtxt('ideal_gas_cols.txt', unpack=True)

temperature, volume, pressure
(array([ 200.,  600., 1000., 1400., 1800.]),
 array([0.8, 0.2, 1. , 0.6, 0.1]),
 array([  5020.,  60370.,  20110.,  46940., 362160.]))
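To see what unpack=True does, it can help to compare the shapes of the arrays returned with and without it. The sketch below uses the columnwise data above, stored in a string and wrapped in io.StringIO so that the example runs without the file on disk.

```python
import io
import numpy as np

text = """# temperature (K) volume (cubic meters) pressure (Pa)
200 0.8 5020
600 0.2 60370
1000 1.0 20110
1400 0.6 46940
1800 0.1 362160
"""

# Without unpack, each row of the array is one line of the file.
table = np.loadtxt(io.StringIO(text))
print(table.shape)

# With unpack=True the array is transposed, so each row is one column of the file.
columns = np.loadtxt(io.StringIO(text), unpack=True)
print(columns.shape)

temperature, volume, pressure = columns
print(temperature)
```

Because assigning an array to several names splits it along its first axis, the transposed array is the one we need to get one variable per column.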

.csv files#

.csv files (Comma Separated Values) are extremely common output data files for experimental equipment, and are a standard export format for spreadsheet software such as Excel.

.csv files are plain text files, like the examples above, but use commas, rather than whitespace (spaces or tabs) to separate values in a row (hence the name).

A .csv format version of the columnwise file above might look like:

# A csv file with columnar ideal gas information
# temperature (K) volume (cubic meters) pressure (Pa)
200, 0.8, 5020
600, 0.2, 60370
1000, 1.0, 20110
1400, 0.6, 46940
1800, 0.1, 362160

We can read .csv files using loadtxt by providing another keyword argument to specify the “delimiter”; i.e., the character that separates values in a row.

temperature, volume, pressure = np.loadtxt('ideal_gas_cols.csv', unpack=True, delimiter=',')
temperature, volume, pressure
(array([ 200.,  600., 1000., 1400., 1800.]),
 array([0.8, 0.2, 1. , 0.6, 0.1]),
 array([  5020.,  60370.,  20110.,  46940., 362160.]))
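numpy can also write numerical data to a text file, using np.savetxt(). The following is a minimal sketch, assuming we want to save our three arrays as columns in a .csv file. The output filename is hypothetical, and the file is written to a temporary directory so that the example does not overwrite anything.

```python
import os
import tempfile
import numpy as np

temperature = np.array([200.0, 600.0, 1000.0, 1400.0, 1800.0])
volume = np.array([0.8, 0.2, 1.0, 0.6, 0.1])
pressure = np.array([5020.0, 60370.0, 20110.0, 46940.0, 362160.0])

# Stack the three 1D arrays side by side as the columns of a 2D array.
table = np.column_stack([temperature, volume, pressure])

# Write to a temporary directory; the filename is hypothetical.
filename = os.path.join(tempfile.mkdtemp(), 'ideal_gas_out.csv')

# By default savetxt starts the header line with '# ', so that
# loadtxt will treat it as a comment when reading the file back.
np.savetxt(filename, table, delimiter=',',
           header='temperature (K), volume (cubic meters), pressure (Pa)')

# Read the file back in to check that the data survive the round trip.
t, v, p = np.loadtxt(filename, unpack=True, delimiter=',')
print(t)
```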

Exercise:#

  1. Investigate the IR spectra data for toluene. You can create a plain text file in your Jupyter directory by selecting New > Text File from the file browser window.

../_images/new_text_file.png

  2. Read in the data with the `np.loadtxt()` function.
  3. Plot the data and correctly label the axes, following the information given in the file header (the part at the top of the file that is marked as comments).