File I/O#
In all of the exercises you have completed thus far, any data that you need has been provided to you directly as Python objects, e.g.:
concentration_data = [0.001, 0.002, 0.004, 0.006, 0.008, 0.010, 0.020]
This is not really representative of a typical data analysis workflow. Most of the time, if you want to analyse some data, you will have already collected it and stored it in some kind of file, perhaps a csv
for example. It seems prudent therefore to learn how we can read files into Python and write files back out of Python: file I/O (input/output).
The general case: reading in files#
The most general way to read data from a file in Python is to use the built-in open
function. Let’s look at a simple example: reading in a file that contains some simple text. We’re going to look at example.txt, which looks like this:
EXAMPLE TEXT FILE

Here is some text.
Here is some more text.

Here is even more text.
Here’s how we can read in this file in Python:
with open('example.txt', 'r') as stream:
    lines = stream.readlines()

for line in lines:
    print(line, end='')
EXAMPLE TEXT FILE

Here is some text.
Here is some more text.

Here is even more text.
To try this example yourself, download example.txt.
Important
When you download files from this course book, they will open up in your web browser as raw text. To actually get the file, you will have to create a blank text file on your Noteable instance (New
-> Text File
) and copy the text into there (naming the file appropriately).
Let’s take this example in sections. First up, we have the open
function:
with open('example.txt', 'r') as stream:
Here we provide two arguments: the path to the file we want to open (example.txt
) and what we would like to do with that file ('r'
for read).
As you will already have noticed, we have also used a new keyword: with
. Here we are doing something quite similar to import
statements such as:
import numpy as np
We are effectively “nicknaming” the output of the open
function and calling it stream
instead. You could of course call it something else; here we use stream
as shorthand for a file stream: a stream of data read from a file.
We end the first line with a colon :
much like function definitions and loops, after which we indent all of the code which needs to access stream
. Once the indented block ends, the with
statement automatically closes the file for us.
lines = stream.readlines()
The next line actually manipulates the content of the file. The open
function returns an object which contains all of the data associated with the file, but not necessarily in a human-readable or immediately useful way. By calling the readlines
method, we store a list
containing each line of the file. The remaining code simply prints these lines for our inspection:
for line in lines:
    print(line, end='')
We have used the end
keyword argument here just to prevent the print
function from adding needless whitespace to the output (by default each line passed to print
will be followed by a newline).
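As a side note, the object returned by open
can also be iterated over directly, yielding one line at a time without first loading the whole file into memory. Here is a minimal self-contained sketch: it first creates its own small example file (the name example_sketch.txt is just an illustrative choice), then iterates over it.

```python
# Create a small example file so this sketch can be run anywhere.
with open('example_sketch.txt', 'w') as stream:
    stream.write('EXAMPLE TEXT FILE\n\nHere is some text.\n')

# Iterating over the file object yields one line at a time,
# without first reading the entire file into memory.
with open('example_sketch.txt', 'r') as stream:
    for line in stream:
        print(line, end='')
```

For large files this line-by-line approach is often preferable to readlines
, which reads everything in one go.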
Writing files#
Now that we’ve read in our simple text file, let’s make some modifications to it and write it back out again.
We now have the contents of the text file available to us in the form of a list
of strings:
lines
['EXAMPLE TEXT FILE\n',
'\n',
'Here is some text.\n',
'Here is some more text.\n',
'\n',
'Here is even more text.\n']
Note that each \n
is a newline character which will actually become a newline when passed to the print
function:
print('Line 1\nLine 2')
Line 1
Line 2
Let’s make some changes to lines
, starting by removing the last two:
lines = lines[:-2]
lines
['EXAMPLE TEXT FILE\n',
'\n',
'Here is some text.\n',
'Here is some more text.\n']
Now let’s add a new line:
lines.append('This new text was added in Python!')
lines
['EXAMPLE TEXT FILE\n',
'\n',
'Here is some text.\n',
'Here is some more text.\n',
'This new text was added in Python!']
And finally, to write our lines
to a new file:
with open('modified_example.txt', 'w') as stream:
    for line in lines:
        stream.write(line)
If you followed along with this entire example, you should see that a new file modified_example.txt
has now been created in the same directory as your Jupyter notebook - take a look.
As you can see in the code above, writing a file in Python looks much like reading a file: we use the open
function for both use cases. The difference is that here we specify that we want to write a file by passing 'w'
as the second argument, and we use the write
method rather than the readlines
method. The write
method takes a single string as an argument, which is why we have used a for
loop to write each individual line to the file. We could also have combined all of the lines into a single string beforehand and then passed that to the write
method; either way will work just fine.
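To show that the two approaches really are equivalent, here is a small self-contained sketch (the file name modified_sketch.txt is just an illustrative choice):

```python
lines = ['EXAMPLE TEXT FILE\n', '\n', 'Here is some text.\n']

# Option 1: write each line individually in a loop.
with open('modified_sketch.txt', 'w') as stream:
    for line in lines:
        stream.write(line)

# Option 2: join the lines into a single string and write it in one go.
# This overwrites the file with identical contents.
with open('modified_sketch.txt', 'w') as stream:
    stream.write(''.join(lines))

# Read the file back to confirm the result.
with open('modified_sketch.txt', 'r') as stream:
    print(stream.read())
```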
Reading in scientific data with numpy
#
What we have just been through is the most general case: how to read in any file, regardless of what type of data is contained within. For our purposes, we are primarily interested in scientific data: numbers that have been collected during some series of experiments. There are many Python packages that can be used to read such data; here we’re going to rely on one that we’ve already been introduced to: numpy
.
Time for another example file, this time some experimental data looking at the temperature dependence of the equilibrium constant for a reaction:
# T / K | K
100 2.38e38
120 2.15e30
140 3.86e24
160 1.89e20
180 8.42e16
200 1.75e14
220 1.12e12
240 1.67e10
260 4.73e08
280 2.24e07
300 1.59e06
320 1.57e05
340 2.03e04
360 3.30e03
380 6.50e02
400 1.51e02
420 4.01e01
440 1.21e01
460 4.02e00
480 1.47e00
500 5.82e-01
Follow this link to download this file.
This data is formatted as simple text in a table of sorts, with values on each line being separated by whitespace. Simple tabular data can be read from files like this using the loadtxt
function:
import numpy as np
data = np.loadtxt('thermodynamic_data.dat')
print(data)
type(data)
[[1.00e+02 2.38e+38]
[1.20e+02 2.15e+30]
[1.40e+02 3.86e+24]
[1.60e+02 1.89e+20]
[1.80e+02 8.42e+16]
[2.00e+02 1.75e+14]
[2.20e+02 1.12e+12]
[2.40e+02 1.67e+10]
[2.60e+02 4.73e+08]
[2.80e+02 2.24e+07]
[3.00e+02 1.59e+06]
[3.20e+02 1.57e+05]
[3.40e+02 2.03e+04]
[3.60e+02 3.30e+03]
[3.80e+02 6.50e+02]
[4.00e+02 1.51e+02]
[4.20e+02 4.01e+01]
[4.40e+02 1.21e+01]
[4.60e+02 4.02e+00]
[4.80e+02 1.47e+00]
[5.00e+02 5.82e-01]]
numpy.ndarray
We end up with a numpy
array containing all of the data in the file. We can tell just from counting square brackets that this array is two-dimensional, in other words it’s an array of arrays: a matrix.
data[0]
array([1.00e+02, 2.38e+38])
As you can see from the code above, the first row of data
is [100, 2.38e38]
, which is indeed the first row of data in the original file. This makes sense, but in all likelihood what we actually want is all of the temperature values in one array, and all of the equilibrium constant data in another array. We can achieve this by transposing the array:
transposed_data = data.T
print(transposed_data)
[[1.00e+02 1.20e+02 1.40e+02 1.60e+02 1.80e+02 2.00e+02 2.20e+02 2.40e+02
2.60e+02 2.80e+02 3.00e+02 3.20e+02 3.40e+02 3.60e+02 3.80e+02 4.00e+02
4.20e+02 4.40e+02 4.60e+02 4.80e+02 5.00e+02]
[2.38e+38 2.15e+30 3.86e+24 1.89e+20 8.42e+16 1.75e+14 1.12e+12 1.67e+10
4.73e+08 2.24e+07 1.59e+06 1.57e+05 2.03e+04 3.30e+03 6.50e+02 1.51e+02
4.01e+01 1.21e+01 4.02e+00 1.47e+00 5.82e-01]]
Now we have an array that contains all the same data as before, but the rows are now columns and vice versa. This allows us to very easily assign the temperature and equilibrium constant values to separate variables:
temperature, K = transposed_data
print(temperature)
print(K)
[100. 120. 140. 160. 180. 200. 220. 240. 260. 280. 300. 320. 340. 360.
380. 400. 420. 440. 460. 480. 500.]
[2.38e+38 2.15e+30 3.86e+24 1.89e+20 8.42e+16 1.75e+14 1.12e+12 1.67e+10
4.73e+08 2.24e+07 1.59e+06 1.57e+05 2.03e+04 3.30e+03 6.50e+02 1.51e+02
4.01e+01 1.21e+01 4.02e+00 1.47e+00 5.82e-01]
We could also achieve this by changing how we originally read in the file:
temperature, K = np.loadtxt('thermodynamic_data.dat', unpack=True)
print(temperature)
print(K)
[100. 120. 140. 160. 180. 200. 220. 240. 260. 280. 300. 320. 340. 360.
380. 400. 420. 440. 460. 480. 500.]
[2.38e+38 2.15e+30 3.86e+24 1.89e+20 8.42e+16 1.75e+14 1.12e+12 1.67e+10
4.73e+08 2.24e+07 1.59e+06 1.57e+05 2.03e+04 3.30e+03 6.50e+02 1.51e+02
4.01e+01 1.21e+01 4.02e+00 1.47e+00 5.82e-01]
Here we have added the unpack
keyword argument, which automatically transposes the output array for us; we then use multiple assignment to immediately split the data into separate variables.
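To convince ourselves that the two approaches agree, here is a small self-contained sketch that writes a tiny two-column data file (the name sketch_data.dat is just an illustrative choice) and reads it back both ways:

```python
import numpy as np

# Write a tiny two-column data file so this sketch can be run anywhere.
with open('sketch_data.dat', 'w') as stream:
    stream.write('100 2.38e38\n120 2.15e30\n')

# Approach 1: load the data, then transpose it ourselves.
temperature_a, K_a = np.loadtxt('sketch_data.dat').T

# Approach 2: let loadtxt transpose the output for us.
temperature_b, K_b = np.loadtxt('sketch_data.dat', unpack=True)

# Both approaches yield identical temperature and K arrays.
print(temperature_a, temperature_b)
```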
For one final example, let’s look at reading in a csv
file:
# T / K | K
100, 2.38e38
120, 2.15e30
140, 3.86e24
160, 1.89e20
180, 8.42e16
200, 1.75e14
220, 1.12e12
240, 1.67e10
260, 4.73e08
280, 2.24e07
300, 1.59e06
320, 1.57e05
340, 2.03e04
360, 3.30e03
380, 6.50e02
400, 1.51e02
420, 4.01e01
440, 1.21e01
460, 4.02e00
480, 1.47e00
500, 5.82e-01
Follow this link to download this file.
This example is actually the same data as the previous case, but now with commas separating the values rather than solely whitespace. This seemingly minor difference is actually quite important, as we can see if we try to read in this file in the same way as the previous example:
temperature, K = np.loadtxt('thermodynamic_data.csv', unpack=True)
print(temperature)
print(K)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[12], line 1
----> 1 temperature, K = np.loadtxt('thermodynamic_data.csv', unpack=True)
3 print(temperature)
4 print(K)
File /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/numpy/lib/_npyio_impl.py:1395, in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows, quotechar, like)
1392 if isinstance(delimiter, bytes):
1393 delimiter = delimiter.decode('latin1')
-> 1395 arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
1396 converters=converters, skiplines=skiprows, usecols=usecols,
1397 unpack=unpack, ndmin=ndmin, encoding=encoding,
1398 max_rows=max_rows, quote=quotechar)
1400 return arr
File /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/numpy/lib/_npyio_impl.py:1046, in _read(fname, delimiter, comment, quote, imaginary_unit, usecols, skiplines, max_rows, converters, ndmin, unpack, dtype, encoding)
1043 data = _preprocess_comments(data, comments, encoding)
1045 if read_dtype_via_object_chunks is None:
-> 1046 arr = _load_from_filelike(
1047 data, delimiter=delimiter, comment=comment, quote=quote,
1048 imaginary_unit=imaginary_unit,
1049 usecols=usecols, skiplines=skiplines, max_rows=max_rows,
1050 converters=converters, dtype=dtype,
1051 encoding=encoding, filelike=filelike,
1052 byte_converters=byte_converters)
1054 else:
1055 # This branch reads the file into chunks of object arrays and then
1056 # casts them to the desired actual dtype. This ensures correct
1057 # string-length and datetime-unit discovery (like `arr.astype()`).
1058 # Due to chunking, certain error reports are less clear, currently.
1059 if filelike:
ValueError: could not convert string '100,' to float64 at row 0, column 1.
The ValueError
here gives us a good clue as to what’s going wrong: could not convert string '100,' to float64 at row 0, column 1.
This tells us that numpy
is including the comma with each value, which we obviously do not want. This happens because the loadtxt
function expects values to be separated by whitespace by default, not commas. We can change this with another keyword argument:
temperature, K = np.loadtxt('thermodynamic_data.csv', unpack=True, delimiter=',')
print(temperature)
print(K)
[100. 120. 140. 160. 180. 200. 220. 240. 260. 280. 300. 320. 340. 360.
380. 400. 420. 440. 460. 480. 500.]
[2.38e+38 2.15e+30 3.86e+24 1.89e+20 8.42e+16 1.75e+14 1.12e+12 1.67e+10
4.73e+08 2.24e+07 1.59e+06 1.57e+05 2.03e+04 3.30e+03 6.50e+02 1.51e+02
4.01e+01 1.21e+01 4.02e+00 1.47e+00 5.82e-01]
By setting the delimiter
to a comma ,
, the loadtxt
function is able to successfully parse the data and we end up with the same result as our previous example.
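As a final aside, numpy
can also write arrays back out to disk: the savetxt
function is the counterpart of loadtxt
, and it accepts the same delimiter
keyword argument. Here is a minimal round-trip sketch (the file name roundtrip.csv is just an illustrative choice):

```python
import numpy as np

temperature = np.array([100.0, 120.0, 140.0])
K = np.array([2.38e38, 2.15e30, 3.86e24])

# savetxt writes one row per line, so we stack the two arrays as
# columns first to match the layout of the file we read in above.
np.savetxt('roundtrip.csv', np.column_stack((temperature, K)), delimiter=',')

# Reading the file back recovers the original values.
T_check, K_check = np.loadtxt('roundtrip.csv', unpack=True, delimiter=',')
print(T_check)
```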