Lists#

Up until now we have dealt with individual pieces of data, such as integer or floating point data types to store numerical data, or strings to represent textual data.

Often we want to work with collections of data, such as a time-series of concentrations from an experiment, or a set of yields obtained from different experimental protocols, or the list of students enrolled in CH40208 Topics in Computational Chemistry.

A list is an ordered collection of values. The values that make up a list are called elements or items, so we might talk of a list that contains 4 items, or the 2nd element in a list.

Creating lists#

A list in Python is written as a sequence of values separated by commas and enclosed by square brackets.

[1, 1, 2, 3, 5, 8, 13]
[1, 1, 2, 3, 5, 8, 13]

We can assign a list to a variable the same way as with single-value datatypes:

fibonacci = [1, 1, 2, 3, 5, 8, 13]
print(fibonacci)
[1, 1, 2, 3, 5, 8, 13]
type(fibonacci)
list

A list can contain any datatype:

nobel_gases = ['helium', 'Neon', 'Argon']
print(nobel_gases)
['helium', 'Neon', 'Argon']
type(nobel_gases)
list

Even a mixture of different datatypes:

miscellaneous = [42, "mushroom", 3.145, 2+3j]
print(miscellaneous)
[42, 'mushroom', 3.145, (2+3j)]
type(miscellaneous)
list

Or other lists:

inner_list = ['a', 'b', 'c']
outer_list = ['e', inner_list, 'f']
print(outer_list)
['e', ['a', 'b', 'c'], 'f']

This last example would be described as a nested list (a list inside another list).

List indexing#

Individual elements of a list can be referred to using list indexing. For example, the first item in our nobel_gases list can be accessed using:

nobel_gases[0]
'helium'

The second item using:

nobel_gases[1]
'Neon'

And the third item using:

nobel_gases[2]
'Argon'

Note that to reference the 1st element we use [0] after the list variable name, to reference the 2nd element we use [1], and to reference the 3rd element we use [2]. This convention arises because the list index describes the number of entries from the start of the list, i.e. the first entry is zero entries from the start.

Trying to access a list element that does not exist raises an IndexError:

nobel_gases[3]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[15], line 1
----> 1 nobel_gases[3]

IndexError: list index out of range

We can also refer to individual elements counting backwards from the end of the list, using negative indexing:

nobel_gases[-1]
'Argon'
nobel_gases[-2]
'Neon'
nobel_gases[-3]
'helium'

Figure: Using positive and negative indexing to refer to individual elements of the nobel_gases list.
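
The link between the two indexing schemes can be checked directly. The following is a minimal sketch that assumes the built-in len() function (introduced later in this section); the variable name n is chosen purely for illustration.

```python
nobel_gases = ['helium', 'Neon', 'Argon']
n = len(nobel_gases)  # number of elements in the list, here 3

# A negative index -k refers to the same element as the positive index n - k.
print(nobel_gases[-1] == nobel_gases[n - 1])  # True
print(nobel_gases[-3] == nobel_gases[n - 3])  # True
```

For a list of n elements, the valid positive indices therefore run from 0 to n - 1, and the valid negative indices run from -1 to -n.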

Modifying lists#

An important feature of lists is that they are mutable: values in the list can be changed, new elements can be added, and existing elements can be deleted.

You might have noticed that the first of our noble gases is "helium", which is missing a capital letter. We can fix that by assigning a new value to nobel_gases[0]:

nobel_gases[0] = "Helium"
print(nobel_gases)
['Helium', 'Neon', 'Argon']

Adding elements to lists#

Lists have a number of special functions (called methods) that we can use to add elements to a pre-existing list.

A method is somewhat similar to a function, except it is associated with a particular object. Using a method lets us implicitly refer to the associated object.

Calling a method of a list implicitly refers to that list. For example, to append another noble gas string to our nobel_gases list we can use the append() method:

nobel_gases = ['Helium', 'Neon', 'Argon']
nobel_gases.append('Krypton')
print(nobel_gases)
['Helium', 'Neon', 'Argon', 'Krypton']

This is equivalent to a function that takes two arguments: the list we want to append to, and the object to be appended. For example, we might imagine an append_to_list() function that works as follows (note that this append_to_list() function is purely imagined and doesn’t exist in native Python):

nobel_gases = ['Helium', 'Neon', 'Argon']
append_to_list(nobel_gases, 'Krypton')
print(nobel_gases)
['Helium', 'Neon', 'Argon', 'Krypton']

Because the append() method belongs to the nobel_gases list, when we use nobel_gases.append() we don’t need to specify the list that we are appending to.
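
To make the comparison concrete, here is one way the imagined append_to_list() function could be written. This is only a sketch: append_to_list() is a hypothetical helper, not part of Python, and it uses the def keyword to define a function.

```python
def append_to_list(a_list, item):
    """Hypothetical helper: add item to the end of a_list, in place."""
    a_list.append(item)


nobel_gases = ['Helium', 'Neon', 'Argon']
append_to_list(nobel_gases, 'Krypton')  # the list is passed explicitly
print(nobel_gases)  # ['Helium', 'Neon', 'Argon', 'Krypton']
```

The function needs the list as an explicit first argument, whereas the method call nobel_gases.append('Krypton') already knows which list it is acting on.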

Another list method that can be used to add elements is insert(), which inserts an object at a given index:

nobel_gases = ['Helium', 'Neon', 'Krypton']
nobel_gases.insert(2, 'Argon')
print(nobel_gases)
['Helium', 'Neon', 'Argon', 'Krypton']

The insert() method takes two arguments: the index where the insertion will take place (in this case 2), and the object to insert (in this case "Argon").

You can also extend a list with another list, by using the extend() method:

nobel_gases.extend(['Xenon', 'Radon'])
print(nobel_gases)
['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']

Or you can concatenate two lists using the + operator (this is similar to “adding” strings):

nobel_gases = ['Helium', 'Neon', 'Argon']
more_nobel_gases = ['Krypton', 'Xenon', 'Radon']
print(nobel_gases + more_nobel_gases)
['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']
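
One difference between the two approaches is worth checking: extend() changes the existing list, while + builds a new list and leaves both originals untouched. The short sketch below illustrates this; the variable name combined is chosen here for illustration.

```python
nobel_gases = ['Helium', 'Neon', 'Argon']
more_nobel_gases = ['Krypton', 'Xenon', 'Radon']

# The + operator creates a new list; nobel_gases itself is not modified.
combined = nobel_gases + more_nobel_gases
print(nobel_gases)  # ['Helium', 'Neon', 'Argon']
print(combined)     # ['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']

# The extend() method modifies nobel_gases in place.
nobel_gases.extend(more_nobel_gases)
print(nobel_gases)  # ['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']
```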

Deleting elements from lists#

Lists also have methods for deleting or removing elements. These include remove(), which removes the first matching object in a list:

nobel_gases = ['Helium', 'Neon', 'Argon']
nobel_gases.remove('Neon')
print(nobel_gases)
['Helium', 'Argon']
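
The phrase "first matching object" matters when a value appears more than once. The sketch below uses an illustrative list called samples (not part of the examples above) to show this, along with two other standard ways of deleting by position rather than by value: the pop() method and the del statement.

```python
samples = ['Neon', 'Argon', 'Neon']  # illustrative list with a repeated value

samples.remove('Neon')  # removes only the first 'Neon'
print(samples)          # ['Argon', 'Neon']

last = samples.pop()    # removes the last element and returns it
print(last)             # Neon
print(samples)          # ['Argon']

del samples[0]          # deletes the element at index 0
print(samples)          # []
```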

List slices#

By referring to a single list index, we can access a single item of a list. Python also allows us to use slice notation to access a range of items from a list, e.g.

nobel_gases = ['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']
print(nobel_gases[1:4])
['Neon', 'Argon', 'Krypton']

This gives us just those elements of a list starting from nobel_gases[1] up to but not including nobel_gases[4].

This behaviour, where the first index is included in the slice but the last index is not, might seem strange at first (you will get used to it with practice and experience). The reasoning behind it is that a slice nobel_gases[1:4] should contain 4-1=3 elements, and should start from the same element as (in this case) nobel_gases[1].
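
This reasoning can be verified with a short sketch. It assumes the built-in len() function (described later in this section), and the names start, stop, and selection are chosen for illustration.

```python
nobel_gases = ['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']
start, stop = 1, 4

selection = nobel_gases[start:stop]
print(len(selection) == stop - start)  # True: the slice holds 4 - 1 = 3 elements

# Two slices that share an endpoint fit together with no gap and no overlap.
print(nobel_gases[0:3] + nobel_gases[3:6] == nobel_gases)  # True
```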

Either the first or last index can be left out, in which case you get a slice starting from the first element, or finishing with the last element, respectively.

nobel_gases = ['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']
print(nobel_gases[:3])
['Helium', 'Neon', 'Argon']
nobel_gases = ['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']
print(nobel_gases[3:])
['Krypton', 'Xenon', 'Radon']

List slices can also be given a third integer, which describes a step size. This is implicitly equal to 1 when it is not specified:

nobel_gases = ['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']
print(nobel_gases[::2])
['Helium', 'Argon', 'Xenon']

In this example, we start from the first element and go up to the last element in steps of 2, giving elements 0, 2, and 4.
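
A step size can be combined with an explicit start or stop index. The following short sketch shows two such combinations on the same list.

```python
nobel_gases = ['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']

# Start from index 1 and take every second element: indices 1, 3, and 5.
print(nobel_gases[1::2])   # ['Neon', 'Krypton', 'Radon']

# Start from index 0, stop before index 4, in steps of 2: indices 0 and 2.
print(nobel_gases[0:4:2])  # ['Helium', 'Argon']
```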

In addition to the remove() method above, list slicing gives us another way to “delete” elements from a list:

nobel_gases = ['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']
nobel_gases = nobel_gases[1:3]
print(nobel_gases)
['Neon', 'Argon']

In the second line here we select elements 1 and 2 from the original nobel_gases list, using slice notation, which gives us ['Neon', 'Argon']. We then assign this value to the variable name nobel_gases, which overwrites the original list.
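
The same idea can be used to drop elements from the middle of a list, by keeping a slice from either side and joining the two with the + operator. This is one possible approach, sketched here for illustration.

```python
nobel_gases = ['Helium', 'Neon', 'Argon', 'Krypton', 'Xenon', 'Radon']

# Keep elements 0 and 1, skip elements 2 and 3, then keep elements 4 and 5.
nobel_gases = nobel_gases[:2] + nobel_gases[4:]
print(nobel_gases)  # ['Helium', 'Neon', 'Xenon', 'Radon']
```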

Working with nested lists#

We saw above that a list can contain one or more other lists, giving a nested list. How then do these work with indexing?

Consider the following nested list:

letters = ['a', 'b', ['c', 'd'], 'e', 'f']
print(letters)
['a', 'b', ['c', 'd'], 'e', 'f']

The list letters has five elements. These are:

  • 'a'

  • 'b'

  • the list ['c', 'd']

  • 'e'

  • 'f'

We can confirm this using the len() function. If we pass len() a list, it returns the number of elements in that list (i.e. the length).

len(letters)
5

The third element is the sublist ['c', 'd'], which we can access using letters[2]:

print(letters[2])
['c', 'd']

letters[2] is itself a list:

type(letters[2])
list

and we can interact with this in the same way as any other list, e.g.

letters[2][1]
'd'
letters[2].append('z')
print(letters)
['a', 'b', ['c', 'd', 'z'], 'e', 'f']
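
Nested lists are a convenient way to group related values. As a small sketch, the illustrative list elements below (a name chosen here, not used elsewhere in this section) stores each noble gas as an inner list of [name, atomic number], and two indices in a row pick out a single value.

```python
# Each inner list holds [name, atomic number].
elements = [['Helium', 2], ['Neon', 10], ['Argon', 18]]

print(len(elements))    # 3: the outer list contains three inner lists
print(elements[1])      # ['Neon', 10]
print(elements[1][0])   # Neon
print(elements[-1][1])  # 18
```

The first index selects an inner list, and the second index selects an item within that inner list.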

Summing over lists using sum() (or .join())#

Python provides a sum() function that can be used to “sum” or add together all the elements of a list.

e.g.

list_of_numbers = [1, 2, 3, 4, 5]
sum(list_of_numbers)
15

This is the same as adding all the elements explicitly:

list_of_numbers[0] + list_of_numbers[1] + list_of_numbers[2] + list_of_numbers[3] + list_of_numbers[4]
15
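
Combining sum() with len() gives a simple way to calculate a mean. The yields below are made-up numbers used purely for illustration, and the variable names are assumptions of this sketch.

```python
yields = [72.5, 68.0, 75.5]  # illustrative percentage yields, not real data

# The mean is the total divided by the number of values.
mean_yield = sum(yields) / len(yields)
print(mean_yield)  # 72.0
```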

The sum() function only works for lists of numbers, so it can’t be used to (for example) “add” a list of strings. Trying to use sum() to add a list of strings will raise a TypeError:

message = ["This", "will", "add", "everything", "together"]
sum(message)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-3a9c9dc32b9f> in <module>
      1 message = ["This", "will", "add", "everything", "together"]
----> 2 sum(message)

TypeError: unsupported operand type(s) for +: 'int' and 'str'

To “add” lists of strings we can use the join() method:

" ".join(message)
'This will add everything together'

Note that join() is a string method, where the string defines what to insert between the elements of message; in this case we have used a string containing one space, but this could be anything:

"+".join(message)
'This+will+add+everything+together'
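
The separator can be more than one character long, or it can be an empty string. The short sketch below uses an illustrative list of element symbols to show both cases.

```python
symbols = ['He', 'Ne', 'Ar']  # illustrative list of strings

print(", ".join(symbols))  # He, Ne, Ar
print("".join(symbols))    # HeNeAr
```

Note that join() expects every element of the list to be a string; a list containing numbers would first need each number converted to a string, for example with str().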

Exercises:#

  • Create a new list called halogens and populate it with the names of the halogen elements, in order of increasing atomic number. Using this list, determine what using a negative number following the second colon (e.g. [::-1]) will result in.

  • The distance (\(r\)) between two atoms (\(i\) and \(j\)) can be found with the following equation,

    \[ r = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}, \]

    where \(x_i\) and \(x_j\) are the x-coordinates of atoms \(i\) and \(j\) respectively, \(y_i\) and \(y_j\) are the y-coordinates, and \(z_i\) and \(z_j\) are the z-coordinates. The atomic coordinates of two triatomic molecules are given below as lists, where each atom is described by a list atom = [x_position, y_position, z_position]. For each molecule, calculate the intramolecular distances between all three atoms.
    Having calculated all of the intramolecular distances, comment on the shape of each of the molecules.

Molecule 1#

atom_1 = [0.1, 0.5, 3.2]
atom_2 = [0.4, 0.5, 2.3]
atom_3 = [-0.3, 0.3, 1.7]

Molecule 2#

atom_1 = [-0.1, 0.5, 1.5]
atom_2 = [0.2, 0.5, 2.6]
atom_3 = [0.5, 0.5, 3.7]