Functions#

Up until this point, all the code examples that you have seen and all the code you have written have been single blocks of sequential instructions (possibly containing loops). This is perfectly fine for relatively short, simple calculations, but for more complex calculations it is often helpful to break the code up into distinct sections, where each section performs one part of the larger calculation.

One way to do this in Python is to divide your code up into functions. You have already seen and used a number of functions, such as print(), sum(), and math.sqrt(), and the general concept of functions was briefly introduced earlier. Now let us look at functions in more detail to explain how they work and how you can write your own functions.

Using Functions to Structure Your Code#

Below is an example piece of code for calculating the pH and percent dissociation of a weak acid solution given its concentration and dissociation constant (\(K_\mathrm{a}\)).

import math

# Initial values
concentration = 0.1  # Initial concentration of the weak acid in mol/L
Ka = 1.8e-5  # Acid dissociation constant

# Calculate the hydrogen ion concentration
h_plus = (-Ka + math.sqrt(Ka**2 + 4*Ka*concentration)) / 2

# Calculate the pH
pH = -math.log10(h_plus)

# Calculate the percent dissociation
percent_dissociation = (h_plus / concentration) * 100

print(f"The pH of the solution is: {pH:.2f}")
print(f"The percent dissociation is: {percent_dissociation:.2f}%")

The pH of the solution is: 2.88
The percent dissociation is: 1.33%

Derivation of the formula for the equilibrium concentration of hydrogen ions

In this code example, we calculate the concentration of hydrogen ions using

h_plus = (-Ka + math.sqrt(Ka**2 + 4*Ka*concentration)) / 2

The dissociation of a weak acid, \(\mathrm{HA}\), can be expressed as the equilibrium reaction

\[\mathrm{HA} \rightleftharpoons \mathrm{H}^+ + \mathrm{A}^-\]

with the position of equilibrium given by the acid dissociation constant, \(K_\mathrm{a}\), as

\[K_\mathrm{a} = \frac{[\mathrm{H}^+][\mathrm{A}^-]}{[\mathrm{HA}]}.\]

If we start with an initial concentration of \(\mathrm{HA}\) equal to \(C\), at equilibrium this concentration has decreased by some amount \(x\), giving an equilibrium concentration

\[[\mathrm{HA}] = C-x.\]

The total amount of material in our system is fixed, and every mole of \(\mathrm{HA}\) that dissociates produces exactly one mole of \(\mathrm{H}^+\) ions and exactly one mole of \(\mathrm{A}^-\) ions. So

\[[\mathrm{HA}] + [\mathrm{H}^+] = C\]

and

\[[\mathrm{H}^+] = [\mathrm{A}^-]\]

under all conditions (even if we are not at equilibrium).

Since, at equilibrium, \([\mathrm{HA}] = C-x\), the equilibrium concentrations of \(\mathrm{H}^+\) and \(\mathrm{A}^-\) are given by

\[[\mathrm{H}^+] = [\mathrm{A}^-] = x\]

And we can express \(K_\mathrm{a}\) in terms of \(C\) and \(x\) as

\[K_\mathrm{a} = \frac{x^2}{C-x}\]

This can be rearranged as

\[x^2 + K_\mathrm{a}x - K_\mathrm{a}C = 0.\]

This is the standard form of a quadratic equation, and the roots (values of \(x\) that satisfy this equation) are given by the quadratic formula:

\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]

with coefficients \(a=1\), \(b=K_\mathrm{a}\), and \(c=-K_\mathrm{a}C\).

i.e.,

\[ x = \frac{-K_\mathrm{a} \pm \sqrt{K_\mathrm{a}^2 + 4 K_\mathrm{a}C}}{2} \]

Since \(K_\mathrm{a}\) and \(C\) are both positive, \(\sqrt{K_\mathrm{a}^2 + 4 K_\mathrm{a}C}\) is greater than \(K_\mathrm{a}\), and we will get one positive and one negative root. Only the positive root has physical meaning for our problem (\(x\) cannot be less than zero), giving us the following expression for the concentration of hydrogen ions:

\[ x = \frac{-K_\mathrm{a} + \sqrt{K_\mathrm{a}^2 + 4 K_\mathrm{a}C}}{2} \]

which we implement as

h_plus = (-Ka + math.sqrt(Ka**2 + 4*Ka*concentration)) / 2
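
As a quick sanity check on this derivation, the sketch below substitutes the positive root back into the expression for \(K_\mathrm{a}\). The names C, x, and Ka_check are chosen here only to mirror the symbols above, and the numbers are the example values used in the code.

```python
import math

Ka = 1.8e-5  # Acid dissociation constant (example value from above)
C = 0.1      # Initial acid concentration in mol/L (example value from above)

# Positive root of x^2 + Ka*x - Ka*C = 0
x = (-Ka + math.sqrt(Ka**2 + 4 * Ka * C)) / 2

# Substituting x back into Ka = x^2 / (C - x) should recover Ka
Ka_check = x**2 / (C - x)

print(math.isclose(Ka_check, Ka))
```

If the rearrangement is correct, the recovered value agrees with the original \(K_\mathrm{a}\) to within floating-point rounding.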

We can see that the full calculation consists of three steps:

1. Use the \(K_\mathrm{a}\) value and concentration to calculate the hydrogen ion concentration.
2. Use the hydrogen ion concentration from Step 1 to calculate the pH.
3. Use the concentration and the hydrogen ion concentration from Step 1 to calculate the percent dissociation.

An alternative way to structure your code to perform this calculation is to write functions for each of the three calculation steps, and then have a main block of code that calls each of these functions in turn.

To start, let us pretend that these functions already exist, just as print() and sum() exist as part of the Python core language. In this case, our code might look like:

# Initial values
concentration = 0.1  # Initial concentration of the weak acid in mol/L
Ka = 1.8e-5  # Acid dissociation constant

# Calculate the hydrogen ion concentration
h_plus = calculate_h_plus(concentration, Ka)

# Calculate the pH
pH = calculate_pH(h_plus)

# Calculate the percent dissociation
percent_dissociation = calculate_percent_dissociation(h_plus, concentration)

# Print results
print(f"The pH of the solution is: {pH:.2f}")
print(f"The percent dissociation is: {percent_dissociation:.2f}%")

The pH of the solution is: 2.88
The percent dissociation is: 1.33%

Now we are using three functions that perform the three steps of our calculation:

calculate_h_plus() calculates the hydrogen ion concentration from the concentration and \(K_\mathrm{a}\) values.
calculate_pH() calculates the pH from the hydrogen ion concentration.
calculate_percent_dissociation() calculates the dissociation percentage.

The logical structure of our code now more closely follows the logical structure of our problem, and (hopefully) makes it easier to follow the flow of information through the code and to understand what each part is doing.

For each function we provide one or more inputs that are used in the calculation, and then we store the result in a variable within our main block of code, so that we can use this result in later calculations or to print our final results.

The inputs to a function are called arguments, and we would describe these as being passed to the function. The result that comes back is called the return value. Finally, running a function (by referring to the function name, followed by brackets ()) is called calling a function.
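
To make this vocabulary concrete, here is a small sketch using two functions that come with Python, math.sqrt() and round(). The variable name root is just an illustrative choice.

```python
import math

# Calling math.sqrt(): 16 is the argument passed to the function
root = math.sqrt(16)  # the return value is stored in the variable root

# round() is called with two arguments; its return value is passed
# straight to print() as an argument
print(round(3.14159, 2))
print(root)
```

Note that a return value does not have to be stored in a variable: it can be used directly as the argument to another function call.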

Python does not actually come with pre-written functions for calculating pH, so, in practice, we would need to write these ourselves. Let us look at an example of what the code for these functions might look like:

import math

def calculate_h_plus(concentration, Ka):
    """Calculate the hydrogen ion concentration of a weak acid solution."""
    return (-Ka + math.sqrt(Ka**2 + 4*Ka*concentration)) / 2

def calculate_pH(h_plus):
    """Calculate the pH from the hydrogen ion concentration."""
    return -math.log10(h_plus)

def calculate_percent_dissociation(h_plus, concentration):
    """Calculate the percent dissociation of the weak acid."""
    return (h_plus / concentration) * 100

Each function is defined by a block of code with the following structure:

def function_name(parameter1, parameter2, ...):
    """Optional docstring describing the function."""
    # Function body
    # Code to perform the desired operation
    return result  # Optional return statement

Let’s break this down:

def: This keyword tells Python that you’re defining a function.
function_name: This is the name you give to your function. It is good practice to choose a descriptive name that indicates what the function does.
(parameter1, parameter2, …): These are the input parameters (or arguments) that the function accepts. Functions can have zero or more input parameters.
: The colon marks the end of the function header.
"""Docstring""": An optional (but recommended) string that describes what the function does. This helps other programmers (including your future self) understand the purpose of the function.
Function body: This is where you write the code that performs the desired operation.
return: This keyword is used to specify the value that the function should return when it is called. If omitted, the function will return None by default.
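
The last point can be checked with a short sketch. The function report_pH() below is a hypothetical example written for this illustration: it prints a message but has no return statement.

```python
def report_pH(pH):
    """Print a pH value, without returning anything."""
    print(f"pH = {pH:.2f}")

result = report_pH(2.88)  # the function body runs and prints the message
print(result)             # no return statement, so result is None
```

This is a common source of confusion: printing a value inside a function shows it on screen, but only return hands the value back to the code that called the function.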

Now, let’s examine one of the specific functions from our weak acid calculation example:

def calculate_h_plus(concentration, Ka):
    """Calculate the hydrogen ion concentration of a weak acid solution."""
    return (-Ka + math.sqrt(Ka**2 + 4*Ka*concentration)) / 2

This function takes two parameters: concentration and Ka. It has a docstring describing its purpose. It performs a calculation and immediately returns the result.

If you look at the other two functions, you can see the same general structure, and how the inputs are used to calculate the desired result.

These functions demonstrate several key concepts that are generally applicable to writing Python code:

Each function has a specific, focused purpose.
They accept input parameters that are used in their calculations.
They all include descriptive docstrings.
Each function performs a calculation and returns the result.
The functions are reusable - they can be called multiple times with different input values.

By structuring our code this way, we’ve made it more modular, easier to understand, and simpler to maintain. For example, if we wanted to change how we calculate the hydrogen ion concentration, we only need to update the calculate_h_plus() function, without touching the rest of the code.
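
As a sketch of this reusability, the code below calls the same two functions for several concentrations inside a loop. The list of concentrations is an illustrative assumption, not data from the example above.

```python
import math

def calculate_h_plus(concentration, Ka):
    """Calculate the hydrogen ion concentration of a weak acid solution."""
    return (-Ka + math.sqrt(Ka**2 + 4*Ka*concentration)) / 2

def calculate_pH(h_plus):
    """Calculate the pH from the hydrogen ion concentration."""
    return -math.log10(h_plus)

Ka = 1.8e-5  # Acid dissociation constant
concentrations = [1.0, 0.1, 0.01]  # Illustrative concentrations in mol/L

# Reuse the same functions for every concentration
pH_values = []
for concentration in concentrations:
    h_plus = calculate_h_plus(concentration, Ka)
    pH_values.append(calculate_pH(h_plus))

for concentration, pH in zip(concentrations, pH_values):
    print(f"c = {concentration} mol/L: pH = {pH:.2f}")
```

Without functions, the formula for the hydrogen ion concentration would have to be written out again every time it is needed, and any later correction would need to be made in every copy.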

Keyword, Optional, and Default arguments#

The example above uses the quadratic formula to calculate the equilibrium concentration of hydrogen ions from the initial concentration of acid and the acid dissociation constant.

Solving quadratic equations is useful in many more applications than just this example, so we might decide to write a generic quadratic_roots function that can find the roots of any quadratic equation

\[ax^2 + bx + c = 0\]

and then reuse this function across several problems.

import math

def quadratic_roots(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 (assumes real roots and a != 0)."""
    d = b**2 - 4*a*c  # discriminant
    r1 = (-b + math.sqrt(d)) / (2*a)
    r2 = (-b - math.sqrt(d)) / (2*a)
    return r1, r2

For the case where \(a=1\), \(b=3\), and \(c=-10\), we could use our function as follows:

print(quadratic_roots(1, 3, -10))

(2.0, -5.0)

In this example, we have assigned values to the arguments of a function based on the order of these arguments in the function call.

quadratic_roots(1, 3, -10)

calls the quadratic_roots function and assigns a=1, b=3, c=-10, because the arguments appear in this order in the function definition above.

We can also assign values to function arguments explicitly using keyword arguments.

quadratic_roots(b=3, c=-10, a=1)

(2.0, -5.0)

We can mix positional and keyword arguments in a function call, but the positional arguments must come first:

quadratic_roots(1, c=-10, b=3)

(2.0, -5.0)

quadratic_roots(a=1, -10, b=3)

  Cell In[9], line 1
    quadratic_roots(a=1, -10, b=3)
                                 ^
SyntaxError: positional argument follows keyword argument

Similarly, calling a function using a keyword argument that is not included in the function definition also raises an error:

quadratic_roots(a=1, b=3, f=-10)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 quadratic_roots(a=1, b=3, f=-10)

TypeError: quadratic_roots() got an unexpected keyword argument 'f'
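
A related error occurs if a required argument is left out altogether. The sketch below repeats the quadratic_roots() definition so that it is self-contained, and catches the error with try and except purely so that the message can be printed; the variable name message is an illustrative choice.

```python
import math

def quadratic_roots(a, b, c):
    d = b**2 - 4*a*c
    r1 = (-b + math.sqrt(d)) / 2 / a
    r2 = (-b - math.sqrt(d)) / 2 / a
    return r1, r2

try:
    quadratic_roots(1, 3)  # no value is given for c
except TypeError as error:
    message = str(error)
    print(f"TypeError: {message}")
```

In both cases Python raises a TypeError before any of the code inside the function runs.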

Functions can also be defined with optional arguments by giving these arguments default values in the function definition.

def ch40208_greet(name, day="Tuesday"):
    print(f"Hello {name}, and welcome to the {day} lesson.")

In this case we have one required argument and one optional argument. If we call the function and pass one value, this value is assigned to the argument name, while the argument day is assigned its default value of "Tuesday".

ch40208_greet("Ben")

Hello Ben, and welcome to the Tuesday lesson.

What if it is not Tuesday though? Then we can call this function with two input parameters and specify the day:

ch40208_greet("Ben", "Friday")

Hello Ben, and welcome to the Friday lesson.

Note that values passed by position are assigned to the function's arguments in the order in which they are passed, so swapping the order of the values swaps the assignments.

ch40208_greet("Friday", "Ben")

Hello Friday, and welcome to the Ben lesson.

While arguments passed by position must appear in the same order as the argument list in the function definition, optional arguments passed with keywords can appear in any order (although they must appear after all the positional arguments).

def ch40208_greet_and_info(name, day="Tuesday", topic="Topics in Computational Chemistry"):
    print(f"Hello {name}. Welcome to the {day} lesson on {topic}.")

ch40208_greet_and_info("Lucy")

Hello Lucy. Welcome to the Tuesday lesson on Topics in Computational Chemistry.

ch40208_greet_and_info("Lucy", "Friday")

Hello Lucy. Welcome to the Friday lesson on Topics in Computational Chemistry.

ch40208_greet_and_info("Lucy", "Friday", "functions")

Hello Lucy. Welcome to the Friday lesson on functions.

ch40208_greet_and_info("Lucy", topic="functions (again)")

Hello Lucy. Welcome to the Tuesday lesson on functions (again).

In this last example, we pass two parameters, but the second parameter is assigned the keyword topic. When the ch40208_greet_and_info() function is called, the first parameter is assigned to the required name argument, and the second parameter is assigned to the topic argument, corresponding to the keyword. We do not provide a value for the optional day argument, so the function runs with day assigned to the default value of "Tuesday".
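
Default arguments are also useful in scientific code, where a quantity often takes a conventional value unless stated otherwise. The sketch below uses a hypothetical function arrhenius_rate(), written for this illustration, in which the temperature defaults to 298.15 K; the prefactor and activation energy are made-up example numbers.

```python
import math

def arrhenius_rate(prefactor, activation_energy, temperature=298.15):
    """Rate constant from the Arrhenius equation (hypothetical example)."""
    gas_constant = 8.314462618  # J mol^-1 K^-1
    return prefactor * math.exp(-activation_energy / (gas_constant * temperature))

# Use the default temperature
k_room = arrhenius_rate(1.0e13, 50_000.0)

# Override the default with a keyword argument
k_hot = arrhenius_rate(1.0e13, 50_000.0, temperature=350.0)

# Required arguments can also be passed by keyword, in any order
k_keywords = arrhenius_rate(activation_energy=50_000.0, prefactor=1.0e13)

print(k_room < k_hot)
print(k_room == k_keywords)
```

Passing temperature by keyword makes the call easier to read than a bare third number would be.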

Scope#

The term scope refers to where a variable is visible in a Python program. Variables can be visible in one part of a program, but not in another part.

def add(a, b):
    return a + b

print(add(3, 5))

print(a)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[33], line 6
      2     return a + b
      4 print(add(3, 5))
----> 6 print(a)

NameError: name 'a' is not defined

In this example, the variables a and b are only defined inside the function add. If we try to refer to either of these variable names outside the function, then we get a NameError saying that these names have not been defined.

To describe this behaviour in terms of scope, we would say that a and b are defined in the local scope of the function add. Outside this scope, these variables are not defined. One consequence of this is we can write modular code where the same variable names get reused in different functions, while each function remains self-contained.

def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

print(add(2,3))
print(multiply(2,3))

5
6

If we define variables outside of functions then they are defined in global scope. Variables defined in global scope are also visible inside functions in the same piece of code, provided the variable has been assigned by the time the function is called.

x, y = 1, 3

def add_to_x(y):
    return x + y

print(add_to_x(5))

In this example, we define two variables in global scope: x and y. We then define another variable y in the local scope of the function add_to_x, which is set to the value 5 when we call the function. Because x is not defined in the local scope of the function, the Python interpreter looks to see if there is a variable x defined in the outer scope. If there is, it assumes we are referring to that variable.

We can make things a bit clearer by adding some print() statements:

x, y = 1, 3

print(f'outside the function: x = {x}, y = {y}')

def add_to_x(y):
    print(f'inside the function: x = {x}, y = {y}')
    return x + y

print(add_to_x(5))

print(f'outside the function: x = {x}, y = {y}')

outside the function: x = 1, y = 3
inside the function: x = 1, y = 5
6
outside the function: x = 1, y = 3

Notice how when we return outside the function, we are back in the global scope, and the name y now refers to the value 3 defined in the first line of the code.

Mixing global and local scope can get quite confusing. In fact, it is usually good practice to only refer to local variables inside functions, unless it is very clear from the code what they refer to. One of the benefits of dividing code up into functions is that it makes your code modular, and individual functions can be reused across different bits of code. If the behaviour of a function depends on the value of a global variable, however, it can become very difficult to correctly reason about the behaviour of that function, giving a prime opportunity for hard-to-squash bugs in your code.
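
The sketch below illustrates the problem with two hypothetical functions written for this example. scaled_global() reads a global variable, so the same call can give different results; scaled_explicit() receives everything it needs as arguments.

```python
scale = 2  # global variable

def scaled_global(value):
    """Depends on whatever the global name scale refers to when called."""
    return scale * value

def scaled_explicit(value, scale):
    """Depends only on its arguments."""
    return scale * value

first = scaled_global(10)
scale = 5  # somewhere else in the program, the global variable changes
second = scaled_global(10)  # same call as before, different result

print(first, second)
print(scaled_explicit(10, 2))
```

With the explicit version, everything that affects the result is visible at the point where the function is called.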

An example of where you might want to use a global variable inside a function might be to have a global variable for a physical constant, such as the Boltzmann constant.

import math

k_boltz = 1.380649e-23 # Boltzmann constant in J K^-1

def boltzmann_factor(energy, temperature):
    return math.exp(-energy/(k_boltz * temperature))

Python also has a set of built-in functions and variables that are always available, like print() and len(). These live in built-in scope. The Python interpreter checks built-in scope after it has checked local scope and global scope. This means that if you define a new variable with the same name as a built-in function you might get unexpected behaviour and a particularly hard-to-spot bug.
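
Here is a minimal sketch of this kind of bug. The function shadowed_total() is a hypothetical example: inside it, the name sum is reused for a number, which hides the built-in sum() function within that function. The error is caught here only so that it can be shown as a message.

```python
def shadowed_total(values):
    sum = 0  # this local name hides the built-in sum() inside the function
    try:
        return sum(values)  # Python finds the integer first, not the built-in
    except TypeError as error:
        return f"TypeError: {error}"

message = shadowed_total([1, 2, 3])
print(message)
```

The same thing happens if the name is reused in global scope, where it then affects all of the code that follows. Avoiding variable names such as sum, len, list, and print prevents this.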

Local scope does not only apply to functions; it also applies to list comprehensions:

x = 3
print([x+1 for x in (6,7,8)])
print(x) # refers to x in global scope

[7, 8, 9]
3
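
For comparison, an ordinary for loop does not create a new scope, so its loop variable does overwrite a variable with the same name. The sketch below shows the two cases side by side; the names after_comprehension and after_loop are illustrative.

```python
x = 3

squares = [x**2 for x in (6, 7, 8)]  # this x is local to the comprehension
after_comprehension = x              # the outer x is unchanged

for x in (6, 7, 8):                  # this x is the same variable as the outer x
    pass
after_loop = x                       # x now holds the last value from the loop

print(after_comprehension, after_loop)
```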

Docstrings#

Computer code is rarely used once, when it is written, and then left. More often code is used over weeks, or months, or years, possibly with pieces of code being used in multiple projects or by multiple users. Code that solves your specific problem might also be easier to adapt from pre-existing code (either your own or someone else's) than to write from scratch each time.

While we might all think that it is obvious what our code does and how it works at the time we write it, there is no guarantee that someone else will think the same thing, or that the code will still make sense to you if you want to reuse or modify it in six months.

Having to figure out exactly what a piece of code does by reading the source code is, at best, time consuming; at worst it becomes impossible.

For these reasons, it is strongly recommended that you document your code, to give enough information to a future user of the code (most likely future-you) to be able to pick it up and use it correctly.

Functions support a particular kind of inline documentation called a docstring. A docstring is a string that provides documentation about a function: usually a summary of what the function does, and information about any required or optional arguments, and any data returned by the function.

A docstring is written as a multi-line string, which is defined in Python by a block of text that starts and ends with triple quotes: """ or '''.

For example, here is a function kinetic_energy() that calculates the kinetic energy of a particle. The function takes two required arguments mass and velocity and returns the kinetic energy, calculated as \(E_\mathrm{KE} = \frac{1}{2}mv^2\).

def kinetic_energy(mass, velocity):
    """
    Determine the kinetic energy of a particle.
    
    Args: 
        mass (float): Particle mass (kg)
        velocity (float): Particle velocity (m/s)
    
    Returns:
        (float): Particle kinetic energy (J)
    """
    kinetic_energy = 0.5 * mass * velocity ** 2
    return kinetic_energy

In this example the docstring is:

    """
    Determine the kinetic energy of a particle.
    
    Args: 
        mass (float): Particle mass (kg)
        velocity (float): Particle velocity (m/s)
    
    Returns:
        (float): Particle kinetic energy (J)
    """

The first line gives a brief summary of what the function does.

Below this there is an Args: section, with each argument listed on a separate line underneath.

At the end there is a Returns: section, which gives the type of the returned value, and an explanation of what this is.

A docstring can contain any text in any format. There are a few common conventions used in most Python code. In this example we have used Google Style, but other docstring styles do exist.
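
For comparison, here is a sketch of the same function documented in the NumPy docstring style, which is another widely used convention. Only the layout of the docstring differs; the function body here simply returns the result directly.

```python
def kinetic_energy(mass, velocity):
    """
    Determine the kinetic energy of a particle.

    Parameters
    ----------
    mass : float
        Particle mass (kg)
    velocity : float
        Particle velocity (m/s)

    Returns
    -------
    float
        Particle kinetic energy (J)
    """
    return 0.5 * mass * velocity ** 2

print(kinetic_energy(2.0, 3.0))
```

Whichever style you choose, using it consistently throughout a project matters more than the choice itself.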

Once a function has been defined, or imported, the docstring is stored as a special attribute of the function, __doc__, which allows us to read this information.

print(kinetic_energy.__doc__)

We can do the same thing for functions that are part of the standard library or that we have imported from external modules:

print(sum.__doc__)

from math import sqrt
print(sqrt.__doc__)

If we are working inside a Jupyter notebook we can use a special ? syntax to open the docstring for a function in a separate popout window.

kinetic_energy?
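
Outside a notebook, the standard library offers similar tools. The sketch below uses inspect.getdoc(), which returns the docstring with its leading indentation cleaned up; the kinetic_energy() definition is repeated so that the example is self-contained.

```python
import inspect

def kinetic_energy(mass, velocity):
    """
    Determine the kinetic energy of a particle.

    Args:
        mass (float): Particle mass (kg)
        velocity (float): Particle velocity (m/s)

    Returns:
        (float): Particle kinetic energy (J)
    """
    return 0.5 * mass * velocity ** 2

# getdoc() removes the indentation and surrounding blank lines
doc = inspect.getdoc(kinetic_energy)
print(doc)
```

The built-in help() function, called as help(kinetic_energy), displays the function signature together with the docstring and works in any Python session.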

Exercises:#

Exercise 1#

The correct definition of pH is

\[\mathrm{pH} = -\log_{10}(a_\mathrm{H}^+),\]

where \(a_\mathrm{H}^+\) is the activity of hydrogen ions, which is related to the concentration \([\mathrm{H}^+]\) by the activity coefficient, \(\gamma_\mathrm{H}^+\):

\[a_\mathrm{H}^+ = \gamma_\mathrm{H}^+[\mathrm{H}^+].\]

Write an updated version of the calculate_pH() function from the earlier example so that it calculates pH correctly. Your function should take two arguments: the concentration of hydrogen ions and their activity coefficient. Check that for \(\gamma_\mathrm{H}^+ = 1\) you get the same answer as before. Now rewrite your function so that the activity coefficient is an optional argument, with a default value of \(\gamma_\mathrm{H}^+ = 1\). Check that if you call this function in your code as calculate_pH(h_plus) you still get the same result.

Exercise 2#

Rewrite your calculate_h_plus() function to call the generic quadratic_roots() function. You will need to pass in appropriate values for the arguments a, b, and c. Remember that the quadratic_roots() function returns both roots: you want the positive root.

Exercise 3#

Write a function to calculate the distance between two atoms. Then rewrite the exercise in the loops section to utilise this function. Make sure to include a docstring describing the purpose of the function and outlining the arguments.

Exercise 4#

The Einstein model for the heat capacity of a crystal is

\[C_{V,m} = 3R\left(\frac{\Theta_\mathrm{E}}{T}\right)^2\frac{\exp\left(\frac{\Theta_\mathrm{E}}{T}\right)}{\left[\exp\left(\frac{\Theta_\mathrm{E}}{T}\right)-1\right]^2}\]

where the Einstein temperature, \(\Theta_\mathrm{E}\), is a constant for a given material.

Write a function to calculate \(C_{V,m}\) at 300 K for (a) sodium (\(\Theta_\mathrm{E}\) = 192 K) and (b) diamond (\(\Theta_\mathrm{E}\) = 1450 K).