Worked Examples: NumPy Exercises

Worked Examples: NumPy Exercises#

These worked solutions correspond to the exercises on the NumPy Exercises page.

How to use this notebook:

Try each exercise yourself first before looking at the solution
The code cells show both the code and its output
Download this notebook if you want to run and experiment with the code yourself
Your solution might look different - that’s fine as long as it gives the correct answer!

Setup#

We’ll import NumPy here for use throughout the exercises.

import numpy as np

Exercise 1: Average Atomic Mass#

Problem: Calculate the average (mean) atomic mass of naturally occurring tin using the mass numbers and natural abundances of its isotopes.

The average atomic mass is calculated as:

\[\text{Average mass} = \sum (\text{mass number} \times \text{abundance})\]

Solution:

# Mass numbers for tin isotopes
mass_numbers = np.array([112, 114, 115, 116, 117, 118, 119, 120, 122, 124])

# Natural abundances (as decimal fractions)
abundances = np.array([0.0097, 0.0066, 0.0034, 0.1454, 0.0768, 
                       0.2422, 0.0859, 0.3258, 0.0463, 0.0579])

# Let's verify our abundances sum to 1 (or very close)
print(f"Sum of abundances: {np.sum(abundances)}")

Sum of abundances: 0.9999999999999999

# Calculate the weighted average using vector arithmetic
average_mass = np.sum(mass_numbers * abundances)

print(f"Average atomic mass of tin: {average_mass:.2f}")

Average atomic mass of tin: 118.81

Explanation:

We use NumPy arrays to store the mass numbers and abundances
The expression mass_numbers * abundances performs element-wise multiplication
This gives us an array where each element is mass × abundance for each isotope
np.sum() adds all these weighted masses together
The result (~118.71) matches the standard atomic weight of tin

Key concept: This is a weighted average calculation - we multiply each value by its weight (abundance) and sum the results.

Alternative: Using np.average() with weights

average_mass_alt = np.average(mass_numbers, weights=abundances)

print(f"Using np.average(): {average_mass_alt:.2f}")

Using np.average(): 118.81

NumPy provides np.average() which can directly calculate weighted averages. The weights parameter allows us to specify the abundances as weights. This is more concise and clearly shows our intent to calculate a weighted average.

Both methods give the same result, but np.average() is more explicit about what we’re calculating.

Exercise 2: Molecular Distances#

Problem: Calculate all pairwise distances between atoms in an ammonia molecule (NH₃) using NumPy arrays and vector arithmetic.

The distance between two atoms is given by:

\[d = \sqrt{(x_2-x_1)^2 + (y_2-y_1)^2 + (z_2-z_1)^2}\]

Given atomic coordinates (in Ångströms):

Nitrogen: (0.0, 0.0, 0.0)
Hydrogen 1: (0.0, 0.94, 0.38)
Hydrogen 2: (0.81, -0.47, 0.38)
Hydrogen 3: (-0.81, -0.47, 0.38)

Task 1: Store positions as a 2D NumPy array#

# Create a 2D array with shape (4, 3) - 4 atoms, 3 coordinates each
atoms = np.array([[0.0, 0.0, 0.0],       # N
                  [0.0, 0.94, 0.38],     # H1
                  [0.81, -0.47, 0.38],   # H2
                  [-0.81, -0.47, 0.38]]) # H3

atom_labels = ['N', 'H1', 'H2', 'H3']

print(f"Shape of atoms array: {atoms.shape}")
print(f"Number of atoms: {atoms.shape[0]}")
print(f"Number of coordinates per atom: {atoms.shape[1]}")

Shape of atoms array: (4, 3)
Number of atoms: 4
Number of coordinates per atom: 3

Tasks 2 & 3: Calculate pairwise distances using loops and vector arithmetic#

# Calculate all unique pairwise distances
print("Pairwise distances in NH₃:")
print("-" * 26)

for i in range(len(atoms)):
    for j in range(i + 1, len(atoms)):
        # Vector between atoms i and j
        vector = atoms[j] - atoms[i]
        
        # Calculate distance using np.linalg.norm()
        distance = np.linalg.norm(vector)
        
        print(f"{atom_labels[i]:2s} - {atom_labels[j]:2s}: {distance:.4f} Å")

Pairwise distances in NH₃:
--------------------------
N  - H1: 1.0139 Å
N  - H2: 1.0106 Å
N  - H3: 1.0106 Å
H1 - H2: 1.6261 Å
H1 - H3: 1.6261 Å
H2 - H3: 1.6200 Å

Explanation:

atoms[j] - atoms[i] performs element-wise subtraction to get the vector between two atoms
np.linalg.norm(vector) calculates the Euclidean norm (magnitude) of the vector
This is equivalent to np.sqrt(np.sum(vector**2)) but more concise
The nested loops with j starting from i + 1 ensures we only calculate each distance once

# Let's verify one distance manually to understand what np.linalg.norm() does
# Distance between N (atom 0) and H1 (atom 1)

# Extract individual atoms
atom_N = atoms[0]
atom_H1 = atoms[1]
print(f"Position of atom_N: {atom_N}")
print(f"Position of atom_H1: {atom_H1}")

# Calculate step by step
diff = atom_H1 - atom_N
print(f"Difference vector: {diff}")

squared = diff**2
print(f"Squared components: {squared}")

sum_squared = np.sum(squared)
print(f"Sum of squares: {sum_squared:.4f}")

distance_manual = np.sqrt(sum_squared)
print(f"Distance (manual): {distance_manual:.4f} Å")

distance_norm = np.linalg.norm(atom_H1 - atom_N)
print(f"Distance (np.linalg.norm): {distance_norm:.4f} Å")

Position of atom_N: [0. 0. 0.]
Position of atom_H1: [0.   0.94 0.38]
Difference vector: [0.   0.94 0.38]
Squared components: [0.     0.8836 0.1444]
Sum of squares: 1.0280
Distance (manual): 1.0139 Å
Distance (np.linalg.norm): 1.0139 Å

Alternative approach: Store all distances in a matrix

An alternative approach is to calculate all distances and store these in a distance matrix:

# Create a distance matrix
n_atoms = len(atoms)
distance_matrix = np.zeros((n_atoms, n_atoms)) # initialise matrix of zeroes

for i in range(n_atoms):
    for j in range(i+1, n_atoms): # only loop over unique atom pairs.
        dr = np.linalg.norm(atoms[i] - atoms[j]) # calculate the distance dr_ij once
        distance_matrix[i, j] = dr # store dr_ij
        distance_matrix[j, i] = dr # store dr_ji (which is equal to dr_ij)

# Display the matrix with labels
print("Distance matrix (Å):")
print("     ", "  ".join(f"{label:5s}" for label in atom_labels))
for i, label in enumerate(atom_labels):
    print(f"{label:3s}: ", " ".join(f"{d:5.3f}" for d in distance_matrix[i]))

Distance matrix (Å):
      N      H1     H2     H3   
N  :  0.000 1.014 1.011 1.011
H1 :  1.014 0.000 1.626 1.626
H2 :  1.011 1.626 0.000 1.620
H3 :  1.011 1.626 1.620 0.000

Distance matrix interpretation:

The diagonal elements are all 0 (distance from an atom to itself) — we do not need to explicitly calculate these.
The matrix is symmetric (distance from A to B equals distance from B to A) - we only calculate this once
This format is useful for further analysis, such as finding the closest or furthest atom pairs, e.g.:

Comparison with previous exercises: In the Loops exercises, you calculated these same distances using lists and manual indexing. Using NumPy arrays makes the vector arithmetic more natural and the code more readable.

Exercise 3: Analysis of Experimental Data#

Problem: Analyze absorbance measurements at 540 nm over time to monitor a reaction.

Given measurements (arbitrary units):

Times: 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 minutes
Absorbance: 1.85, 1.52, 1.31, 1.08, 0.89, 0.71, 0.58, 0.49, 0.38, 0.32, 0.28

Task 1: Convert data to NumPy arrays#

# Original data as lists
times = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]  # minutes
absorbance = [1.85, 1.52, 1.31, 1.08, 0.89, 0.71, 0.58, 0.49, 0.38, 0.32, 0.28]

# Convert to NumPy arrays
times = np.array(times)
absorbance = np.array(absorbance)

print(f"Times array shape: {times.shape}")
print(f"Absorbance array shape: {absorbance.shape}")

Times array shape: (11,)
Absorbance array shape: (11,)

Task 2: Calculate reaction progress#

First, let’s calculate the relative absorbance at each time point (as a fraction of the initial value).

# Calculate relative absorbance (fraction of initial value)
initial_absorbance = absorbance[0]
relative_absorbance = absorbance / initial_absorbance

# Display reaction progress at selected time points
print("\nReaction progress:")
print("Time (min) | Absorbance | Relative absorbance")
print("-" * 45)
for i in range(0, len(times)):
    print(f"{times[i]:10d} | {absorbance[i]:10.2f} | {relative_absorbance[i]:6.3f}")

Reaction progress:
Time (min) | Absorbance | Relative absorbance
---------------------------------------------
         0 |       1.85 |  1.000
         2 |       1.52 |  0.822
         4 |       1.31 |  0.708
         6 |       1.08 |  0.584
         8 |       0.89 |  0.481
        10 |       0.71 |  0.384
        12 |       0.58 |  0.314
        14 |       0.49 |  0.265
        16 |       0.38 |  0.205
        18 |       0.32 |  0.173
        20 |       0.28 |  0.151

Now let’s find when the absorbance has decreased to half its initial value:

# Calculate half of initial value
half_value = initial_absorbance / 2
print(f"Half of initial absorbance: {half_value:.3f}")

# Find where absorbance drops below half
below_half = absorbance < half_value

# Get the first index where this occurs
if np.any(below_half):
    half_life_index = np.argmax(below_half)  # argmax finds first True
    half_life_time = times[half_life_index]
    print(f"\nAbsorbance drops below half its initial value at:")
    print(f"  Time: {half_life_time} minutes")
    print(f"  Actual absorbance: {absorbance[half_life_index]:.3f}")

Half of initial absorbance: 0.925

Absorbance drops below half its initial value at:
  Time: 8 minutes
  Actual absorbance: 0.890

Task 3: Calculate change between consecutive time points#

To understand the reaction rate, we need to calculate how much the absorbance changes between measurements.

# Calculate differences using array slicing
delta_absorbance = absorbance[1:] - absorbance[:-1]
delta_time = times[1:] - times[:-1]

print("Change in absorbance between consecutive measurements:\n")
print("Time interval | ΔAbsorbance")
print("-" * 27)
for i, delta_abs in enumerate(delta_absorbance): 
    print(f"t={times[i]:<2d} to t={times[i+1]:<2d}  | {delta_absorbance[i]:11.3f}")

Change in absorbance between consecutive measurements:

Time interval | ΔAbsorbance
---------------------------
t=0  to t=2   |      -0.330
t=2  to t=4   |      -0.210
t=4  to t=6   |      -0.230
t=6  to t=8   |      -0.190
t=8  to t=10  |      -0.180
t=10 to t=12  |      -0.130
t=12 to t=14  |      -0.090
t=14 to t=16  |      -0.110
t=16 to t=18  |      -0.060
t=18 to t=20  |      -0.040

Understanding the array slicing:

absorbance[1:] gives all elements from index 1 to the end
absorbance[:-1] gives all elements from start to second-last
Subtracting these aligned arrays gives differences between consecutive elements

Task 4: Find when the absorbance first drops below 0.5#

threshold = 0.5

# Find when absorbance first drops below threshold using a loop
first_below = None
for i in range(len(absorbance)):
    if absorbance[i] < threshold:
        first_below = i
        break  # Stop once we find the first one

if first_below is not None:
    print(f"Absorbance first drops below {threshold} at:")
    print(f"  Time: {times[first_below]} minutes")
    print(f"  Absorbance: {absorbance[first_below]:.2f}")
else:
    print(f"Absorbance never drops below {threshold}")

Absorbance first drops below 0.5 at:
  Time: 14 minutes
  Absorbance: 0.49

Alternative solution using NumPy boolean indexing

# Create a boolean array where True means "below threshold"
below_threshold = absorbance < threshold
print(f"Boolean array: {below_threshold}")

# If any values are below threshold...
if np.any(below_threshold):
    # argmax on a boolean array finds the first True
    first_index = np.argmax(below_threshold)
    print(f"\nUsing np.argmax(): First crossing at index {first_index}")
    print(f"  Time: {times[first_index]} minutes")
    print(f"  Absorbance: {absorbance[first_index]:.2f}")

Boolean array: [False False False False False False False  True  True  True  True]

Using np.argmax(): First crossing at index 7
  Time: 14 minutes
  Absorbance: 0.49

Explanation

While the first approach using a loop is clear and works perfectly, NumPy offers a more concise approach using boolean arrays.

When you write absorbance < threshold, NumPy doesn’t just give you a single True/False answer. Instead, it compares EACH element of the array to the threshold and creates a new array of True/False values:

True where the element is less than the threshold
False where the element is greater than or equal to the threshold

This array of True/False values is called a boolean array. Let’s see what this looks like:

# When we compare an array to a value, we get a boolean array
below_threshold = absorbance < threshold

print("Original absorbance values:")
print(absorbance)
print(f"\nComparing each value to threshold ({threshold}):")
print(below_threshold)
print(f"\nThis is a boolean array with dtype: {below_threshold.dtype}")

# We can see the correspondence:
for i in range(len(absorbance)):
    print(f"  {absorbance[i]:.2f} < {threshold} ? {below_threshold[i]}")

Original absorbance values:
[1.85 1.52 1.31 1.08 0.89 0.71 0.58 0.49 0.38 0.32 0.28]

Comparing each value to threshold (0.5):
[False False False False False False False  True  True  True  True]

This is a boolean array with dtype: bool
  1.85 < 0.5 ? False
  1.52 < 0.5 ? False
  1.31 < 0.5 ? False
  1.08 < 0.5 ? False
  0.89 < 0.5 ? False
  0.71 < 0.5 ? False
  0.58 < 0.5 ? False
  0.49 < 0.5 ? True
  0.38 < 0.5 ? True
  0.32 < 0.5 ? True
  0.28 < 0.5 ? True

Key concept: Boolean arrays

Boolean arrays are powerful tools in NumPy:

Created by comparing arrays to values (array < value, array == value, etc.)
Each element is True or False
Can be used to filter data: array[boolean_mask] returns only elements where mask is True
np.argmax() on a boolean array finds the first True (since True=1, False=0)

This is more advanced than using loops, but can be very efficient for large datasets. However, loops are perfectly fine and often clearer if you are not familiar with more advanced NumPy techniques.