Worked Examples: NumPy Exercises#
These worked solutions correspond to the exercises on the NumPy Exercises page.
How to use this notebook:
Try each exercise yourself first before looking at the solution
The code cells show both the code and its output
Download this notebook if you want to run and experiment with the code yourself
Your solution might look different - that’s fine as long as it gives the correct answer!
Setup#
We’ll import NumPy here for use throughout the exercises.
import numpy as np
Exercise 1: Average Atomic Mass#
Problem: Calculate the average (mean) atomic mass of naturally occurring tin using the mass numbers and natural abundances of its isotopes.
The average atomic mass is calculated as:
Solution:
# Mass numbers for tin isotopes
mass_numbers = np.array([112, 114, 115, 116, 117, 118, 119, 120, 122, 124])
# Natural abundances (as decimal fractions)
abundances = np.array([0.0097, 0.0066, 0.0034, 0.1454, 0.0768,
0.2422, 0.0859, 0.3258, 0.0463, 0.0579])
# Let's verify our abundances sum to 1 (or very close)
print(f"Sum of abundances: {np.sum(abundances)}")
Sum of abundances: 0.9999999999999999
# Calculate the weighted average using vector arithmetic
average_mass = np.sum(mass_numbers * abundances)
print(f"Average atomic mass of tin: {average_mass:.2f}")
Average atomic mass of tin: 118.81
Explanation:
We use NumPy arrays to store the mass numbers and abundances
The expression
mass_numbers * abundances
performs element-wise multiplicationThis gives us an array where each element is
mass × abundance
for each isotopenp.sum()
adds all these weighted masses togetherThe result (~118.71) matches the standard atomic weight of tin
Key concept: This is a weighted average calculation - we multiply each value by its weight (abundance) and sum the results.
Alternative: Using np.average()
with weights
average_mass_alt = np.average(mass_numbers, weights=abundances)
print(f"Using np.average(): {average_mass_alt:.2f}")
Using np.average(): 118.81
NumPy provides np.average()
which can directly calculate weighted averages. The weights
parameter allows us to specify the abundances as weights. This is more concise and clearly shows our intent to calculate a weighted average.
Both methods give the same result, but np.average()
is more explicit about what we’re calculating.
Exercise 2: Molecular Distances#
Problem: Calculate all pairwise distances between atoms in an ammonia molecule (NH₃) using NumPy arrays and vector arithmetic.
The distance between two atoms is given by:
Given atomic coordinates (in Ångströms):
Nitrogen: (0.0, 0.0, 0.0)
Hydrogen 1: (0.0, 0.94, 0.38)
Hydrogen 2: (0.81, -0.47, 0.38)
Hydrogen 3: (-0.81, -0.47, 0.38)
Task 1: Store positions as a 2D NumPy array#
# Create a 2D array with shape (4, 3) - 4 atoms, 3 coordinates each
atoms = np.array([[0.0, 0.0, 0.0], # N
[0.0, 0.94, 0.38], # H1
[0.81, -0.47, 0.38], # H2
[-0.81, -0.47, 0.38]]) # H3
atom_labels = ['N', 'H1', 'H2', 'H3']
print(f"Shape of atoms array: {atoms.shape}")
print(f"Number of atoms: {atoms.shape[0]}")
print(f"Number of coordinates per atom: {atoms.shape[1]}")
Shape of atoms array: (4, 3)
Number of atoms: 4
Number of coordinates per atom: 3
Tasks 2 & 3: Calculate pairwise distances using loops and vector arithmetic#
# Calculate all unique pairwise distances
print("Pairwise distances in NH₃:")
print("-" * 26)
for i in range(len(atoms)):
for j in range(i + 1, len(atoms)):
# Vector between atoms i and j
vector = atoms[j] - atoms[i]
# Calculate distance using np.linalg.norm()
distance = np.linalg.norm(vector)
print(f"{atom_labels[i]:2s} - {atom_labels[j]:2s}: {distance:.4f} Å")
Pairwise distances in NH₃:
--------------------------
N - H1: 1.0139 Å
N - H2: 1.0106 Å
N - H3: 1.0106 Å
H1 - H2: 1.6261 Å
H1 - H3: 1.6261 Å
H2 - H3: 1.6200 Å
Explanation:
atoms[j] - atoms[i]
performs element-wise subtraction to get the vector between two atomsnp.linalg.norm(vector)
calculates the Euclidean norm (magnitude) of the vectorThis is equivalent to
np.sqrt(np.sum(vector**2))
but more conciseThe nested loops with
j
starting fromi + 1
ensures we only calculate each distance once
# Let's verify one distance manually to understand what np.linalg.norm() does
# Distance between N (atom 0) and H1 (atom 1)
# Extract individual atoms
atom_N = atoms[0]
atom_H1 = atoms[1]
print(f"Position of atom_N: {atom_N}")
print(f"Position of atom_H1: {atom_H1}")
# Calculate step by step
diff = atom_H1 - atom_N
print(f"Difference vector: {diff}")
squared = diff**2
print(f"Squared components: {squared}")
sum_squared = np.sum(squared)
print(f"Sum of squares: {sum_squared:.4f}")
distance_manual = np.sqrt(sum_squared)
print(f"Distance (manual): {distance_manual:.4f} Å")
distance_norm = np.linalg.norm(atom_H1 - atom_N)
print(f"Distance (np.linalg.norm): {distance_norm:.4f} Å")
Position of atom_N: [0. 0. 0.]
Position of atom_H1: [0. 0.94 0.38]
Difference vector: [0. 0.94 0.38]
Squared components: [0. 0.8836 0.1444]
Sum of squares: 1.0280
Distance (manual): 1.0139 Å
Distance (np.linalg.norm): 1.0139 Å
Alternative approach: Store all distances in a matrix
An alternative approach is to calculate all distances and store these in a distance matrix:
# Create a distance matrix
n_atoms = len(atoms)
distance_matrix = np.zeros((n_atoms, n_atoms)) # initialise matrix of zeroes
for i in range(n_atoms):
for j in range(i+1, n_atoms): # only loop over unique atom pairs.
dr = np.linalg.norm(atoms[i] - atoms[j]) # calculate the distance dr_ij once
distance_matrix[i, j] = dr # store dr_ij
distance_matrix[j, i] = dr # store dr_ji (which is equal to dr_ij)
# Display the matrix with labels
print("Distance matrix (Å):")
print(" ", " ".join(f"{label:5s}" for label in atom_labels))
for i, label in enumerate(atom_labels):
print(f"{label:3s}: ", " ".join(f"{d:5.3f}" for d in distance_matrix[i]))
Distance matrix (Å):
N H1 H2 H3
N : 0.000 1.014 1.011 1.011
H1 : 1.014 0.000 1.626 1.626
H2 : 1.011 1.626 0.000 1.620
H3 : 1.011 1.626 1.620 0.000
Distance matrix interpretation:
The diagonal elements are all 0 (distance from an atom to itself) — we do not need to explicitly calculate these.
The matrix is symmetric (distance from A to B equals distance from B to A) - we only calculate this once
This format is useful for further analysis, such as finding the closest or furthest atom pairs, e.g.:
Comparison with previous exercises: In the Loops exercises, you calculated these same distances using lists and manual indexing. Using NumPy arrays makes the vector arithmetic more natural and the code more readable.
Exercise 3: Analysis of Experimental Data#
Problem: Analyze absorbance measurements at 540 nm over time to monitor a reaction.
Given measurements (arbitrary units):
Times: 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 minutes
Absorbance: 1.85, 1.52, 1.31, 1.08, 0.89, 0.71, 0.58, 0.49, 0.38, 0.32, 0.28
Task 1: Convert data to NumPy arrays#
# Original data as lists
times = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20] # minutes
absorbance = [1.85, 1.52, 1.31, 1.08, 0.89, 0.71, 0.58, 0.49, 0.38, 0.32, 0.28]
# Convert to NumPy arrays
times = np.array(times)
absorbance = np.array(absorbance)
print(f"Times array shape: {times.shape}")
print(f"Absorbance array shape: {absorbance.shape}")
Times array shape: (11,)
Absorbance array shape: (11,)
Task 2: Calculate reaction progress#
First, let’s calculate the relative absorbance at each time point (as a fraction of the initial value).
# Calculate relative absorbance (fraction of initial value)
initial_absorbance = absorbance[0]
relative_absorbance = absorbance / initial_absorbance
# Display reaction progress at selected time points
print("\nReaction progress:")
print("Time (min) | Absorbance | Relative absorbance")
print("-" * 45)
for i in range(0, len(times)):
print(f"{times[i]:10d} | {absorbance[i]:10.2f} | {relative_absorbance[i]:6.3f}")
Reaction progress:
Time (min) | Absorbance | Relative absorbance
---------------------------------------------
0 | 1.85 | 1.000
2 | 1.52 | 0.822
4 | 1.31 | 0.708
6 | 1.08 | 0.584
8 | 0.89 | 0.481
10 | 0.71 | 0.384
12 | 0.58 | 0.314
14 | 0.49 | 0.265
16 | 0.38 | 0.205
18 | 0.32 | 0.173
20 | 0.28 | 0.151
Now let’s find when the absorbance has decreased to half its initial value:
# Calculate half of initial value
half_value = initial_absorbance / 2
print(f"Half of initial absorbance: {half_value:.3f}")
# Find where absorbance drops below half
below_half = absorbance < half_value
# Get the first index where this occurs
if np.any(below_half):
half_life_index = np.argmax(below_half) # argmax finds first True
half_life_time = times[half_life_index]
print(f"\nAbsorbance drops below half its initial value at:")
print(f" Time: {half_life_time} minutes")
print(f" Actual absorbance: {absorbance[half_life_index]:.3f}")
Half of initial absorbance: 0.925
Absorbance drops below half its initial value at:
Time: 8 minutes
Actual absorbance: 0.890
Task 3: Calculate change between consecutive time points#
To understand the reaction rate, we need to calculate how much the absorbance changes between measurements.
# Calculate differences using array slicing
delta_absorbance = absorbance[1:] - absorbance[:-1]
delta_time = times[1:] - times[:-1]
print("Change in absorbance between consecutive measurements:\n")
print("Time interval | ΔAbsorbance")
print("-" * 27)
for i, delta_abs in enumerate(delta_absorbance):
print(f"t={times[i]:<2d} to t={times[i+1]:<2d} | {delta_absorbance[i]:11.3f}")
Change in absorbance between consecutive measurements:
Time interval | ΔAbsorbance
---------------------------
t=0 to t=2 | -0.330
t=2 to t=4 | -0.210
t=4 to t=6 | -0.230
t=6 to t=8 | -0.190
t=8 to t=10 | -0.180
t=10 to t=12 | -0.130
t=12 to t=14 | -0.090
t=14 to t=16 | -0.110
t=16 to t=18 | -0.060
t=18 to t=20 | -0.040
Understanding the array slicing:
absorbance[1:]
gives all elements from index 1 to the endabsorbance[:-1]
gives all elements from start to second-lastSubtracting these aligned arrays gives differences between consecutive elements
Task 4: Find when the absorbance first drops below 0.5#
threshold = 0.5
# Find when absorbance first drops below threshold using a loop
first_below = None
for i in range(len(absorbance)):
if absorbance[i] < threshold:
first_below = i
break # Stop once we find the first one
if first_below is not None:
print(f"Absorbance first drops below {threshold} at:")
print(f" Time: {times[first_below]} minutes")
print(f" Absorbance: {absorbance[first_below]:.2f}")
else:
print(f"Absorbance never drops below {threshold}")
Absorbance first drops below 0.5 at:
Time: 14 minutes
Absorbance: 0.49
Alternative solution using NumPy boolean indexing
# Create a boolean array where True means "below threshold"
below_threshold = absorbance < threshold
print(f"Boolean array: {below_threshold}")
# If any values are below threshold...
if np.any(below_threshold):
# argmax on a boolean array finds the first True
first_index = np.argmax(below_threshold)
print(f"\nUsing np.argmax(): First crossing at index {first_index}")
print(f" Time: {times[first_index]} minutes")
print(f" Absorbance: {absorbance[first_index]:.2f}")
Boolean array: [False False False False False False False True True True True]
Using np.argmax(): First crossing at index 7
Time: 14 minutes
Absorbance: 0.49
Explanation
While the first approach using a loop is clear and works perfectly, NumPy offers a more concise approach using boolean arrays.
When you write absorbance < threshold
, NumPy doesn’t just give you a single True/False answer. Instead, it compares EACH element of the array to the threshold and creates a new array of True/False values:
True
where the element is less than the thresholdFalse
where the element is greater than or equal to the threshold
This array of True/False values is called a boolean array. Let’s see what this looks like:
# When we compare an array to a value, we get a boolean array
below_threshold = absorbance < threshold
print("Original absorbance values:")
print(absorbance)
print(f"\nComparing each value to threshold ({threshold}):")
print(below_threshold)
print(f"\nThis is a boolean array with dtype: {below_threshold.dtype}")
# We can see the correspondence:
for i in range(len(absorbance)):
print(f" {absorbance[i]:.2f} < {threshold} ? {below_threshold[i]}")
Original absorbance values:
[1.85 1.52 1.31 1.08 0.89 0.71 0.58 0.49 0.38 0.32 0.28]
Comparing each value to threshold (0.5):
[False False False False False False False True True True True]
This is a boolean array with dtype: bool
1.85 < 0.5 ? False
1.52 < 0.5 ? False
1.31 < 0.5 ? False
1.08 < 0.5 ? False
0.89 < 0.5 ? False
0.71 < 0.5 ? False
0.58 < 0.5 ? False
0.49 < 0.5 ? True
0.38 < 0.5 ? True
0.32 < 0.5 ? True
0.28 < 0.5 ? True
Key concept: Boolean arrays
Boolean arrays are powerful tools in NumPy:
Created by comparing arrays to values (
array < value
,array == value
, etc.)Each element is
True
orFalse
Can be used to filter data:
array[boolean_mask]
returns only elements where mask isTrue
np.argmax()
on a boolean array finds the firstTrue
(sinceTrue=1
,False=0
)
This is more advanced than using loops, but can be very efficient for large datasets. However, loops are perfectly fine and often clearer if you are not familiar with more advanced NumPy techniques.