NumPy Basics#

NumPy provides its own data type, called the NumPy array. Arrays are similar to Python lists, but with important differences that make them much more suitable for numerical computing.

Importing NumPy#

The standard way to import NumPy uses the alias np:

import numpy as np

Rather than writing import numpy, we commonly use import numpy as np. The as keyword binds the numpy module to the shorter name np, so its functions are called as np.array(), np.zeros(), and so on. This convention is nearly universal; you will see np used throughout NumPy code.
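As a quick check of what the alias does, the sketch below imports the module under both names and confirms they refer to the same object. Importing it twice is only for illustration; in practice you would use the np form alone.

```python
import numpy
import numpy as np

# Both names point at the same module object
print(np is numpy)    # True

# Functions are reached through the alias
print(np.sqrt(16.0))  # 4.0
```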

Creating Arrays#

We create a NumPy array similarly to how we create a list, using the np.array() function:

# Create an array from a list of bond lengths (in Ångströms)
bond_lengths = [1.52, 1.54, 1.51, 1.53, 1.52]
bond_array = np.array(bond_lengths)

print(bond_array)
print(type(bond_array))
[1.52 1.54 1.51 1.53 1.52]
<class 'numpy.ndarray'>
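One difference from lists shows up as soon as you do arithmetic. The sketch below uses illustrative values and variable names: multiplying a list by an integer repeats the list, whereas multiplying an array by a number scales every element.

```python
import numpy as np

# Illustrative bond lengths in Ångströms
bond_lengths = [1.52, 1.54, 1.51]
bond_array = np.array(bond_lengths)

# Multiplying a list by an integer repeats the list
repeated = bond_lengths * 2
print(repeated)

# Multiplying an array by a number scales each element
# (here: Ångströms to picometres, 1 Å = 100 pm)
bond_pm = bond_array * 100
print(bond_pm)
```

The list result has six entries, while the array result still has three, each multiplied by 100.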

Array Properties#

Arrays have several useful properties that tell you about their structure:

temps = np.array([298, 310, 323, 335, 348])

print(f"Shape: {temps.shape}")
print(f"Data type: {temps.dtype}")
print(f"Number of elements: {temps.size}")
Shape: (5,)
Data type: int64
Number of elements: 5

The .shape property tells you the dimensions of the array. For a 1D array, this is a tuple with a single value. The .dtype property tells you what type of data the array contains, and .size gives you the total number of elements.
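NumPy infers the dtype from the values you pass in, but you can also request one with the dtype argument of np.array(). The sketch below reuses a few of the temperatures above; the variable names are illustrative.

```python
import numpy as np

temps_int = np.array([298, 310, 323])                 # integer type inferred
temps_float = np.array([298, 310, 323], dtype=float)  # floats requested

# The exact integer width (e.g. int64) can depend on platform and NumPy version
print(temps_int.dtype)

print(temps_float.dtype)  # float64
print(temps_float.shape)  # (3,)
print(temps_float.size)   # 3
```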

Homogeneous Data#

NumPy arrays can only contain data of the same type, in contrast to lists, which can contain data of different types.

Trying to create a NumPy array that contains data of different types will either cause an error, or your data will be converted to a single consistent type (which probably is not what you wanted to happen).

# All integers
integers = np.array([1, 2, 3, 4])
print(f"Integer array: {integers}, dtype: {integers.dtype}")

# Mix of int and float → all become floats
mixed_numbers = np.array([1, 2.5, 3, 4])
print(f"Mixed numbers: {mixed_numbers}, dtype: {mixed_numbers.dtype}")

# Mix with strings → everything becomes strings!
mixed_types = np.array([1, 2, 'three'])
print(f"Mixed types: {mixed_types}, dtype: {mixed_types.dtype}")
Integer array: [1 2 3 4], dtype: int64
Mixed numbers: [1.  2.5 3.  4. ], dtype: float64
Mixed types: ['1' '2' 'three'], dtype: <U21

If you accidentally include a string in your array, all your numbers will be converted to strings. Always check your array’s dtype if calculations aren’t working as expected.
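One way to check for this, sketched below with made-up readings, is to test the dtype with np.issubdtype() and convert with .astype(float) when every entry is a valid number written as text.

```python
import numpy as np

# Hypothetical readings that arrived as text
raw = np.array(['7.0', '6.8', '7.2'])
print(raw.dtype)  # a Unicode string dtype, not a numeric one

# Test whether the array holds numbers
is_numeric = np.issubdtype(raw.dtype, np.number)
print(is_numeric)  # False

# Convert the strings to floats before doing arithmetic
values = raw.astype(float)
print(values.dtype)  # float64
print(values.mean())
```

If any entry cannot be parsed as a number (such as 'three'), .astype(float) raises a ValueError rather than guessing.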

Exercise#

You have measured pH values at different times: 7.0, 6.8, 7.2, 6.9, 7.1

Create a NumPy array called pH_values containing these measurements, then print the array and its shape.

Solution:
pH_values = np.array([7.0, 6.8, 7.2, 6.9, 7.1])
print(pH_values)
print(f"Shape: {pH_values.shape}")

Two-Dimensional Arrays#

NumPy arrays can have more than one dimension, which makes them useful for representing multidimensional datasets, or mathematical objects like matrices.

# Molecular coordinates: 3 atoms, each with x, y, z positions
molecule = np.array([[0.0, 0.0, 0.0],    # Atom 1
                     [1.0, 0.0, 0.0],    # Atom 2
                     [0.0, 1.5, 0.0]])   # Atom 3

print(molecule)
print()
print(f"Shape: {molecule.shape}")
[[0.  0.  0. ]
 [1.  0.  0. ]
 [0.  1.5 0. ]]

Shape: (3, 3)

For a 2D array, the shape is (rows, columns). For molecular coordinates, this is typically (number of atoms, 3) where each row is an atom and the three columns are the x, y, and z coordinates.
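Because the example above has three atoms and three coordinates, its shape does not show which number is which. The sketch below uses a hypothetical four-atom array (coordinates invented for illustration) and unpacks the shape tuple into named variables.

```python
import numpy as np

# Hypothetical 4-atom structure: one row per atom, columns are x, y, z
atoms = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.5, 0.0],
                  [0.0, 0.0, 2.0]])

# Unpack (rows, columns) into descriptive names
n_atoms, n_coords = atoms.shape
print(f"{n_atoms} atoms, {n_coords} coordinates each")  # 4 atoms, 3 coordinates each

# len() returns the size of the first dimension (the number of rows)
print(len(atoms))  # 4
```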

Exercise#

Create an array for water (H2O) with these coordinates:

Atom    x       y       z
O       0.0     0.0     0.0
H       0.96    0.0     0.0
H      -0.24    0.93    0.0

Print the array and check its shape is (3, 3).

Solution:
water = np.array([[0.0, 0.0, 0.0],
                  [0.96, 0.0, 0.0],
                  [-0.24, 0.93, 0.0]])

print(water)
print(f"Shape: {water.shape}")

Creating Arrays#

Besides np.array(), NumPy provides functions to create arrays with specific patterns:

# Array of zeros (useful for initialising results)
zeros = np.zeros(5)
print(f"Zeros: {zeros}")

# Array of ones
ones = np.ones(4)
print(f"Ones: {ones}")

# Evenly spaced values
x = np.linspace(0, 10, 5)  # 5 values from 0 to 10
print(f"Linspace: {x}")

# Sequence with step size
y = np.arange(0, 10, 2)  # Start at 0, stop before 10, step by 2
print(f"Arange: {y}")
Zeros: [0. 0. 0. 0. 0.]
Ones: [1. 1. 1. 1.]
Linspace: [ 0.   2.5  5.   7.5 10. ]
Arange: [0 2 4 6 8]

np.linspace(start, stop, num) creates an array of num evenly spaced values between start and stop (inclusive). This is particularly useful for creating x-axes when plotting data.

For example, np.linspace(0, 10, 5) creates 5 values evenly spaced from 0 to 10, giving us [0, 2.5, 5.0, 7.5, 10.0]. If you need 100 points for a smooth plot from 300 K to 400 K, you would use np.linspace(300, 400, 100).

np.arange(start, stop, step) creates a sequence similar to Python’s range(), but returns a NumPy array. The sequence starts at start and increments by step, stopping before stop. For example, np.arange(0, 10, 2) gives [0, 2, 4, 6, 8].
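The two functions can produce the same values, but you specify them differently: np.linspace() takes the number of points and includes the endpoint, while np.arange() takes the step and excludes the stop value. A small sketch, with illustrative variable names:

```python
import numpy as np

# Five temperatures from 300 K to 400 K, endpoint included
t_linspace = np.linspace(300, 400, 5)

# Same values with arange: stop must lie beyond 400 for 400 to be included
t_arange = np.arange(300, 401, 25)

print(t_linspace)  # [300. 325. 350. 375. 400.]
print(t_arange)    # [300 325 350 375 400]

# Spacing between neighbouring linspace values: (stop - start) / (num - 1)
spacing = (400 - 300) / (5 - 1)
print(spacing)     # 25.0
```

Note that np.linspace() returns floats here, while np.arange() with integer arguments returns integers.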

Indexing Arrays#

Individual elements, or slices, of an array can be referenced using the same syntax as for lists.

energies = np.array([100, 150, 125, 175, 145])

print(f"First element: {energies[0]}")
print(f"Last element: {energies[-1]}")
print(f"First three: {energies[:3]}")
print(f"Every second element: {energies[::2]}")
First element: 100
Last element: 145
First three: [100 150 125]
Every second element: [100 125 145]

For N-dimensional arrays, individual elements or slices can be referenced using the same syntax as for nested lists.

# Using our water molecule from earlier
water = np.array([[0.0, 0.0, 0.0],     # O
                  [0.96, 0.0, 0.0],    # H
                  [-0.24, 0.93, 0.0]]) # H

# Get the second row (first hydrogen)
print(f"First hydrogen: {water[1]}")

# Get a specific element (row 1, column 2)
print(f"Third element of second row: {water[1][2]}")
First hydrogen: [0.96 0.   0.  ]
Third element of second row: 0.0

NumPy also allows a more compact N-dimensional index notation, using comma-separated values:

# More compact notation: array[row, column]
print(f"Element at row 1, column 2: {water[1, 2]}")

# Get x-coordinate of oxygen (row 0, column 0)
print(f"Oxygen x-coordinate: {water[0, 0]}")

# Get all x-coordinates (all rows, column 0)
all_x = water[:, 0]
print(f"All x-coordinates: {all_x}")

# Get all y-coordinates
all_y = water[:, 1]
print(f"All y-coordinates: {all_y}")
Element at row 1, column 2: 0.0
Oxygen x-coordinate: 0.0
All x-coordinates: [ 0.    0.96 -0.24]
All y-coordinates: [0.   0.   0.93]

The : notation means “all elements” in that dimension. So water[:, 0] means “all rows, first column”, giving us all the x-coordinates.
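The same notation works along the row dimension. As a short sketch, the code below selects whole rows: the oxygen atom on its own, and both hydrogen atoms together with a slice.

```python
import numpy as np

water = np.array([[0.0, 0.0, 0.0],     # O
                  [0.96, 0.0, 0.0],    # H
                  [-0.24, 0.93, 0.0]]) # H

# Row 0, all columns (equivalent to water[0])
oxygen = water[0, :]
print(oxygen.shape)     # (3,)

# Rows 1 onwards, all columns: both hydrogen atoms
hydrogens = water[1:, :]
print(hydrogens.shape)  # (2, 3)
```

Indexing with a single integer removes that dimension, so oxygen is 1D, while slicing keeps it, so hydrogens is still 2D.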

Exercise#

Using the water molecule array, extract:

  1. The z-coordinate of the second hydrogen atom

  2. All z-coordinates

  3. Just the x and y coordinates of all atoms (first two columns)

Solution:
water = np.array([[0.0, 0.0, 0.0],
                  [0.96, 0.0, 0.0],
                  [-0.24, 0.93, 0.0]])

# 1. z-coordinate of second hydrogen (row 2, column 2)
second_h_z = water[2, 2]
print(f"Second H z-coordinate: {second_h_z}")

# 2. All z-coordinates (all rows, column 2)
all_z = water[:, 2]
print(f"All z-coordinates: {all_z}")

# 3. x and y coordinates only (all rows, first two columns)
xy_coords = water[:, :2]
print(f"XY coordinates:\n{xy_coords}")
print(f"Shape: {xy_coords.shape}")

Summary#

You have learned how to:

  • Import NumPy and create arrays from lists

  • Check array properties (shape, dtype, size)

  • Create 2D arrays for molecular coordinates

  • Use np.zeros(), np.ones(), np.linspace(), and np.arange() to create arrays

  • Index and slice arrays to access specific elements or subsets