Reading UV Data

In laboratories we work with many instruments that collect data. We will write a programme to read in real data from an instrument and plot it.

In this mini project we will load and manipulate a file of UV data. Once loaded we will also perform simple data manipulations.

Note that some scientific libraries can make loading data files simpler. However, these libraries assume certain conventional formats for presenting data. A lot of scientific data is generated in unexpected formats, as is the case in this exercise. Your objective should be to understand the format of the data and design a parser accordingly.

Tip: A parser is a piece of code that takes data supplied by the user and splits it up into appropriate variables and data types. It parses the data in a similar way to how we parse language.

Learning Outcomes

Read in text files
Plot data
Simple data analysis

Reading the file

First we will load a file proj_1.csv. Make sure that the file you want to load is in the same folder as your Python notebook or script. Here, we will use the recommended with statement, which is used to open files. When the with block is over, the file is closed by Python.

# store the name of the file to read
file_name = "proj_1.csv"
# open the file. you can now access its contents referring to it as "in_file"
with open(file_name, 'r') as in_file:
    # you can iterate through a file as if it were a list
    for line in in_file:
        # here, we have access to each line of the file as a string
        # each line already ends with a newline, so end="" avoids extra blank lines
        print(line, end="")

You should see the contents of the file printed out. Open the same file in Notepad or Excel and compare how they look to make sure that all the data is really read in by Python.

Let’s check we understand the structure of this file. First there are several lines without data in them. These are header lines which contain useful information, such as a title for the data and when the data was taken. This information is metadata, and it is really important for managing and understanding data, which is part of the FAIR data principles.

The final non-data line tells us what each column of data is. Then we have a long list of data values, four columns separated by commas. From the column titles we can say that there are a background spectrum and an iodine spectrum.

Parsing the data

Using the code above, you can access each line of the file as a string.

Typically, data stored in text files is split using a character called a separator or a delimiter.

Common delimiters you will come across are:

  • commas, usually in a .csv file, which stands for comma-separated values (the format is not specific to Excel)

  • tabs, sometimes (though rarely) in a .tsv file

  • spaces, often a fixed number

  • semicolons

How to deal with these?

For instance, take the following string:

"hydrogen;helium;lithium"

Each part of the string is separated using the separator ";".

Thankfully, Python strings have a built-in method, split(), to easily separate a string based on its separator:

line = "hydrogen;helium;lithium"

# turn the string into a list of strings separated at ";"
split_line = line.split(";")

# we can now print the list of each element name, or choose specific element names using indexing
print(split_line)
print(split_line[1])

Note that you can change the separator of the split() method by replacing its argument with any string. So you could have split(",") or split(".") or even split("pineapple"). If you don’t give any argument, split() will split the string on whitespace (spaces, tabs and newlines). Try to write some code that splits the following string and adds all the numbers together:

"1.23 3.55 5.22"
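
Before relying on split() with no argument, it may help to see how it differs from split(" "). The short sketch below uses made-up strings chosen only for illustration (the variable names are also just examples). It also shows strip(), which removes the newline character found at the end of each line read from a file.

```python
# made-up string containing two spaces, a tab and a trailing newline
line = "a  b\tc\n"

# no argument: split on any run of whitespace, empty strings are dropped
no_arg = line.split()
# explicit " ": split at every single space, so an empty string appears
one_space = line.split(" ")

print(no_arg)
print(one_space)

# lines read from a file end in "\n"; strip() removes it before splitting
csv_line = "10,20,30\n"
fields = csv_line.strip().split(",")
print(fields)
```

Note that every item in these lists is still a string. To do arithmetic you would first convert each item with float().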

Plotting

For the last stage of your mini-project, you will need to visualise some of the data from the .csv file as a plot. Python has a powerful visualisation library called matplotlib.

As a simple example, this is how to produce a line plot:

import matplotlib.pyplot as plt
# we will plot a population percentage vs time
population = [100, 50, 25, 12, 6]
time = [0, 10, 20, 30, 40]

# the following line creates the ax object, which has many functions for plotting data
fig, ax = plt.subplots()
# the plot() function makes a line plot
ax.plot(time,population)

# show the figure in the notebook
plt.show()

To save a high-quality image, instead of plt.show(), you could use plt.savefig("figure_name.pdf").

Of course, the figure above is not yet of scientific quality. It is up to you to look up how to add axis labels in matplotlib. The documentation for this library is very comprehensive, so for instance if you wanted to make a scatter plot instead of a line plot, it may help to read up on ax.scatter().
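
As a starting point for that search, here is a minimal sketch that repeats the line plot example with axis labels and a legend added. The label text is an assumption made for illustration (the example data has no stated units), and the output file name is hypothetical.

```python
import matplotlib.pyplot as plt

# same illustrative data as the line plot example
population = [100, 50, 25, 12, 6]
time = [0, 10, 20, 30, 40]

fig, ax = plt.subplots()
# the label argument is the text that will appear in the legend
ax.plot(time, population, label="population")

# axis labels; the units here are assumed for illustration
ax.set_xlabel("Time (arbitrary units)")
ax.set_ylabel("Population (%)")
# draw the legend using the label given to plot()
ax.legend()

# save the figure to a file; the file name is hypothetical
fig.savefig("population_labelled.png", dpi=300)
```

In a notebook you could call plt.show() instead of fig.savefig() to display the figure directly.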

Exercise

It is time to put all of these new skills to use.

Open a new file in your favourite code editor (VSCode, Spyder, Sublime etc.). Write a Python script that reads in the iodine spectrum data from proj_1.csv and plots it.

General advice:

  • You will need to make empty lists before your for loop, so that you can append the relevant numbers to these lists.

  • A lot of lines from the .csv file don’t have any spectral data. You will have to make sure that your code ignores those lines when it imports the data.

  • The header lines do contain information on the experiment that might be useful. Think about how you could store them.
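
The advice above can be sketched as follows. This is one possible approach, not the only one: it treats a line as data if every field converts to a number with float(), and keeps everything else as header text. The sample lines are made up for illustration and are NOT the contents of proj_1.csv, so the column positions and names will need adapting to the real file.

```python
# made-up lines standing in for a file; NOT the contents of proj_1.csv
sample_lines = [
    "Title: example run\n",
    "\n",
    "x,y\n",
    "1.0,0.10\n",
    "2.0,0.25\n",
]

header_lines = []  # non-numeric lines, kept as metadata
x_values = []      # first column of each data line
y_values = []      # second column of each data line

for line in sample_lines:
    fields = line.strip().split(",")
    try:
        # float() raises ValueError for text that is not a number
        numbers = [float(field) for field in fields]
    except ValueError:
        # keep non-empty header lines, skip blank ones
        if line.strip():
            header_lines.append(line.strip())
        continue
    x_values.append(numbers[0])
    y_values.append(numbers[1])

print(header_lines)
print(x_values, y_values)
```

When reading from a real file, the loop over sample_lines would be replaced by a loop over the open file object inside a with block.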

Key points:

  • Make sure your plot is of scientific quality and could be submitted with a lab report.

To start, use comments to construct the outline of your script. An outline is provided below:

# Put any imports here


# Declare variables that will be needed here
# list of x values
# list of y values

# Load file
# Open file
# Go through each line
    # split on the delimiter
    # take columns that are needed and append to the lists

# Plot the file contents
# label the axes
# add legend