24 February 2025

Jennifer Scheuerell

Relational databases

Definition

  • a relational database organizes data into tables
    comprising rows and columns
  • related information is contained in either the same table or
    across multiple tables

High level concepts

Relational databases are created and manipulated using
Structured Query Language (SQL)

High level concepts

Relational databases are created and manipulated using
Structured Query Language (SQL)

Relational database management systems allow users
to create relational databases using SQL

Some well known systems include:

  • PostgreSQL
  • MySQL
  • Oracle
  • Microsoft SQL Server

An aside on MS Access

  • proprietary software \((\$)\)
  • will only run on Windows OS
  • modest size limits
  • cannot be hosted securely online

Organizing data in tables

Relational databases store data in tables

  • columns in a table are fixed fields
  • fields have unique data types (eg, INTEGER, TEXT)
  • rows are unique records
  • columns have a fixed order in each table, but…
  • …SQL does not guarantee the order of the rows within a table

An example table

record_ID color smell count last_updated
record_one green pine 13 2024-02-17
record_two yellow lemon 20 2024-12-13
record_three red cherry 5 2025-02-19

Same table again

record_ID color smell count last_updated
record_three red cherry 5 2025-02-19
record_one green pine 13 2024-02-17
record_two yellow lemon 20 2024-12-13

Jargon

Primary and foreign keys

a primary key is one (or more) field(s) in a table that
uniquely identifies each record in the table

Jargon

Primary and foreign keys

a primary key is one (or more) field(s) in a table that
uniquely identifies each record in the table


a foreign key is a field in a table that joins to a field in a second table

Jargon

Joins

a join brings together two tables using one or more fields they share in common

An example join

Jargon

Atomic values

an atomic value is a singular piece of information

Jargon

Atomic values

an atomic value is a singular piece of information


NOTE: relational databases expect only one piece of information
to be stored in any one field within a record

Example table with non-atomic values

record_ID date site wind_speed_direction
record_one 2024-06-02 Madison Park 10N
record_two 2024-06-02 Magnuson Pt 13NE
record_three 2024-06-02 St Edwards 7E

Jargon

Null values

a null value indicates that the information for that field for that record is missing

Jargon

Null values

a null value indicates that the information for that field for that record is missing


NOTE: in R this is an NA

Jargon

Select query & result set

a select query is a SQL statement that extracts data from one or more tables

Jargon

Select query & result set

a select query is a SQL statement that extracts data from one or more tables


a result set is the data that are returned by the select query statement

Jargon

View

a view is a saved select query


NOTE: only the SQL statement is saved, not the result set from running the query

Jargon

View

a view is a saved select query


NOTE: only the SQL statement is saved, not the result set from running the query


In other words, a view is to SQL as a script is to R

Tables in databases

  • 2+ tables are grouped into databases
  • multiple databases within a server constitute a database cluster

Designing a database

Designing a database

Database normalization

Normalization ensures that the database is

  • consistent
  • efficient
  • easy to maintain
  • free of unnecessary data duplication

Next time

  • connect to a remote database via RStudio

  • view tables

  • write a simple query