- Understand the elements of a research compendium
- Understand conventions for coding styles
9 January 2023
We are witnessing an increasing number of examples where the original analysis cannot be reproduced
According to the National Science Foundation:
The calculation of quantitative scientific results by independent researchers using the original data and methods
Reproducibility can be further broken down
“A standard and easily recognizable way for organizing the digital materials of a project to enable others to inspect, reproduce, and extend the research.” (Marwick et al. 2018)
Project organization should follow the conventions of the scientific community
Maintain a clear separation of data, methods and output
|-code
|  |-01_data-ingest.R
|  |-02_data-cleaning.R
|  |-03_data-QAQC.R
|  |-04_model-fitting.R
|  |-05_create-figures.R
|-data
|  |-skagit_steelhead-escapement.csv
|  |-skagit_river-flow.csv
Specify the computational environment
Organize your compendium so another person knows what to expect from the plain meaning of the file and directory names
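One lightweight way to specify the computational environment is to save the output of `sessionInfo()` alongside the code. The following is a minimal sketch; the file name `session-info.txt` is a hypothetical choice, and it is written to a temporary directory here only so the example does not touch a real project.

```r
# Hypothetical output location; in a real compendium this file
# would more likely sit at the project root
env_file <- file.path(tempdir(), "session-info.txt")

# sessionInfo() reports the R version, platform, and attached packages
writeLines(capture.output(sessionInfo()), env_file)

# Inspect the first few lines of the saved record
head(readLines(env_file), 3)
```

A plain-text record like this does not recreate the environment, but it tells a reader which R version and packages the analysis was run with.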
Marwick et al. (2018)
Licensing
Version control
Persistence
Metadata
One could always create a structure from scratch
Alternatively, consider some existing tools
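For the from-scratch route, a few lines of base R are enough to lay out a skeleton that keeps data, methods, and output apart. This is only a sketch: the `output` directory, the `README.md` file, and the `salmon` project name are illustrative assumptions, and the project is created under `tempdir()` so nothing real is overwritten.

```r
# Hypothetical project root; tempdir() keeps this sketch away from real files
project_dir <- file.path(tempdir(), "salmon")

# Separate directories for data, methods (code), and output
sub_dirs <- c("data", "code", "output")
for (d in sub_dirs) {
  dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# A top-level README tells a newcomer what to expect (contents are illustrative)
writeLines("Skagit steelhead analysis", file.path(project_dir, "README.md"))

# Show the resulting top-level layout
list.files(project_dir)
```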
setwd("/Users/Mark/Documents/projects/salmon/final_code")
Do you see any problems with this approach?
This won’t work unless your directory structure matches mine
Use relative paths to your work
The {here} package makes this really easy
## Absolute path
## /Users/Mark/Documents/projects/salmon/data

## Relative path
> data_dir <- here::here("data")
> data_dir
> [1] "/Users/Mark/Documents/projects/salmon/data"
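The same idea extends to individual files: build the path from its components instead of pasting strings or calling `setwd()`. The sketch below assumes the script is run from somewhere inside the project and reuses the `skagit_river-flow.csv` file name from the earlier directory listing. The fallback to `file.path()` is only there so the example still runs when {here} is not installed.

```r
# Build a path to a data file relative to the project root
if (requireNamespace("here", quietly = TRUE)) {
  # here::here() joins its arguments onto the detected project root
  flow_file <- here::here("data", "skagit_river-flow.csv")
} else {
  # Fallback: a path relative to the current working directory
  flow_file <- file.path("data", "skagit_river-flow.csv")
}

# Only read the file if it actually exists on this machine
if (file.exists(flow_file)) {
  flow <- read.csv(flow_file)
}
```

Because no user-specific directory appears in the script, a collaborator can run it unchanged after copying the project folder.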
Human readable
Machine readable
Use _ and - as delimiters
Consider all of these special characters
~`!@#$%^&()+={}[]|:";'<,>.?/*
The pipe | was the only character available for use as a delimiter in a plain text file for an energy efficiency database because all the others had been used in variable names.
resume.docx
Mark's data.xlsx
figure 1.pdf
scheuerell_resume_2021-01-01.docx
marks_skagit_steelhead_age-composition.xlsx
figure-01_scatterplot-length-mass.pdf
Works with default ordering
2019-10-20_skagit_harvest-commercial.csv
2019-10-20_skagit_harvest-recreational.csv
2020-09-18_nooksack_harvest-commercial.csv
2020-09-18_nooksack_harvest-recreational.csv
01_data-ingest.R
02_data-cleaning.R
03_data-QAQC.R
04_model-fitting.R
05_create-figures.R
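Consistent delimiters pay off when file names are processed by code. The sketch below assumes the convention shown in the harvest examples, with `_` separating fields and `-` separating words within a field. It sorts the names and splits them into metadata columns. The column names (`date`, `river`, `fishery`) are illustrative choices.

```r
# File names following the date_river_type convention, in scrambled order
files <- c(
  "2020-09-18_nooksack_harvest-recreational.csv",
  "2019-10-20_skagit_harvest-commercial.csv",
  "2020-09-18_nooksack_harvest-commercial.csv",
  "2019-10-20_skagit_harvest-recreational.csv"
)

# ISO 8601 dates sort chronologically under plain character ordering;
# method = "radix" makes the ordering independent of the locale
files_sorted <- sort(files, method = "radix")

# Drop the extension, then split each name on the field delimiter "_"
stems <- tools::file_path_sans_ext(files_sorted)
parts <- do.call(rbind, strsplit(stems, "_", fixed = TRUE))

# Assemble the recovered metadata into a data frame
file_info <- data.frame(
  date    = as.Date(parts[, 1]),
  river   = parts[, 2],
  fishery = sub("^harvest-", "", parts[, 3]),
  stringsAsFactors = FALSE
)
file_info
```

A name such as `Mark's data.xlsx` offers nothing to split on, which is why the delimiter convention matters.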
Taking the time to invest in these styles and strategies can save you a lot of time in the future, but in the end it’s up to you
We’ll learn about GitHub and how to use it for personal and collaborative projects