Science is about the discovery and sharing of information, but much of the process is often shrouded in mystery. In some cases, questions about what and how things were done have left other scientists unable to successfully reproduce previous findings, bringing into question the integrity of the results. Fortunately, however, we also find ourselves in the midst of an expanding community of developers and practitioners of the tools and skills necessary for more transparent design, analysis, and reporting of scientific studies. These advancements have also supported better documentation, management, and access to data, which has facilitated new and often remote collaborations. This relatively new field of “Data Science” is broadly defined, but generally combines elements of data literacy, computer programming, statistics, and graphic design. A data scientist is generally expected to identify relevant questions, gather data from multiple sources, organize it, extract meaningful information, and communicate the results to others.
This course will provide an overview of some data science tools and best practices that can be used to create transparent and reproducible workflows when working with environmental data. Students will learn how to translate raw data from field and lab studies into databases and “tidy” digital formats, which can then be used for plotting, statistical analyses, etc. Students will learn how to track the history of file changes (version control), collaborate online with others, and generate “recipes” for re-creating one’s work. Although failure and frustration in science are common, the open science community tries hard to be welcoming and helpful. Thus, students will also learn how and where to ask for help when attempting something new (e.g., How do I create X from Y?), debugging or fixing code (e.g., What does this error message mean?), etc.
Importantly, this course will also discuss the various types of data used in environmental science, which includes an exploration of what constitutes “raw” data versus derived products, and the implications of using the various types. The course will also focus on the ethical and social consequences of using various data types. For example, some recent studies have inferred gender identities from survey responses based on first names, and used that information to summarize hiring practices, demographics of successful grant applications, etc. However, doing so is widely regarded as inappropriate due to non-binary genders and inaccuracies in gender assignment.
Please note: Although statistical analysis and figures/diagrams are very important in data science, this course will not focus on those aspects because there are others in the College of the Environment that do. Some examples include:
FISH 454 - Ecological Modeling
FISH 458 - Modeling and Estimation in Conservation and Resource Management
FISH 546 - Bioinformatics for Environmental Sciences
FISH 554 - Beautiful Graphics in R
QERM 514 - Analysis of Ecological and Environmental Data
By the end of the quarter, students should be able to:
Use Git to commit changes to files, recover old versions, push/pull changes to remote repositories, and manage merge conflicts
Use GitHub to create repositories, manage projects, open/comment/close issues, and submit pull requests
Create a reproducible example to use in a quest for help
Import and clean a messy data file using a variety of {packages}::functions in R
Properly describe and document data using Ecological Metadata Language
Create and access a relational database with PostgreSQL
Use unit tests in R to evaluate code functionality
Create a package in R and document its contents
Use R Markdown to combine text, equations, code, tables, and figures into reports, websites, and presentations
Create dynamic html reports with Shiny
Dr. Mark
Scheuerell
Associate Professor, School of Aquatic & Fishery Sciences
scheuerl@uw.edu
Markus
Min
PhD Candidate, School of Aquatic and Fishery Sciences
Jennifer
Scheuerell
Chief Technology Officer, Sharper Informatics Solutions
Dr. Elizabeth
Holmes
Research Fish Biologist, NOAA Northwest Fisheries Science Center
Dr. Margaret
Siple
Research Fish Biologist, NOAA Alaska Fisheries Science Center
M/W/F from 11:30-12:20 in FSH 213
By appointment
Students should have a working knowledge of the R computing software, such as that provided in FISH 552/553.
I am dedicated to providing a welcoming and supportive learning environment for all students, regardless of their background, identity, physical appearance, or manner of communication. Any form of language or behavior used to exclude, intimidate, or cause discomfort will not be tolerated. All course participants (instructor, students, guests) are expected to abide by the SAFS Code of Conduct. In order to foster a positive and professional learning environment, I ask the following:
Please let me know if you have a name or set of preferred pronouns that you would like me to use
Please let me know if anyone in class says something that makes you feel uncomfortable[1]
In addition, I encourage the following kinds of behaviors:
Use welcoming and inclusive language
Show courtesy and respect towards others
Acknowledge different viewpoints and experiences
Gracefully accept constructive criticism
Although I strive to create and use inclusive materials in this course, there may be overt or covert biases in the course material due to the lens with which it was written. Your suggestions about how to improve the value of diversity in this course are encouraged and appreciated.
Please note: If you believe you have been a victim of an alleged violation of the Student Conduct Code or you are aware of an alleged violation of the Student Conduct Code, you have the right to report it to the University.
All students deserve access to the full range of learning experiences, and the University of Washington is committed to creating inclusive and accessible learning environments consistent with federal and state laws. If you feel like your performance in class is being impacted by your experiences outside of class, please talk with me.
If you have already established accommodations with Disability Resources for Students (DRS), please communicate your approved accommodations to me at your earliest convenience so we can discuss your needs in this course. If you have not yet established services through DRS, but have a temporary health condition or permanent disability that requires accommodations (e.g., mental health, learning, vision, hearing, physical impacts), you are welcome to contact DRS at 206-543-8924 or via email or their website. DRS offers resources and coordinates reasonable accommodations for students with disabilities and/or temporary health conditions. Reasonable accommodations are established through an interactive process between you, your instructor(s) and DRS.
Students who expect to miss class or assignments as a consequence of their religious observance will be provided with a reasonable accommodation to fulfill their academic responsibilities. Absence from class for religious reasons does not relieve students from responsibility for the course work required during the period of absence. It is the responsibility of the student to provide the instructor with advance notice of the dates of religious holidays on which they will be absent. Students who are absent will be offered an opportunity to make up the work, without penalty, within a reasonable time, as long as the student has made prior arrangements.
This course will revolve around hands-on computing exercises that demonstrate the topics of interest. Therefore, students are strongly recommended to bring their own laptop to class, although students are certainly free to work with one another. For students without access to a personal laptop: it is possible to check out UW laptops for an entire quarter (see the Student Services office for details).
All of the software we will be using is free and platform independent, meaning students may use macOS, Linux, or Windows operating systems. In addition to a web browser, we will be using the free R software and the desktop version of the R Studio integrated development environment (IDE). We will also be using various packages not contained in the base installation of R, but we will wait and install them at the necessary time. The instructor will be available during the first week of class to help students troubleshoot any software installation problems.
We will be using Git, a free and open source version control system for tracking changes to our files.
We will be using PostgreSQL to create and access relational databases.
Students will also be required to have a user account on GitHub, which we will be using for file hosting and communications via “issues”. If you do not already have an account, you can sign up for a free one here. The instructor will provide training on how to use the intended features in GitHub.
This course will introduce new material primarily through prepared slides and hands-on demonstrations. Students will be expected to work both individually and collaboratively (to the extent possible given the current conditions); course content and evaluation will emphasize the communication of ideas and the ability to think critically more so than a specific pathway or method. Other areas of this website provide an overview of the topics to be covered, including links to weekly reading assignments, lecture materials, computer labs, and homework assignments.
This course will involve a lot of communication between and among students and the instructor. Short questions should be addressed to me via email; I will try my best to respond to your message within 24 hours. Under more normal circumstances, detailed questions would be addressed to me in person–either after class or during a scheduled meeting. In this case, however, we will schedule one-on-one or group Zoom calls as needed.
In addition to email and Zoom, we will use the “Issues” feature in GitHub to ask questions and assist others. Specifically, questions and answers can be posted to the issues in the course’s “assistance” repository here.
Students will be evaluated on their knowledge of course content and their ability to communicate their understanding of the material via individual homework assignments (80%) and their active participation in class (20%). There will be 8 homework assignments, each counting toward 10% of the final grade. Please note, all assignments must be turned in to achieve a passing grade.
Homework will be assigned each Friday of weeks 2-9 and will be due by 11:59 PM 9 days later on the following Sunday (weeks 3-10). The instructor will evaluate and return student homework assignments within one week of their due date. If for some reason you cannot meet the homework deadline, contact the instructor as soon as possible to discuss other options. Please see the Homework page for more details.
Data science is about communicating your ideas and findings to others. As such, this course will require students to engage with the instructor and one another in small groups. I expect all students to contribute to our discussions of concepts, results, bugs/errors, etc.
Students should discuss any potential schedule conflicts with the instructor during the first week of class.
Faculty and students at the University of Washington are expected to maintain the highest standards of academic conduct, professional honesty, and personal integrity. Plagiarism, cheating, and other academic misconduct are serious violations of the Student Conduct Code. I have no reason to believe that anyone will violate the Student Conduct Code, but I will have no choice but to refer any suspected violation(s) to the College of the Environment for a Student Conduct Process hearing. Students who have been guilty of a violation will receive zero points for the assignment in question.
We are in the midst of an historic pandemic that is creating a variety of challenges for everyone. If you should feel like you need some help, please consider the following resources available to students.
If you are experiencing a life-threatening emergency, please dial 911.
Crisis
Clinic
Phone: 206-461-3222 or toll-free at 1-866-427-4747
UW
Counseling Center
Phone: 206-543-1240
Immediate
assistance
If you feel unsafe or at-risk in any way while taking any course, contact SafeCampus (206-685-7233) anytime–no matter where you work or study–to anonymously discuss safety and well-being concerns for yourself or others. SafeCampus can provide individualized support, discuss short- and long-term solutions, and connect you with additional resources when requested. For a broader range of resources and assistance see the Husky Health & Well-Being website.
No student should ever have to choose between buying food or textbooks. The UW Food Pantry helps mitigate the social and academic effects of campus food insecurity. They aim to lessen the financial burden of purchasing food by providing students access to shelf-stable groceries, seasonal fresh produce, and hygiene products at no cost. Students can expect to receive 4 to 5 days’ worth of supplemental food support when they visit the Pantry, located on the north side of Poplar Hall at the corner of NE 41st St and Brooklyn Ave NE. Visit the Any Hungry Husky website for additional information, including operating hours and additional food support resources.
[1] If the instructor should be the one to say something that makes a student uncomfortable, the student should feel free to contact the Director of the School of Aquatic and Fishery Sciences.