There is a good chance that you’ve already had to ask someone for help with a question related to some aspect of data science. For example,
What does this error mean?
Why can I not reproduce what you’ve done?
How do I [fill in the blank]?
Is there an existing package that allows me to [fill in the blank]?
The bad news is that these types of questions may follow you around throughout your career. The good news, however, is that there is a large community of people willing to help you find the solutions you need.
When you get stuck on a problem and need some advice, others will be able to better provide assistance if you include information that
they can easily understand, and
they can use to reproduce the problem.
This process is variously referred to as creating a
minimal, reproducible example (reprex)
minimal, complete and verifiable example (mcve)
minimal, workable example (mwe)
The bottom line is that you should
Use the least amount of code possible to produce the problem (minimal)
Provide everything (code, data) someone would need to reproduce your problem (complete)
Verify that the code you’re about to share reproduces the problem (reproducible)
In addition, you should phrase your question in manner that is polite and free of jargon or other language constructs that might confuse people. The data science community is global, so recognize that someone reading your question might not speak English as a first language. Also consider that others may very well stumble across your question, and hopefully the correct answer, years after it was solved. As such, using plain, descriptive language in the title and body of a post will allow others to find it when searching for answers. For example, this problem I posted on Stack Overflow in 2016 has since received over 2000 views!
There tends to be a strong negative correlation between the length of your code and the likelihood that someone will 1) read all the way through it, and 2) actually be able to solve your problem. Therefore, you should consider the following points:
Consider creating a new script, including only those elements germane to the problem;
Remove extraneous information that doesn’t influence the problematic code;
Use simple, descriptive names for parameters, variables, and functions;
Include meaningful comments in your code to help others understand what you’re trying to accomplish;
Take advantage of your IDE’s options to create nicely formatted code.
command + I
on a Mac or
ctrl + I
in Windows
command + shift + A
on a Mac or
ctrl + shift + A
in Windows
It’s important to include all of the information necessary to reproduce the problem within the question itself. Don’t put other people in a position where their first response to you is a request for more information. In addition, please consider the following advice:
If there are data involved, include a small subset of the data along with any necessary code;
Break your problem down into small code blocks and include a description of each block’s purpose;
It’s OK to include an image or figure in your question, but
Do not use images of code; rather, copy/paste the actual code so others don’t have to re-type anything;
Include information or links to other problems/solutions that you’ve tried or are referencing.
In order to help you, other people will need to verify that they can indeed reproduce the problem you’re having. Phrases like “Help!”, “This won’t work!”, or “Why does this happen?” don’t offer much information. Instead, use descriptive language that gets to the question at hand, including:
Explain what the expected result should be;
Provide the exact wording of any error messages;
Indicate which line(s) of code produce(s) the problem;
Use a brief, descriptive summary of your problem as the title of your question.
Tip: You should double-check that your example reproduces the problem. It’s good practice to quit RStudio (or other software) and restart it from scratch. If possible, consider testing your example on another computer or operating system.
In addition to minimal, complete, and reproducible code (and possibly data), you should also include information about your larger work environment. This includes:
Your operating system (Mac, Windows, Linux) and its version;
Versions of your software (R, packages);
sessionInfo()
and
packageVersion("package-name-here")
are both helpful
functions.
Creating a reproducible example (“reprex”) can take some work, but so does answering other people’s questions. As Jenny Bryan says, “Help me help you.” Here are some options for helping to create a reprex.
dput()
Copying/pasting code is relatively easy, but trying to include any
data necessary to create a complete reprex can be tricky. In general,
including a .csv
file or other data format is difficult
without pointing someone to an external source, link, etc.
Tip: The dput()
function provides a
convenient way of writing all of the information in a data frame in a
compact manner.
Here’s an example of using dput()
on the Palmer penguins
data set. This data set is actually quite large, so we’ll only use
the first 10 rows of the data frame.
## load the library
library(palmerpenguins)
## get the data & inspect them
data(package = 'palmerpenguins')
head(penguins)
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <fct> <fct> <dbl> <dbl> <int> <int> <fct>
## 1 Adelie Torge… 39.1 18.7 181 3750 male
## 2 Adelie Torge… 39.5 17.4 186 3800 fema…
## 3 Adelie Torge… 40.3 18 195 3250 fema…
## 4 Adelie Torge… NA NA NA NA <NA>
## 5 Adelie Torge… 36.7 19.3 193 3450 fema…
## 6 Adelie Torge… 39.3 20.6 190 3650 male
## # … with 1 more variable: year <int>
## write the first 10 rows of data
dput(penguins[1:10,])
## structure(list(species = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
## 1L, 1L, 1L, 1L), .Label = c("Adelie", "Chinstrap", "Gentoo"), class = "factor"),
## island = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
## 3L), .Label = c("Biscoe", "Dream", "Torgersen"), class = "factor"),
## bill_length_mm = c(39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9,
## 39.2, 34.1, 42), bill_depth_mm = c(18.7, 17.4, 18, NA, 19.3,
## 20.6, 17.8, 19.6, 18.1, 20.2), flipper_length_mm = c(181L,
## 186L, 195L, NA, 193L, 190L, 181L, 195L, 193L, 190L), body_mass_g = c(3750L,
## 3800L, 3250L, NA, 3450L, 3650L, 3625L, 4675L, 3475L, 4250L
## ), sex = structure(c(2L, 1L, 1L, NA, 1L, 2L, 1L, 2L, NA,
## NA), .Label = c("female", "male"), class = "factor"), year = c(2007L,
## 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L
## )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
## ))
Note: The output from dput()
may look a
bit strange, but it’s R’s way of storing the
information contained in a data frame.
Tip: You can copy/paste the output above into a post with your code and someone could simply assign it to an object and it will render just fine.
## assign output to an object
dat <- structure(list(species = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("Adelie", "Chinstrap", "Gentoo"), class = "factor"),
island = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("Biscoe", "Dream", "Torgersen"), class = "factor"),
bill_length_mm = c(39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9,
39.2, 34.1, 42), bill_depth_mm = c(18.7, 17.4, 18, NA, 19.3,
20.6, 17.8, 19.6, 18.1, 20.2), flipper_length_mm = c(181L,
186L, 195L, NA, 193L, 190L, 181L, 195L, 193L, 190L), body_mass_g = c(3750L,
3800L, 3250L, NA, 3450L, 3650L, 3625L, 4675L, 3475L, 4250L
), sex = structure(c(2L, 1L, 1L, NA, 1L, 2L, 1L, 2L, NA,
NA), .Label = c("female", "male"), class = "factor"), year = c(2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
## inspect the object
dat
## # A tibble: 10 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 2 more variables: sex <fct>, year <int>
{reprex}
packageThe process of creating a reprex can also introduce unintended
mistakes in your code when copying/pasting or retyping. As an instructor
at UBC, Jenny Bryan found herself responding to hundreds of issues and
questions per semester, so she (and others) created the {reprex}
package.
{reprex}
from the command lineTask: Begin by loading the reprex
package.
library(reprex)
Task: Select the code you’re interested in and copy it to the clipboard.
a <- 1
b <- 0
a / b
Task: Type reprex()
at the
R command prompt. It will think for a moment and then
respond with a message about the reprex output.
> reprex()
✖ Install the styler package in order to use `style = TRUE`.
ℹ Rendering reprex...
✔ Reprex output is on the clipboard.
Tip: If you’re using RStudio, you’ll see a preview of your rendered reprex in the Viewer pane.
Tip: Your code, its result, and some additional
information about the date and reprex
version are all now
sitting on your clipboard and available for pasting somewhere for help
(see below).
Note: In the background, reprex
produces this chunk of code, which will render nicely when pasted into a
markdown document:
```{r}
a <- 1
b <- 0
a / b
#> [1] Inf
```
Tip: There are a number of additional options you
can pass to reprex()
that will format the output for a
particular forum for help (e.g., venue = gh
for GitHub) or
to also include the session information (e.g.,
session_info = TRUE
).
{reprex}
in RStudio{reprex}
is also designed to work seamlessly within
RStudio and will do most of the work for you. To do so,
you can access its functionality via the Addins
pulldown menu.
Tip: Click here to learn more about RStudio addins.
Task: After you’ve loaded the {reprex}
package, click on the Addins pulldown menu in the code
pane and type “reprex” in the search bar. Select Render
reprex… from the menu.
Note: You should see up a popup window with choices about where the code is located, the target venue for help, and whether you want to append the session information.
Task: Make your desired selections and click on the blue Render button in the upper right, which will copy all of the information to your clipboard and also display in the Viewer pane.
Note: At this point you have several options for seeking answers to your question or problem. The following example shows an option for creating an Issue in GitHub, but there are lots of others (see next section).
Task: Navigate to a GitHub repo (perhaps the one where your coding project lives) and create a new issue with an informative title.
Task: In the Write window, paste the reprex you created above and add some informative text above it. You may also want to select some labels on the righthand side (e.g., “bug”, “help wanted”).
Tip: You can click on the Preview tab to check the formatting of your reprex.
Task: When you are ready, scroll down and click on the green Submit new issue button.
Now that you know how to ask for help, let’s consider some places where you can find help. One of the first things I often do is to simply use Google to search for my problem or error message. Sometimes you can find the answer rather quickly, but often you’ll be presented with an array of possible solutions that require you to read through the various questions and corresponding answers that others have posed. In the end, your desire for a quick resolution can be countered by the time you’ll spend combing through lots of extraneous information. Therefore, you might want to consider these other options.
Your friends, colleagues, advisor, and committee members can be a
great resource for help. Asking people you know can be much less
intimidating than engaging with anonymous strangers. Turning to your
officemate and asking them for help can be quick and also gratifying for
them if they were able to help. Many lab groups have Slack channels for
#coding
or #programming
, so consider asking
there as well.
The RStudio community provides a nice forum for people to ask questions specific to R, RStudio, and various packages (e.g., the “tidyverse”). The people tend to be compassionate and caring—they genuinely want to help rather than simply espouse their knowledge to whomever will listen.
In general, Twitter is a great source of information on R and data science. I follow lots of people who are developers and users of R and various packages (e.g., Hadley Wickham, Jenny Bryan, Mara Averick, Kara Woo, Thomas Lin Pedersen). Although it’s difficult to squeeze a reprex into 280 characters, you can share links to locations where you’ve posted a reprex (e.g., a GitHub Gist). Also consider checking in on the #rstats hashtag or including it in a post asking for help.
As you start working more with teams on projects that are hosted on GitHub, you will invariably run into bugs, problems, errors, etc that you can’t resolve yourself. Creating an issue that includes a reprex and description of the problem is a great way to solicit help, as well as maintain a record for future reference.
In some cases, you may encounter a problem or discover a bug in a particular R package. Although CRAN is the primary location for hosting pacakges, many developers also maintain GitHub repos for their packages. These developers typically ask people to file an issue with the stipulation that you be polite and follow some general guidelines. Issues along the lines of “this doesn’t work” or “why won’t you update this” have a pretty good chance of being ignored. Very few people are paid to develop and maintain packages, so treat their time and effort with respect.
If you tried searching for an answer to your problem, there’s a good chance you’ve come across some posts on Stack Overflow, or “SO” as it’s often referred. SO has keywords or tags that people can use to highlight the software or packages they’re using (e.g., “R”, “dplyr”), which allows you to filter results. SO also has a reputation of being an intimidating forum with often snarky comments on posts. However, if you’ve taken the time to create a good reprex, you shouldn’t encounter any problems.