Motivation

There is a good chance that you’ve already had to ask someone for help with a question related to some aspect of data science. For example,

  • What does this error mean?

  • Why can I not reproduce what you’ve done?

  • How do I [fill in the blank]?

  • Is there an existing package that allows me to [fill in the blank]?

The bad news is that these types of questions may follow you around throughout your career. The good news, however, is that there is a large community of people willing to help you find the solutions you need.


Minimal, reproducible example

When you get stuck on a problem and need some advice, others will be able to better provide assistance if you include information that

  1. they can easily understand, and

  2. they can use to reproduce the problem.

This process is variously referred to as creating a

  • minimal, reproducible example (reprex)

  • minimal, complete and verifiable example (mcve)

  • minimal, workable example (mwe)

The bottom line is that you should

  • Use the least amount of code possible to produce the problem (minimal)

  • Provide everything (code, data) someone would need to reproduce your problem (complete)

  • Verify that the code you’re about to share reproduces the problem (reproducible)

In addition, you should phrase your question in a manner that is polite and free of jargon or other language constructs that might confuse people. The data science community is global, so recognize that someone reading your question might not speak English as a first language. Also consider that others may very well stumble across your question, and hopefully the correct answer, years after it was solved. As such, using plain, descriptive language in the title and body of a post will allow others to find it when searching for answers. For example, this problem I posted on Stack Overflow in 2016 has since received over 2000 views!

Minimal

There tends to be a strong negative correlation between the length of your code and the likelihood that someone will 1) read all the way through it, and 2) actually be able to solve your problem. Therefore, you should consider the following points:

  • Consider creating a new script, including only those elements germane to the problem;

  • Remove extraneous information that doesn’t influence the problematic code;

  • Use simple, descriptive names for parameters, variables, and functions;

  • Include meaningful comments in your code to help others understand what you’re trying to accomplish;

  • Take advantage of your IDE’s options to create nicely formatted code.

  Tip: The following commands in RStudio can be helpful for formatting code:
  • indent lines: command + I on a Mac or ctrl + I in Windows
  • format code: command + shift + A on a Mac or ctrl + shift + A in Windows
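
As an illustration of these points, here is a sketch of what a pared-down script might look like. The problem (a mean that unexpectedly returns NA) and the variable names are hypothetical and chosen only for this example.

```r
## hypothetical problem: the mean of a measurement vector is NA

## a few made-up body masses (g), including one missing value
body_mass_g <- c(3750, 3800, NA, 3450)

## this is the line that gives the unexpected result
mean_mass_g <- mean(body_mass_g)

## I expected a number near 3667, but the result is NA
mean_mass_g
```

Everything unrelated to the unexpected result has been removed, and the names and comments tell a reader what each line is for.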

Complete

It’s important to include all of the information necessary to reproduce the problem within the question itself. Don’t put other people in a position where their first response to you is a request for more information. In addition, please consider the following advice:

  • If there are data involved, include a small subset of the data along with any necessary code;

  • Break your problem down into small code blocks and include a description of each block’s purpose;

  • It’s OK to include an image or figure in your question, but do not use images of code; rather, copy/paste the actual code so others don’t have to retype anything;

  • Include information or links to other problems/solutions that you’ve tried or are referencing.
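
The advice above might be sketched as follows. The data frame and its column names are invented for illustration; in practice you would substitute a small subset of your own data and the step you are actually asking about.

```r
## block 1: a small, made-up data set that travels with the question
penguin_subset <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  body_mass_g = c(3750, 3800, 5000, 5200)
)

## block 2: the step being asked about; here, a mean body mass per species
mass_by_species <- aggregate(body_mass_g ~ species, data = penguin_subset, FUN = mean)

## block 3: show the result so readers can compare it with their own
mass_by_species
```

Because the data are created inside the script, anyone can run it without asking for a file.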

Reproducible

In order to help you, other people will need to verify that they can indeed reproduce the problem you’re having. Phrases like “Help!”, “This won’t work!”, or “Why does this happen?” don’t offer much information. Instead, use descriptive language that gets to the question at hand. In particular:

  • Explain what the expected result should be;

  • Provide the exact wording of any error messages;

  • Indicate which line(s) of code produce(s) the problem;

  • Use a brief, descriptive summary of your problem as the title of your question.

Tip: You should double-check that your example reproduces the problem. It’s good practice to quit RStudio (or other software) and restart it from scratch. If possible, consider testing your example on another computer or operating system.
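
One way to capture the exact wording of an error message, rather than retyping it, is to trap the error and print its message. This is a minimal sketch in base R; the failing expression is a made-up stand-in for your own problem code.

```r
## a made-up failing expression: adding a character string to a number
error_text <- tryCatch(
  "1" + 1,
  error = function(e) conditionMessage(e)
)

## print the message exactly as R reported it, ready to paste into a question
print(error_text)
```

The wording of the message depends on your R version and language settings, which is one more reason to paste it verbatim.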

Metainfo

In addition to minimal, complete, and reproducible code (and possibly data), you should also include information about your larger work environment. This includes:

  • Your operating system (Mac, Windows, Linux) and its version;

  • Versions of your software (R, packages); sessionInfo() and packageVersion("package-name-here") are both helpful functions.
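
A short sketch of how this information might be collected in base R is shown below. The package name "stats" is used only because it ships with R; replace it with the package relevant to your question.

```r
## R version and operating system
R.version.string
Sys.info()[["sysname"]]

## version of a specific package ("stats" is just an example)
stats_version <- packageVersion("stats")
stats_version

## full summary of the session, including attached packages
session_summary <- sessionInfo()
session_summary
```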


Creating a “reprex”

Creating a reproducible example (“reprex”) can take some work, but so does answering other people’s questions. As Jenny Bryan says, “Help me help you.” Here are some options for helping to create a reprex.

Using dput()

Copying/pasting code is relatively easy, but trying to include any data necessary to create a complete reprex can be tricky. In general, including a .csv file or other data format is difficult without pointing someone to an external source, link, etc.

Tip: The dput() function provides a convenient way of writing all of the information in a data frame in a compact manner.

Here’s an example of using dput() on the Palmer penguins data set. This data set has 344 rows, which is more than anyone needs to see in a post, so we’ll only use the first 10 rows of the data frame.

## load the library
library(palmerpenguins)

## get the data & inspect them
data(penguins, package = 'palmerpenguins')
head(penguins)
## # A tibble: 6 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male 
## # … with 1 more variable: year <int>

## write the first 10 rows of data
dput(penguins[1:10,])
## structure(list(species = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L), .Label = c("Adelie", "Chinstrap", "Gentoo"), class = "factor"), 
##     island = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
##     3L), .Label = c("Biscoe", "Dream", "Torgersen"), class = "factor"), 
##     bill_length_mm = c(39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 
##     39.2, 34.1, 42), bill_depth_mm = c(18.7, 17.4, 18, NA, 19.3, 
##     20.6, 17.8, 19.6, 18.1, 20.2), flipper_length_mm = c(181L, 
##     186L, 195L, NA, 193L, 190L, 181L, 195L, 193L, 190L), body_mass_g = c(3750L, 
##     3800L, 3250L, NA, 3450L, 3650L, 3625L, 4675L, 3475L, 4250L
##     ), sex = structure(c(2L, 1L, 1L, NA, 1L, 2L, 1L, 2L, NA, 
##     NA), .Label = c("female", "male"), class = "factor"), year = c(2007L, 
##     2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L
##     )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
## ))

Note: The output from dput() may look a bit strange, but it’s R’s way of storing the information contained in a data frame.

Tip: You can copy/paste the output above into a post with your code. Anyone helping you can then assign it to an object to recreate the data frame.

## assign output to an object
dat <- structure(list(species = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = c("Adelie", "Chinstrap", "Gentoo"), class = "factor"), 
    island = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L), .Label = c("Biscoe", "Dream", "Torgersen"), class = "factor"), 
    bill_length_mm = c(39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 
    39.2, 34.1, 42), bill_depth_mm = c(18.7, 17.4, 18, NA, 19.3, 
    20.6, 17.8, 19.6, 18.1, 20.2), flipper_length_mm = c(181L, 
    186L, 195L, NA, 193L, 190L, 181L, 195L, 193L, 190L), body_mass_g = c(3750L, 
    3800L, 3250L, NA, 3450L, 3650L, 3625L, 4675L, 3475L, 4250L
    ), sex = structure(c(2L, 1L, 1L, NA, 1L, 2L, 1L, 2L, NA, 
    NA), .Label = c("female", "male"), class = "factor"), year = c(2007L, 
    2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L
    )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

## inspect the object
dat
## # A tibble: 10 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # … with 2 more variables: sex <fct>, year <int>
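
Before posting dput() output, it can be worth checking that it really does rebuild your data. The sketch below assumes a small made-up data frame (the object and column names are hypothetical) and uses base R only. It captures the dput() text, evaluates it as a helper would after copy/pasting, and compares the result with the original.

```r
## a small, made-up data frame standing in for your own data
original <- data.frame(
  island = factor(c("Biscoe", "Dream", "Torgersen")),
  flipper_length_mm = c(181L, 186L, 195L)
)

## capture what dput() would print to the console
dput_text <- capture.output(dput(original))

## rebuild the object from that text
rebuilt <- eval(parse(text = dput_text))

## TRUE if the rebuilt object matches the original
round_trip_ok <- isTRUE(all.equal(original, rebuilt))
round_trip_ok
```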


The {reprex} package

The process of creating a reprex can also introduce unintended mistakes in your code when copying/pasting or retyping. As an instructor at UBC, Jenny Bryan found herself responding to hundreds of issues and questions per semester, so she (and others) created the {reprex} package.

{reprex} from the command line

Task: Begin by loading the reprex package.

library(reprex)

Task: Select the code you’re interested in and copy it to the clipboard.

a <- 1
b <- 0
a / b

Task: Type reprex() at the R command prompt. It will think for a moment and then respond with a message about the reprex output.

> reprex()
✖ Install the styler package in order to use `style = TRUE`.
ℹ Rendering reprex...
✔ Reprex output is on the clipboard.

Tip: If you’re using RStudio, you’ll see a preview of your rendered reprex in the Viewer pane.

Tip: Your code, its result, and some additional information about the date and reprex version are all now sitting on your clipboard and available for pasting somewhere for help (see below).

Note: In the background, reprex produces this chunk of code, which will render nicely when pasted into a markdown document:

```{r}
a <- 1
b <- 0
a / b
#> [1] Inf
```

Tip: There are a number of additional options you can pass to reprex() that will format the output for a particular forum for help (e.g., venue = "gh" for GitHub) or to also include the session information (e.g., session_info = TRUE).
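
As a sketch of how these options fit together, the call below passes the code directly through the input argument instead of the clipboard. The object name problem_code is hypothetical, and the call is skipped if the {reprex} package is not installed.

```r
## the code to be rendered, supplied as a character vector
problem_code <- c("a <- 1", "b <- 0", "a / b")

## render for GitHub and append the session information;
## skipped if the reprex package is not installed
if (requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex(input = problem_code, venue = "gh", session_info = TRUE)
}
```

Supplying the code as a character vector avoids relying on whatever happens to be on the clipboard, which can be handy when working in a script.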

{reprex} in RStudio

{reprex} is also designed to work seamlessly within RStudio and will do most of the work for you. To do so, you can access its functionality via the Addins pulldown menu.

Tip: The RStudio documentation has more information about addins.

Task: After you’ve loaded the {reprex} package, click on the Addins pulldown menu in the code pane and type “reprex” in the search bar. Select Render reprex… from the menu.


Note: You should see a popup window with choices about where the code is located, the target venue for help, and whether you want to append the session information.

Task: Make your desired selections and click on the blue Render button in the upper right, which will copy all of the information to your clipboard and also display in the Viewer pane.


Note: At this point you have several options for seeking answers to your question or problem. The following example shows an option for creating an Issue in GitHub, but there are lots of others (see next section).

Task: Navigate to a GitHub repo (perhaps the one where your coding project lives) and create a new issue with an informative title.


Task: In the Write window, paste the reprex you created above and add some informative text above it. You may also want to select some labels on the right-hand side (e.g., “bug”, “help wanted”).


Tip: You can click on the Preview tab to check the formatting of your reprex.

Task: When you are ready, scroll down and click on the green Submit new issue button.



Where to get help

Now that you know how to ask for help, let’s consider some places where you can find help. One of the first things I often do is to simply use Google to search for my problem or error message. Sometimes you can find the answer rather quickly, but often you’ll be presented with an array of possible solutions that require you to read through the various questions and corresponding answers that others have posed. In the end, your desire for a quick resolution can be countered by the time you’ll spend combing through lots of extraneous information. Therefore, you might want to consider these other options.

Colleagues

Your friends, colleagues, advisor, and committee members can be a great resource for help. Asking people you know can be much less intimidating than engaging with anonymous strangers. Turning to your officemate and asking them for help can be quick and also gratifying for them if they were able to help. Many lab groups have Slack channels for #coding or #programming, so consider asking there as well.

RStudio community

The RStudio community provides a nice forum for people to ask questions specific to R, RStudio, and various packages (e.g., the “tidyverse”). The people tend to be compassionate and caring; they genuinely want to help rather than simply show off their knowledge to whoever will listen.

Twitter

In general, Twitter is a great source of information on R and data science. I follow lots of people who are developers and users of R and various packages (e.g., Hadley Wickham, Jenny Bryan, Mara Averick, Kara Woo, Thomas Lin Pedersen). Although it’s difficult to squeeze a reprex into 280 characters, you can share links to locations where you’ve posted a reprex (e.g., a GitHub Gist). Also consider checking in on the #rstats hashtag or including it in a post asking for help.

GitHub

As you start working more with teams on projects that are hosted on GitHub, you will inevitably run into bugs, problems, errors, etc. that you can’t resolve yourself. Creating an issue that includes a reprex and description of the problem is a great way to solicit help, as well as maintain a record for future reference.

In some cases, you may encounter a problem or discover a bug in a particular R package. Although CRAN is the primary location for hosting packages, many developers also maintain GitHub repos for their packages. These developers typically ask people to file an issue with the stipulation that you be polite and follow some general guidelines. Issues along the lines of “this doesn’t work” or “why won’t you update this” have a pretty good chance of being ignored. Very few people are paid to develop and maintain packages, so treat their time and effort with respect.

Stack Overflow

If you tried searching for an answer to your problem, there’s a good chance you’ve come across some posts on Stack Overflow, or “SO” as it’s often called. SO has keywords or tags that people can use to highlight the software or packages they’re using (e.g., “R”, “dplyr”), which allows you to filter results. SO also has a reputation of being an intimidating forum with often snarky comments on posts. However, if you’ve taken the time to create a good reprex, you shouldn’t encounter any problems.