Errors and Debugging in RStudio

Caleb Scheidel

Posted on
rstats errors and testing

Diagnosing and fixing errors in your code can be time-consuming and frustrating. There are two ways you can make your life easier. The first is knowing the tools at your disposal in RStudio to debug errors. RStudio provides a variety of tools to help you diagnose a problem at its source and come up with a solution as quickly as possible. The second is knowing how to write functions that return clear yet detailed errors using condition handling. This post walks through both topics so that you can become better at handling errors, both when writing your own code and when working with someone else’s.

Debugging Errors

The following general strategy can be applied to debug an error, as outlined by Hadley Wickham in his Advanced R book:

1. Google the error message
• Many times it is a common error with a known solution
2. Make it repeatable
• Create a minimal, reproducible example (e.g. reprex) using simple data
• Note which inputs don’t trigger the error
• If not already done, write simple tests to reduce chances of creating a new bug
3. Figure out where the error is
• Use the “scientific method”
• Hypothesize, test with experiments, and record results
• If needed, ask someone else for a second pair of eyes to review
4. Fix it and test it
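
As a minimal sketch of steps 2 through 4, consider a hypothetical helper, first_word, that is not part of this post's example package. The idea is to find the smallest input that triggers the error, note an input that does not, and then pin the fix down with a simple test:

```r
# Hypothetical function with a bug: strsplit() only accepts character input
first_word <- function(x) strsplit(x, " ")[[1]][1]

# Input that does NOT trigger the error
ok <- first_word("hello world")

# Minimal input that DOES trigger the error: a factor instead of a string
err <- tryCatch(
  first_word(factor("hello world")),
  error = function(e) conditionMessage(e)
)

# Fix: coerce to character before splitting
first_word_fixed <- function(x) strsplit(as.character(x), " ")[[1]][1]

# Simple tests so the bug does not quietly return
stopifnot(
  identical(first_word_fixed("hello world"), "hello"),
  identical(first_word_fixed(factor("hello world")), "hello")
)
```

Keeping both the failing and the non-failing input around makes it easier to confirm that the fix changes only the broken case.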

These four steps should be followed each time you encounter an unexpected error in a function. Often, you may not even know which line of code the error is coming from. How can you determine where the code is misbehaving? You can follow these general steps to answer this question:

1. Begin running the code.
2. Stop the code where you suspect the bug/problem is arising.
3. Look at and/or walk through the code step by step from that point.

This can be done ad-hoc in a separate R script containing the function code, or using several built-in tools in RStudio, including the traceback function and debug mode.
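
As a rough, non-interactive sketch of the "stop where the problem arises" idea, the snippet below records the call stack at the moment an error is signalled. The functions inner_fn and outer_fn are hypothetical and exist only for illustration. In an interactive session, calling traceback() right after the error shows similar information, and browser() or debugonce() pauses execution so you can step through the code.

```r
# Hypothetical nested functions, used only to illustrate locating an error
inner_fn <- function(x) {
  if (!is.numeric(x)) stop("`x` must be numeric")
  x * 2
}
outer_fn <- function(x) inner_fn(x)

# Record the call stack when the error is signalled, then recover gracefully
call_stack <- NULL
result <- tryCatch(
  withCallingHandlers(
    outer_fn("a"),
    error = function(e) call_stack <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)

# Deparse each recorded call so the stack can be read top to bottom
stack_text <- vapply(
  call_stack,
  function(cl) paste(deparse(cl), collapse = " "),
  character(1)
)
print(stack_text)
```

The printed stack includes the calls to outer_fn and inner_fn, which points to where the error originated.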

Let’s look at an example function to demonstrate the use of these tools. We’ll create a simple data set with three binary variables: treatment, gender, and outcome. The chifishr::chi_fisher_p function calculates a p-value from either a Chi-squared or a Fisher Exact test, depending on whether the Chi-squared test throws a warning about small expected counts, which lead to poor p-value approximations.

treatment <- tibble::tibble(
  treatment = c(rep("old", 50), rep("new", 50)),
  gender    = c(rep("male", 30), rep("female", 20),
                rep("male", 20), rep("female", 30)),
  outcome   = c(rep("failure", 95), rep("success", 5))
)

# devtools::install_git("https://gitlab.com/scheidec/chifishr")
library(chifishr)

# warning is present, Fisher p-value is returned
chi_fisher_p(treatment, "outcome", "treatment")
## [1] 0.05628449
# no warning is present, Chi-squared p-value returned
chi_fisher_p(treatment, "gender", "treatment")
## [1] 0.07186064

Let’s take a closer look at the code within the chi_fisher_p function to see what is happening:

chi_fisher_p
## function (tbl, var, treatment)
## {
##     chisq_wrapper <- function(tbl, var, treatment) {
##         var <- tbl %>% dplyr::pull(var) %>% as.factor()
##         treatment <- tbl %>% dplyr::pull(treatment) %>% as.factor()
##         p <- stats::chisq.test(var, treatment)$p.value
##         return(p)
##     }
##     fisher_wrapper <- function(tbl, var, treatment) {
##         var <- tbl %>% dplyr::pull(var) %>% as.factor()
##         treatment <- tbl %>% dplyr::pull(treatment) %>% as.factor()
##         p <- stats::fisher.test(var, treatment)$p.value
##         return(p)
##     }
##     chisq_wrapper <- purrr::quietly(chisq_wrapper)
##     chisq <- chisq_wrapper(tbl, var, treatment)
##     if (length(chisq$warnings) == 0) {
##         return(chisq$result)
##     }
##     else {
##         return(fisher_wrapper(tbl, var, treatment))
##     }
## }
## <bytecode: 0x00000000145be0f0>
## <environment: namespace:chifishr>

First, two internal functions are defined, chisq_wrapper and fisher_wrapper. These functions pull the specified variables from the input tbl and store them as factors. The chisq.test and fisher.test functions, respectively, are then applied to those factors, and only the numeric p.value result is returned.

The next line wraps the chisq_wrapper function in purrr::quietly, which captures the side effects of a function. Now, when chisq_wrapper is called, it will return a list with components result, output, messages and warnings. This allows the function to check if a warning is present when the Chi-squared test is performed, and return either the Chi-squared test p-value or the Fisher Exact test p-value from the subsequent if-else block.
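
To make the role of purrr::quietly more concrete, here is a minimal sketch that wraps stats::chisq.test directly rather than using the package's internal wrappers. The helper pick_p and the two contingency tables are assumptions made for illustration; the tables contain the same counts as the outcome-by-treatment and gender-by-treatment cross-tabulations of the example data.

```r
# Wrap chisq.test so its warnings are captured instead of printed
quiet_chisq <- purrr::quietly(stats::chisq.test)

# Hypothetical helper mirroring the if-else logic described above
pick_p <- function(tab) {
  chisq <- quiet_chisq(tab)
  if (length(chisq$warnings) == 0) {
    chisq$result$p.value              # no warning: keep the Chi-squared p-value
  } else {
    stats::fisher.test(tab)$p.value   # warning: fall back to Fisher's exact test
  }
}

# Small expected counts in the second column, so chisq.test warns
small_tab <- matrix(c(50, 45, 0, 5), nrow = 2)
# Balanced counts, so chisq.test runs without a warning
large_tab <- matrix(c(30, 20, 20, 30), nrow = 2)

small_res <- quiet_chisq(small_tab)
large_res <- quiet_chisq(large_tab)

small_res$warnings   # one captured warning message
large_res$warnings   # empty character vector

p_small <- pick_p(small_tab)
p_large <- pick_p(large_tab)
```

Because the warning is stored in the returned list rather than emitted, the calling code can branch on it with an ordinary length check.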

If we pass in a variable that is not in the input data set, we would expect an error to be thrown: