# Using the usethis Package and GitLab CI for Package Development in R: Part I

Caleb Scheidel

rstats package development

The best way to share your R code with others is to create a package. Whether you want to share your functions with team members, clients, or all interested R users, bundling up your functions into a package is the way to go. Luckily, there are great tools available that make this process relatively smooth and easy. This series of posts walks through the process of setting up an R package and sharing it on GitLab, a version control code repository. This first post focuses solely on building an R package with usethis. The following posts will go into the details of sharing the package on GitLab and taking advantage of its built-in continuous integration services to automate testing of the package.

## Setting up with usethis

Suppose we have written a function that calculates a p-value from either a Chi-squared or a Fisher exact test, depending on whether the Chi-squared test throws a warning due to small expected counts. We think this is a pretty useful function, so we would like to make it available for others to use. Let’s make a package for it. We’ll name the package chifishr.
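Before setting up the package, the decision rule described above can be sketched in a few lines of base R. This is only an illustration under assumptions, not the packaged function shown later in this post: the helper name `p_value_with_fallback` and the toy vectors are hypothetical, and the warning is detected with base R's `tryCatch`, which reacts to any warning rather than only the small-counts one.

```r
# Hypothetical helper: try the Chi-squared test first and fall back to
# Fisher's exact test if chisq.test() signals a warning.
p_value_with_fallback <- function(x, y) {
  tryCatch(
    stats::chisq.test(x, y)$p.value,
    warning = function(w) stats::fisher.test(x, y)$p.value
  )
}

# Toy data (made up): only 8 observations, so every expected cell count
# is below 5 and chisq.test() warns.
small_group  <- factor(c("a", "a", "a", "a", "b", "b", "b", "b"))
small_result <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "yes"))

# Toy data (made up): 100 observations with large expected counts,
# so chisq.test() runs without a warning.
big_group  <- factor(rep(c("a", "b"), each = 50))
big_result <- factor(c(rep("yes", 30), rep("no", 20), rep("yes", 20), rep("no", 30)))

p_small <- p_value_with_fallback(small_group, small_result)
p_big   <- p_value_with_fallback(big_group, big_result)

# The small sample falls back to Fisher; the large one keeps Chi-squared.
all.equal(p_small, stats::fisher.test(small_group, small_result)$p.value)
all.equal(p_big, stats::chisq.test(big_group, big_result)$p.value)
```

Catching every warning is a simplification that keeps the sketch short; a stricter version could inspect the warning message before deciding to fall back.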

To help with the setup, we will utilize the usethis package. usethis was spun out of the devtools package, and was created specifically to automate the tasks required to set up the common components of R packages. It takes care of getting the infrastructure of the package in place, so you can focus your efforts on creating your functions, examples and tests.

Open RStudio and run usethis::create_package, usethis::use_package_doc, and usethis::use_roxygen_md to get the bare-bones structure and documentation of the package in place.

install.packages("usethis")

usethis::create_package("~/gitlab/chifishr")
#> Changing active project to chifishr
#> ✔ Creating 'R/'
#> ✔ Creating 'man/'
#> ✔ Writing 'DESCRIPTION'
#> ✔ Writing 'NAMESPACE'
#> ✔ Writing 'chifishr.Rproj'
#> ✔ Adding '.Rproj.user' to './.gitignore'
#> ✔ Adding '^chifishr\\.Rproj$', '^\\.Rproj\\.user$' to '.Rbuildignore'
#> ✔ Opening project in RStudio

usethis::use_package_doc()
#> ✔ Writing 'R/chifishr-package.R'

usethis::use_roxygen_md()
#> ✔ Setting Roxygen field in DESCRIPTION to 'list(markdown = TRUE)'
#> ✔ Setting RoxygenNote field in DESCRIPTION to '6.0.1'
#> ● Re-document
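The `.Rbuildignore` entries in the output above are regular expressions rather than plain file names, which is why the dots are escaped and the patterns are anchored with `^` and `$`. Here is a rough sketch of how such patterns behave, using base R's `grepl` on a made-up vector of paths. The `paths` vector is hypothetical, and the code only mimics the matching idea; it does not reproduce what `R CMD build` does internally.

```r
# Patterns in R string form; in the .Rbuildignore file itself each
# escaped dot is written with a single backslash.
patterns <- c("^chifishr\\.Rproj$", "^\\.Rproj\\.user$")

# Hypothetical paths, relative to the package root
paths <- c("chifishr.Rproj", ".Rproj.user", "R/chifishr-package.R", "DESCRIPTION")

# A path counts as ignored if it matches at least one pattern
ignored <- vapply(
  paths,
  function(p) any(vapply(patterns, grepl, logical(1), x = p, perl = TRUE)),
  logical(1)
)

paths[ignored]
```

Only the RStudio project file and the `.Rproj.user` directory match, so the package source files would still be included in the build.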

Edit the DESCRIPTION file to add details about the package, including the title, description, author and R version dependency. To open the file for editing, run usethis:::edit_file("DESCRIPTION").

Package: chifishr
Version: 0.0.0.9000
Title: Helpers for Calculating Chi-squared and Fisher Exact Test p-values
Description: This package contains helper functions for calculating p-values from Chi-squared or Fisher exact test, depending on if a warning is thrown from the Chi-Squared test due to small expected counts leading to poor p-value approximations.
Authors@R: person("Caleb", "Scheidel", , "caleb@methodsconsultants.com", c("aut", "cre"))
Encoding: UTF-8
LazyData: true
ByteCompile: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 6.0.1
Depends:
    R (>= 2.10)
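Because `DESCRIPTION` uses the Debian control file format, one way to sanity-check an edit is to parse the file with base R's `read.dcf`. The sketch below writes a trimmed-down copy to a temporary file and reads it back. The subset of fields and the temp-file approach are choices made for illustration only. Note that a value placed on the line after its field name, as with `Depends`, needs to be indented to count as a continuation of that field.

```r
# Write a trimmed-down DESCRIPTION to a temporary file (illustration only)
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: chifishr",
  "Version: 0.0.0.9000",
  "Depends:",
  "    R (>= 2.10)"
), desc_path)

# read.dcf() returns a character matrix with one column per field
fields <- read.dcf(desc_path)
colnames(fields)

# The indented continuation line is picked up as the value of Depends
trimws(fields[1, "Depends"])

# The development version string is a valid package version
package_version(fields[1, "Version"])
```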

It is good practice to add a software license to a package that you plan to share. The MIT open source license is simple and permissive, and is very commonly used for R packages. Let’s use that here.

usethis::use_mit_license("Caleb Scheidel")
#> ✔ Setting License field in DESCRIPTION to 'MIT + file LICENSE'
#> ✔ Adding '^LICENSE\\.md$' to '.Rbuildignore'
#> ✔ Writing 'LICENSE'

The package we are creating will depend on the dplyr and purrr packages. We’ll need to add these dependencies to the DESCRIPTION file. This can easily be done with the usethis::use_package function.

usethis::use_package("dplyr")
#> ✔ Adding 'dplyr' to Imports field in DESCRIPTION
#> ● Refer to functions with dplyr::fun()

usethis::use_package("purrr")
#> ✔ Adding 'purrr' to Imports field in DESCRIPTION
#> ● Refer to functions with purrr::fun()

We will also utilize the pipe (%>%) function from the magrittr package. We can add this dependency with usethis::use_pipe.

usethis::use_pipe()
#> ✔ Adding 'magrittr' to Imports field in DESCRIPTION
#> ✔ Writing 'R/utils-pipe.R'
#> ● Run document()

#### Adding a function

Now we can add our function to the chi_fisher_p.R script in the R/ directory. First create the script with usethis::use_r.

usethis::use_r("chi_fisher_p")

Then add the function to that file, along with the necessary roxygen documentation. roxygen generates .Rd documentation files, which give users of the package the ability to view the arguments and returned value of the functions, among other details.

# chi_fisher_p.R

#' Function which calculates p-value via Chi-square or Fisher exact test.
#'
#' @param tbl (tbl) Dataframe that has variable and treatment columns of interest
#' @param var (character) Name of variable column
#' @param treatment (character) Name of treatment column
#'
#' @return (numeric) p-value
#'
#' @examples
#'
#' chi_fisher_p(treatment, "outcome", "treatment")
#' chi_fisher_p(treatment, "gender", "treatment")
#'
#' @export
chi_fisher_p <- function(tbl, var, treatment) {

  chisq_wrapper <- function(tbl, var, treatment) {

    var       <- tbl %>% dplyr::pull(var) %>% as.factor()
    treatment <- tbl %>% dplyr::pull(treatment) %>% as.factor()

    p <- stats::chisq.test(var, treatment)$p.value
    return(p)
  }

  fisher_wrapper <- function(tbl, var, treatment) {

    var       <- tbl %>% dplyr::pull(var) %>% as.factor()
    treatment <- tbl %>% dplyr::pull(treatment) %>% as.factor()

    p <- stats::fisher.test(var, treatment)$p.value
    return(p)
  }

  chisq_wrapper <- purrr::quietly(chisq_wrapper)
  chisq <- chisq_wrapper(tbl, var, treatment)

  if (length(chisq$warnings) == 0) {
    return(chisq$result)
  } else {
    return(fisher_wrapper(tbl, var, treatment))
  }
}

#### Adding test data

To test this function, we will create a fake data set. The data set will have 100 observations and 3 variables: treatment, gender, and outcome. The suggested practice is to include the data generating scripts in the package repository. To help set this up, run usethis::use_data_raw().

usethis::use_data_raw()
#> ✔ Creating 'data-raw/'
#> ✔ Adding '^data-raw$' to '.Rbuildignore'
#> Next:
#> ● Add data creation scripts in 'data-raw'
#> ● Use usethis::use_data() to add data to package

Then create the R script that will generate the data, save it in data-raw/, and run it locally.

# treatment-data.R

treatment <- tibble::tibble(
  treatment = c(rep("old", 50), rep("new", 50)),
  gender    = c(rep("male", 30), rep("female", 20), rep("male", 20), rep("female", 30)),
  outcome   = c(rep("failure", 95), rep("success", 5))
)

Note that the outcome is rare (5% success). If outcome is used as a variable in chisq.test, a warning will result. To include this data set in the package, we can run usethis::use_data().
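To see why the warning appears for outcome but not for gender, it helps to look at the expected cell counts, since `chisq.test` warns when any expected count falls below 5. The sketch below rebuilds the same data as a base R `data.frame` (an assumption made here so the example runs without tibble) and inspects the `expected` component of each test result.

```r
# Same values as the treatment data above, as a base R data.frame
treatment <- data.frame(
  treatment = c(rep("old", 50), rep("new", 50)),
  gender    = c(rep("male", 30), rep("female", 20), rep("male", 20), rep("female", 30)),
  outcome   = c(rep("failure", 95), rep("success", 5))
)

# Expected counts under independence; the warning is suppressed here
# only so the expected counts can be inspected quietly.
expected_outcome <- suppressWarnings(
  chisq.test(treatment$outcome, treatment$treatment)$expected
)
expected_gender <- chisq.test(treatment$gender, treatment$treatment)$expected

min(expected_outcome)  # 2.5, below the threshold of 5
min(expected_gender)   # 25, comfortably above it
```

With only 5 successes split across two equally sized treatment groups, each "success" cell has an expected count of 2.5, which is what triggers the warning and the switch to the Fisher exact test.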

usethis::use_data(treatment)
#> ✔ Creating 'data/'
#> ✔ Saving treatment to data/treatment.rda

Since this data will be accessible to users of the package, it must be documented. To do this, we will document the name of the data set and save it in the R/data.R script.

First create the script.

usethis::use_r("data")
#> ● Modify 'data.R'

Then add the documentation for the treatment data set to that script.

#' Outcomes of 100 patients by old and new treatments
#'
#' A dataset containing the genders and outcomes of 100 patients
#' split evenly between two treatment groups.
#'
#' @format A data frame with 100 rows and 3 variables:
#'  - *treatment*: treatment, old or new
#'  - *gender*: gender, male or female
#'  - *outcome*: outcome, failure or success
"treatment"

To ensure the function and the data set we just created have the proper .Rd documentation files within the package, run devtools::document().

devtools::document()
#> Updating chifishr documentation
#> Writing NAMESPACE
#> Writing chi_fisher_p.Rd
#> Writing chifishr-package.Rd
#> Writing treatment.Rd
#> Writing pipe.Rd

The function we just created needs to be tested to ensure that it behaves as expected. To set up the file structure for writing and executing tests, run usethis::use_testthat. testthat is an extremely helpful toolset for setting up and running tests within a package.

usethis::use_testthat()
#> ✔ Adding 'testthat' to Suggests field in DESCRIPTION
#> ✔ Creating 'tests/testthat/'
#> ✔ Writing 'tests/testthat.R'

Now we can add some tests to the tests/testthat/ directory. If you have the chi_fisher_p.R script open in RStudio and run usethis::use_test(), it will create a test file corresponding to that script that you can put the related tests in.

usethis::use_test()
#> ✔ Writing 'tests/testthat/test-chi_fisher_p.R'
#> ● Modify 'test-chi_fisher_p.R'

Using known outcomes from chisq.test in our example treatment data, we can then write tests to check that chi_fisher_p returns a Chi-squared p-value when chisq.test throws no warning, and returns a Fisher exact test p-value when it does. This can be done using the expect_ family of functions from testthat.

# test-chi_fisher_p.R

context("test-chi_fisher_p.R")

test_that("returns chi-squared p value if no warnings are thrown", {

  expect_silent(chisq.test(treatment$gender, treatment$treatment))
  expect_equal(
    chi_fisher_p(treatment, "gender", "treatment"),
    chisq.test(treatment$gender, treatment$treatment)$p.value
  )
})

test_that("returns fisher p value if chi-squared warnings are thrown", {

  expect_warning(chisq.test(treatment$outcome, treatment$treatment))
  expect_equal(
    chi_fisher_p(treatment, "outcome", "treatment"),
    fisher.test(treatment$outcome, treatment$treatment)$p.value
  )
})

We know these tests will pass right now, but they are important for making sure that future changes to the package do not break the basic functionality of chi_fisher_p. We can run these tests with devtools::test(), with the keyboard shortcut Cmd+Shift+T (Mac) or Ctrl+Shift+T (Windows/Linux), or with the RStudio Test button in the “Build” pane.

## Checking the package

Now that the first version of the package is nearly complete, we will want to “check” the package for any missing documentation or errors in file structures, as well as run the tests for the function. We can do all of this by running devtools::check(). Alternatively, you could use the keyboard shortcut Cmd+Shift+E (Mac) or Ctrl+Shift+E (Windows/Linux) or use the RStudio Check button in the “Build” pane.

## Up Next

Part II will demonstrate how to share the package on GitLab, as well as set up automated checking and testing with GitLab’s built-in CI services.