Rstats

Data Pivoting with tidyr

Reshaping data from long to wide format, or wide to long format, is a common task in data science. Until recently, the best functions for performing this task in R were the gather and spread functions from the tidyr package. However, these functions had limitations, such as only being able to reshape one variable at a time, that required creative workarounds. The newest version of tidyr introduces the pivot_longer() and pivot_wider() functions that perform the same tasks, but that also handle a wider variety of use cases.
Read more

Understanding Bootstrap Confidence Interval Output from the R boot Package

Nuances of Bootstrapping Most applied statisticians and data scientists understand that bootstrapping is a method that mimics repeated sampling by drawing some number of new samples (with replacement) from the original sample in order to perform inference. However, it can be difficult to understand output from the software that carries out the bootstrapping without a more nuanced understanding of how uncertainty is quantified from bootstrap samples. To demonstrate the possible sources of confusion, start with the data described in Efron and Tibshirani’s (1993) text on bootstrapping (page 19).
Read more

Errors and Debugging in RStudio

Diagnosing and fixing errors in your code can be time-consuming and frustrating. There are two ways you can make your life easier. The first is knowing the tools at your disposal in RStudio to debug errors. RStudio provides a variety of tools to help you diagnose the problem at its source and come up with a solution as quick as possible. The second is knowing how to write functions that return clear yet detailed errors using condition handling.
Read more

How to Do Mediation Scientifically

Mediation analysis has been around a long time, though its popularity has varied between disciplines and over the years. While some fields have been attracted to the potential of mediation models to identify pathways, or mechanisms, through which an independent variable affects an outcome, others have been skeptical that the analysis of mediated relationships can ever be done scientifically. Two developments, one more scientific than the other, have led to a renewed popularity of mediation analysis.
Read more

Plotly for R - Multi-Layer Plots

If you are new to plotly, consider first reading our introductory post: Introduction to Interactive Graphics in R with plotly Often when analyzing data, it is necessary to produce a complex plot that requires multiple graphical layers. In plotly, multi-layer plots can be specified as a pipeline of data manipulations (dplyr only) and visual mappings. This is possible because dplyr verbs can be used on a plotly object to modify the underlying data.
Read more

Introduction to Interactive Graphics in R with plotly

R users adore the ggplot2 package for all things data visualization. Its consistent syntax, useful defaults, and flexibility make it a fantastic tool for creating high-quality figures. Although ggplot2 is great, there are other dataviz tools that deserve a place in a data scientist’s toolbox. Enter plotly. plotly is a high-level interface to plotly.js, based on d3.js which provides an easy-to-use UI to generate slick D3 interactive graphics.
Read more

Assessing Causality from Observational Data using Pearl's Structural Causal Models

Causality In 20th century statistics classes, it was common to hear the statement: “You can never prove causality.” As a result, researchers published results saying “x is associated with y” as a way of circumventing the issue of causality yet implicitly suggesting that the association is causal. As an example from my former discipline, political science, there was an interest in determining how representative democracy works. Do politicians respond to voters, or do voters just update their policy beliefs to line up with the party they’ve always preferred?
Read more

Developing R Packages with usethis and GitLab CI: Part III

While developing your R package, you will want to make sure the code it contains is as clean as possible and that your package build and testing times are as efficient as you can make them. There are a number of tricks and tools at your disposal to accomplish these aims. This post, the third in a series that covers R package development, will introduce a few of those and demonstrate how they can improve your package development process.
Read more

Tracking private R dependencies with packrat & git submodules

Here at Methods we often use RStudio’s packrat package to version our package dependencies and help ensure our work is reproducible. Packrat handles public packages on CRAN or Github just fine, but we have a lot of internal packages hosted privately on Gitlab that we’d like to have packrat manage like the rest of our dependencies. This comes up very naturally for us, as we often make client-specific R packages we then want to use in other work for that client.
Read more

Developing R Packages with usethis and GitLab CI: Part II

This post, the second part in a series that covers R package development, will define the important concept of continuous integration (CI) and demonstrate the advantages of using CI within GitLab. The version control code repository, GitLab, offers many services to its users, including the ability to set up CI services to R programmers and software developers in private repositories for free. GitLab’s built-in CI service is easy to utilize and can be set up with an R package relatively quickly.
Read more