## Following Statistical Analysis Plan Guidelines

A statistical analysis plan (SAP) is a document, written before a clinical or observational study begins, that specifies in detail how the data will be coded and analyzed. It serves three essential roles. First, it provides transparency about how the analysis will proceed by specifying the methodology in advance. Second, it gives the statistician involved in the study clear instructions on how to proceed.

## The forcats Package and How to Use It

This blog post introduces the forcats package and shows how to use it to work with factor (categorical) variables. Although decent documentation exists, it can be a bit too terse for someone encountering the package’s functions for the first time. This post provides a fuller narrative description of the package, with examples. Amelia McNamara’s outstanding presentation from the 2019 RStudio conference is also worth watching for the background, motivation, and usefulness of forcats.

## Data Pivoting with tidyr

Reshaping data from long to wide format, or from wide to long format, is a common task in data science. Until recently, the best functions for this task in R were gather() and spread() from the tidyr package. However, these functions had limitations that required creative workarounds, such as being able to reshape only one variable at a time. tidyr 1.0.0 introduced the pivot_longer() and pivot_wider() functions, which perform the same tasks but also handle a wider variety of use cases.
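To make the long/wide distinction concrete, here is a minimal sketch in plain Python (not R), assuming records stored as lists of dicts. The helpers `to_wide` and `to_long` are hypothetical; their argument names only loosely echo tidyr's `names_from`/`values_from` and `names_to`/`values_to`, and like gather() and spread() they reshape a single value column.

```python
def to_wide(rows, id_col, names_from, values_from):
    """Long -> wide: one output row per id, one column per distinct `names_from` value."""
    wide = {}
    for row in rows:
        key = row[id_col]
        # Create the output row the first time this id appears, then fill in one column.
        wide.setdefault(key, {id_col: key})[row[names_from]] = row[values_from]
    return list(wide.values())


def to_long(rows, id_col, names_to, values_to):
    """Wide -> long: one output row per (id, column) pair, skipping the id column."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col != id_col:
                long_rows.append({id_col: row[id_col], names_to: col, values_to: val})
    return long_rows


# Hypothetical data: one score per subject per year, in long format.
long_data = [
    {"subject": "a", "year": "2019", "score": 10},
    {"subject": "a", "year": "2020", "score": 12},
    {"subject": "b", "year": "2019", "score": 7},
    {"subject": "b", "year": "2020", "score": 9},
]
wide_data = to_wide(long_data, "subject", "year", "score")
print(wide_data)
# → [{'subject': 'a', '2019': 10, '2020': 12}, {'subject': 'b', '2019': 7, '2020': 9}]
```

Calling `to_long(wide_data, "subject", "year", "score")` recovers the original long records, which is the round trip that the pivot functions are meant to make easy.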

## Understanding Bootstrap Confidence Interval Output from the R boot Package

Most applied statisticians and data scientists understand that bootstrapping mimics repeated sampling by drawing some number of new samples (with replacement) from the original sample in order to perform inference. However, without a more nuanced understanding of how uncertainty is quantified from bootstrap samples, the output of bootstrapping software can be difficult to interpret. To illustrate the possible sources of confusion, we start with the data described on page 19 of Efron and Tibshirani’s (1993) text on bootstrapping.
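As a rough illustration of the resampling idea, here is a minimal percentile-interval sketch in Python rather than R. It uses a hypothetical sample, not the Efron and Tibshirani data, and the function name `bootstrap_percentile_ci` is our own. The percentile method is only one of the interval types that `boot.ci()` in the boot package can report (others include normal, basic, studentized, and BCa), which is one reason its output can look confusing at first.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI: resample with replacement, recompute stat, take quantiles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each bootstrap sample has the same size as the original and is drawn with replacement.
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return boot_stats[lo_idx], boot_stats[hi_idx]


# Hypothetical sample for illustration only.
sample = [12.1, 9.8, 14.3, 11.0, 10.5, 13.7, 8.9, 12.8, 11.6, 10.2]
low, high = bootstrap_percentile_ci(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% percentile CI = ({low:.2f}, {high:.2f})")
```

The interval endpoints are simply the 2.5th and 97.5th percentiles of the resampled means; other interval types adjust these quantiles in different ways, so they can disagree on the same bootstrap replicates.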

## Errors and Debugging in RStudio

Diagnosing and fixing errors in your code can be time-consuming and frustrating. There are two ways to make your life easier. The first is knowing the tools RStudio puts at your disposal for debugging: a variety of features that help you trace a problem to its source and find a solution as quickly as possible. The second is knowing how to write functions that produce clear yet detailed error messages using condition handling.
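The idea behind informative errors carries across languages. Here is a minimal Python sketch built around a hypothetical `safe_log()` helper: validate inputs, raise errors that say exactly what went wrong, and handle them at the call site. In R, the analogous tools are `stop()` for signaling an error and `tryCatch()` for handling it.

```python
import math


def safe_log(x):
    """Return log(x), raising an error that states what was wrong and why."""
    if not isinstance(x, (int, float)):
        raise TypeError(f"safe_log() expects a number, got {type(x).__name__}: {x!r}")
    if x <= 0:
        raise ValueError(f"safe_log() is undefined for x <= 0; received x = {x}")
    return math.log(x)


def log_or_none(x):
    # Handle the error where it is raised instead of letting it halt the whole script.
    try:
        return safe_log(x)
    except (TypeError, ValueError) as err:
        print(f"Skipping value: {err}")
        return None


print(log_or_none(10))
print(log_or_none(-1))
print(log_or_none("a"))
```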

## How to Do Mediation Scientifically

Mediation analysis has been around for a long time, though its popularity has varied across disciplines and over the years. While some fields have been attracted to the potential of mediation models to identify pathways, or mechanisms, through which an independent variable affects an outcome, others have been skeptical that mediated relationships can ever be analyzed scientifically. Two developments, one more scientific than the other, have renewed the popularity of mediation analysis.

## Plotly for R - Multi-Layer Plots

If you are new to plotly, consider first reading our introductory post, Introduction to Interactive Graphics in R with plotly. When analyzing data, it is often necessary to produce a complex plot that requires multiple graphical layers. In plotly, multi-layer plots can be specified as a pipeline that combines data manipulations (using dplyr verbs only) with visual mappings. This works because dplyr verbs can be applied to a plotly object to modify its underlying data.

## The Prisoner's Dilemma

Game theory is the study of interdependent decision making, or how individuals make decisions when their optimal choice depends on what others have chosen. Probably the best-known application of game theory is the Prisoner’s Dilemma. In this game, there is a tension between the incentives each player faces and the globally optimal outcome. In the parlance of game theory, the Nash equilibrium is not Pareto optimal.
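To see concretely why the Nash equilibrium is not Pareto optimal, here is a small Python check over the four outcomes of a two-player Prisoner’s Dilemma. The payoffs are assumed textbook values (temptation 5 > reward 3 > punishment 1 > sucker 0, where higher is better), not figures from the original post, and the helper names are our own.

```python
from itertools import product

C, D = "cooperate", "defect"
ACTIONS = (C, D)
# (row player payoff, column player payoff); assumed values with T=5 > R=3 > P=1 > S=0.
PAYOFF = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}


def is_nash(profile):
    """True if neither player can gain by unilaterally switching actions."""
    a1, a2 = profile
    u1, u2 = PAYOFF[profile]
    no_gain_1 = all(PAYOFF[(alt, a2)][0] <= u1 for alt in ACTIONS)
    no_gain_2 = all(PAYOFF[(a1, alt)][1] <= u2 for alt in ACTIONS)
    return no_gain_1 and no_gain_2


def is_pareto_optimal(profile):
    """True if no other outcome makes someone better off without making anyone worse off."""
    u = PAYOFF[profile]
    for other in PAYOFF.values():
        if all(o >= x for o, x in zip(other, u)) and any(o > x for o, x in zip(other, u)):
            return False
    return True


profiles = list(product(ACTIONS, ACTIONS))
nash = [p for p in profiles if is_nash(p)]
pareto = [p for p in profiles if is_pareto_optimal(p)]
print("Nash equilibria:", nash)
print("Pareto optimal:", pareto)
```

Mutual defection is the only outcome where neither player gains by switching alone, yet both players would be better off under mutual cooperation, so the equilibrium is Pareto dominated.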