This blog post will talk about Stan and how to create Stan models in R using the rstan and rstanarm packages. Although Stan provides documentation for using its programming language and a user’s guide with examples, it can be difficult to follow for a beginner. Our hope is that this post provides a gentle introduction to Stan that helps you get started.
Stan
Stan is a programming language for specifying statistical models.

What is a Statistical Analysis Plan?
A statistical analysis plan (SAP) is a document written before the start of a clinical or observational study that describes in detail how the data will be coded and analyzed. It serves three essential roles. First, it provides transparency about how the analysis will proceed by specifying the methodology in advance. Second, it gives the statistician involved in the study clear instructions on how to proceed.

Intro
This blog post will talk about the forcats package and how to use it to work with factor (categorical) variables. Although decent documentation exists, it can sometimes be a bit too terse for somebody first encountering the package’s functions. This blog post is meant to provide a fuller narrative description and examples of how to use the package. Amelia McNamara’s outstanding presentation from the 2019 RStudio conference is also worth the time to watch to understand the background, motivation, and usefulness of forcats.

Reshaping data from long to wide format, or from wide to long format, is a common task in data science. Until recently, the best functions for performing this task in R were gather() and spread() from the tidyr package. However, these functions had limitations, such as only being able to reshape one variable at a time, that required creative workarounds. The newest version of tidyr introduces the pivot_longer() and pivot_wider() functions, which perform the same tasks but also handle a wider variety of use cases.
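As a rough illustration of the two functions named above, here is a minimal sketch that reshapes a small table from wide to long and back again. The data frame and its column names (`id`, `visit_1`, `visit_2`, `score`) are made up for this example, and it assumes tidyr 1.0.0 or later is installed.

```r
library(tidyr)

# Hypothetical wide data: one row per subject, one column per visit
wide <- data.frame(
  id      = 1:3,
  visit_1 = c(5.1, 6.3, 4.8),
  visit_2 = c(5.4, 6.0, 5.2)
)

# Wide to long: stack the visit columns into a name column and a value column
long <- pivot_longer(
  wide,
  cols         = starts_with("visit_"),
  names_to     = "visit",
  names_prefix = "visit_",   # strip the prefix so `visit` holds "1" and "2"
  values_to    = "score"
)

# Long to wide: spread the visits back out into separate columns
wide_again <- pivot_wider(
  long,
  names_from   = visit,
  names_prefix = "visit_",   # restore the original column names
  values_from  = score
)
```

Here `long` has one row per subject and visit, and `wide_again` should hold the same values as the original `wide` table.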

Nuances of Bootstrapping
Most applied statisticians and data scientists understand that bootstrapping is a method that mimics repeated sampling by drawing some number of new samples (with replacement) from the original sample in order to perform inference. However, it can be difficult to understand output from the software that carries out the bootstrapping without a more nuanced understanding of how uncertainty is quantified from bootstrap samples.
To demonstrate the possible sources of confusion, start with the data described in Efron and Tibshirani’s (1993) text on bootstrapping (page 19).
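Before turning to that data, the basic resampling idea can be sketched in a few lines of base R. The sample below is simulated purely for illustration (it is not the Efron and Tibshirani data), and the number of resamples `B` is an arbitrary choice. The sketch reports two common summaries of the same bootstrap draws, a standard error and a percentile interval, which are two different ways of quantifying uncertainty.

```r
set.seed(123)

# Hypothetical original sample (simulated; not the data from the text)
x <- rnorm(30, mean = 10, sd = 2)

B <- 2000  # number of bootstrap resamples (arbitrary choice for this sketch)

# Each resample draws length(x) values from x with replacement,
# then the statistic of interest (here, the mean) is recomputed
boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

# Bootstrap standard error: the standard deviation of the resampled statistic
boot_se <- sd(boot_means)

# Percentile interval: the empirical 2.5% and 97.5% quantiles of the resampled statistic
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
```

Software output often reports one or both of these, and possibly other interval types, so it helps to know which summary a given number corresponds to.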

Diagnosing and fixing errors in your code can be time-consuming and frustrating. There are two ways you can make your life easier. The first is knowing the tools at your disposal in RStudio to debug errors. RStudio provides a variety of tools to help you diagnose the problem at its source and come up with a solution as quickly as possible. The second is knowing how to write functions that return clear yet detailed errors using condition handling.
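To give a flavor of the second point, here is a minimal base R sketch of a function that signals a descriptive error, along with a caller that catches it. The function name `checked_log` and its messages are hypothetical and serve only to illustrate stop(), warning(), and tryCatch().

```r
# Hypothetical helper: validates its input and signals informative conditions
checked_log <- function(x) {
  if (!is.numeric(x)) {
    # A clear error names the argument, the expectation, and what was received
    stop("`x` must be numeric, but has class '", class(x)[1], "'.", call. = FALSE)
  }
  if (any(x <= 0, na.rm = TRUE)) {
    warning("`x` contains non-positive values; their logs are NaN or -Inf.",
            call. = FALSE)
  }
  # Suppress base R's own NaN warning, since a clearer one was already raised
  suppressWarnings(log(x))
}

# The caller can intercept the error and decide what to do with its message
result <- tryCatch(
  checked_log("ten"),
  error = function(e) conditionMessage(e)
)
```

After this runs, `result` holds the error message text rather than halting the script, which is one simple way to handle a condition gracefully.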

Mediation analysis has been around a long time, though its popularity has varied between disciplines and over the years. While some fields have been attracted to the potential of mediation models to identify pathways, or mechanisms, through which an independent variable affects an outcome, others have been skeptical that the analysis of mediated relationships can ever be done scientifically.
Two developments, one more scientific than the other, have led to a renewed popularity of mediation analysis.

If you are new to plotly, consider first reading our introductory post:
Introduction to Interactive Graphics in R with plotly
Often when analyzing data, it is necessary to produce a complex plot that requires multiple graphical layers. In plotly, multi-layer plots can be specified as a pipeline of data manipulations (dplyr only) and visual mappings. This is possible because dplyr verbs can be used on a plotly object to modify the underlying data.
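As a small sketch of that idea, the pipeline below adds one layer for all rows, applies a dplyr verb to the plotly object, and then adds a second layer that sees only the filtered rows. It uses the built-in mtcars data. The choice of variables and trace names is arbitrary, and it assumes the plotly and dplyr packages are installed.

```r
library(plotly)
library(dplyr)

p <- mtcars %>%
  plot_ly(x = ~wt, y = ~mpg) %>%
  # Layer 1: every car in the data
  add_markers(name = "All cars") %>%
  # dplyr verb applied to the plotly object: later layers see only these rows
  filter(cyl == 8) %>%
  # Layer 2: only the 8-cylinder cars
  add_markers(name = "8 cylinders")

# Printing `p` in an interactive session renders the two-layer plot
```

Because the filter step sits between the two layers, the first trace is drawn from the full data and the second from the subset.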

Game Theory and Interdependent Outcomes
Game theory is the study of interdependent decision making, or how individuals make decisions when their optimal choice depends on what others have chosen. Probably the best known application of game theory is the Prisoner’s Dilemma. In this game, there is a tension between the incentives faced by each player and the globally optimal outcome. In the parlance of game theory, the Nash equilibrium is not Pareto optimal.
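The last claim can be checked directly on a small example. The sketch below uses a conventional but hypothetical set of Prisoner's Dilemma payoffs (5, 3, 1, 0, where higher is better), finds the outcomes where neither player can gain by deviating alone, and then tests whether each outcome is Pareto optimal. All object names are made up for this illustration.

```r
actions <- c("cooperate", "defect")

# Player 1's payoffs: rows are player 1's action, columns are player 2's action
p1 <- matrix(c(3, 0,
               5, 1),
             nrow = 2, byrow = TRUE, dimnames = list(actions, actions))

# The game is symmetric, so player 2's payoffs are the transpose
p2 <- t(p1)

# Nash equilibrium: each action is a best response to the other player's action
is_nash <- function(i, j) {
  p1[i, j] == max(p1[, j]) && p2[i, j] == max(p2[i, ])
}

# Pareto optimal: no other outcome makes one player better off
# without making the other worse off
is_pareto_optimal <- function(i, j) {
  dominates <- (p1 >= p1[i, j]) & (p2 >= p2[i, j]) &
    ((p1 > p1[i, j]) | (p2 > p2[i, j]))
  !any(dominates)
}

outcomes <- expand.grid(p1_action = actions, p2_action = actions,
                        stringsAsFactors = FALSE)
outcomes$nash   <- unname(mapply(is_nash, outcomes$p1_action, outcomes$p2_action))
outcomes$pareto <- unname(mapply(is_pareto_optimal,
                                 outcomes$p1_action, outcomes$p2_action))
```

With these payoffs, mutual defection is the only outcome flagged as a Nash equilibrium, and it is also the only outcome that is not Pareto optimal, since mutual cooperation leaves both players better off.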

R users adore the ggplot2 package for all things data visualization. Its consistent syntax, useful defaults, and flexibility make it a fantastic tool for creating high-quality figures. Although ggplot2 is great, there are other dataviz tools that deserve a place in a data scientist’s toolbox. Enter plotly.
plotly is a high-level interface to plotly.js, which is built on d3.js, and it provides an easy-to-use interface for generating slick, interactive D3 graphics.