## Developing R Packages with usethis and GitLab CI: Part I

The best way to share your R code with others is to create a package. Whether you want to share your functions with team members, clients, or all interested R users, bundling them into a package is the way to go. Luckily, there are great tools available that make this process relatively smooth and easy. This series of posts walks through the process of setting up an R package and sharing it on GitLab, a version-control platform.
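The scaffolding step that kicks off a package can be sketched with usethis; this is a minimal illustration, and the package name and path here are placeholders, not from the post itself:

```r
# Sketch: scaffold a new package skeleton with usethis.
# "demopkg" and the tempdir() path are illustrative placeholders.
library(usethis)

path <- file.path(tempdir(), "demopkg")
create_package(path, open = FALSE)  # creates DESCRIPTION, NAMESPACE, R/

# From inside the new project you would then typically run, e.g.:
# use_git()              # initialize version control
# use_mit_license()      # add a license file
# use_r("my_function")   # create R/my_function.R for your first function
```

The `open = FALSE` argument just keeps the call from trying to open a new RStudio session, which is handy in scripts.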

## A Tour of Timezones (& Troubles) in R

In any programming tool, dates, times, and timezones are hard. Deceptively hard. They’ve been shaped by politics and whimsy for hundreds of years: timezones can shift with minimal notice, countries have skipped or repeated certain days, some zones are offset by odd increments, some observe Daylight Saving Time, and then there are leap years and leap seconds; the list goes on. Luckily, we rarely need to worry about most of those details because other teams of very smart people have spent a lot of time providing nice abstractions for us that handle most of the weird edge cases.
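One of those edge cases can be seen with nothing but base R: on 2018-03-11, US clocks sprang forward, so 2:00–2:59 a.m. simply did not exist in `America/New_York`. The example below is a small illustration of how the built-in abstractions handle that gap:

```r
# Sketch: one instant, rendered in two timezones, using only base R.
# 2018-03-11 is the US DST spring-forward date: 2:00-2:59 a.m. local
# time did not exist in America/New_York.
t1 <- as.POSIXct("2018-03-11 06:59:00", tz = "UTC")
t2 <- t1 + 60  # one minute later -- the same instant everywhere

format(t1, tz = "America/New_York", usetz = TRUE)  # "2018-03-11 01:59:00 EST"
format(t2, tz = "America/New_York", usetz = TRUE)  # "2018-03-11 03:00:00 EDT"
```

One minute of UTC time jumps the local clock from 1:59 to 3:00, because the hour in between was skipped.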

## Be Aware of Bias in RF Variable Importance Metrics

Random forests are typically used as “black box” models for prediction, but they can return relative importance metrics associated with each feature in the model. These can be used to aid interpretability and give a sense of which features are powering the predictions. Importance metrics can also assist in feature selection in high-dimensional data. Careful attention should be paid to the data you are working with and to when it is appropriate to use and interpret the different variable importance metrics that random forests provide.
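As a concrete sketch, the randomForest package (one common implementation; the post may use a different one) exposes two of these metrics side by side — permutation importance and impurity-based importance:

```r
# Sketch: extracting the two variable importance metrics from a
# random forest, using the randomForest package and the built-in iris
# data purely for illustration.
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, importance = TRUE)

# type = 1: permutation importance (mean decrease in accuracy)
# type = 2: mean decrease in Gini impurity, the metric more prone to
#           bias toward continuous / high-cardinality features
importance(rf, type = 1)
importance(rf, type = 2)
```

Comparing the two rankings on your own data is a quick way to spot features whose impurity-based importance may be inflated.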

## Bias Adjustment for Rare Events Logistic Regression in R

Rare events are often of interest in statistics and machine learning. Mortality caused by a prescription drug may be uncommon but of great concern to patients, providers, and manufacturers. Predictive models in finance may be focused on forecasting when equities move substantially, something quite rare relative to the more quotidian shifts in prices. Logistic-type models (logit models in econometrics, neural nets with sigmoidal activation functions) will tend to underestimate the probability of these events occurring.
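The underestimation is easy to reproduce with an ordinary `glm()` fit; the simulated data below are purely illustrative and not from the post:

```r
# Sketch: an ordinary logistic regression on a rare-event outcome.
# With few events, maximum likelihood estimates are biased in small
# samples, and fitted probabilities tend to be too small.
set.seed(1)
n <- 1000
x <- rnorm(n)
p <- plogis(-4 + 1.5 * x)  # intercept of -4 makes the event rare
y <- rbinom(n, 1, p)

fit <- glm(y ~ x, family = binomial())
coef(fit)   # compare against the true values (-4, 1.5)
mean(y)     # the event rate: only a few percent of observations
```

Bias-corrected alternatives exist for exactly this situation — for example, Firth-type penalized likelihood (available in packages such as logistf) is one commonly cited remedy, though which adjustment the post itself uses is not shown in this excerpt.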

## Highlights from rstudio::conf 2018

The second annual rstudio::conf was held in San Diego at the end of January, bringing together a wide range of speakers, topics, and attendees. Covering all of it would require several people and a lot of space, but I’d like to highlight two broad topics that received a lot of coverage: new tools for Shiny and enhanced modeling capabilities for R. On the Shiny side, several speakers introduced a collection of new tools for enhancing the capabilities of Shiny developers: asynchronous processing, simplified functional testing, and load testing are all coming to the Shiny world.
Logistic regression produces results that are typically interpreted in one of two ways: predicted probabilities or odds ratios. Odds are the ratio of the probability that something happens to the probability that it doesn’t happen: $\Omega(X) = \frac{p(y=1 \mid X)}{1 - p(y=1 \mid X)}$. An odds ratio is the ratio of two odds, each calculated at a different value of $X$. There are strengths and weaknesses to either choice. Predicted probabilities are intuitive, but require assuming a value for every covariate.
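Both interpretations drop out of a single `glm()` fit; the example below is a sketch using base R’s `mtcars` data, chosen for illustration only:

```r
# Sketch: predicted probabilities, odds, and an odds ratio from a
# logistic regression (modeling transmission type from car weight).
fit <- glm(am ~ wt, data = mtcars, family = binomial())

# Predicted probability at a chosen covariate value (wt = 3, i.e.
# 3,000 lbs) -- note that a value for the covariate must be assumed.
p3 <- predict(fit, newdata = data.frame(wt = 3), type = "response")

# The corresponding odds: p / (1 - p)
odds3 <- p3 / (1 - p3)

# Odds ratio for a one-unit increase in wt: exponentiate the coefficient
or_wt <- exp(coef(fit)["wt"])
```

The odds ratio needs no assumed covariate values, which is its usual selling point; the trade-off is that odds are less intuitive to most audiences than probabilities.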