## Bias Adjustment for Rare Events Logistic Regression in R

Rare events are often of interest in statistics and machine learning. Mortality caused by a prescription drug may be uncommon but of great concern to patients, providers, and manufacturers. Predictive models in finance may be focused on forecasting when equities move substantially, something quite rare relative to the more quotidian shifts in prices. Logistic-type models (logit models in econometrics, neural nets with sigmoidal activation functions) will tend to underestimate the probability of these events occurring.

Logistic regression produces results that are typically interpreted in one of two ways:

- Predicted probabilities
- Odds ratios

The odds are the ratio of the probability that something happens to the probability that it doesn't happen:

$$\Omega(X) = \frac{p(y=1|X)}{1-p(y=1|X)}$$

An odds ratio is the ratio of two odds, each calculated at a different value of $X$, for example $\Omega(X_1) / \Omega(X_0)$.

There are strengths and weaknesses to either choice. Predicted probabilities are intuitive, but they require assuming a value for every covariate.
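The following minimal base-R sketch shows both interpretations on simulated data. The intercept of `-4`, the slopes `0.8` and `0.5`, and the covariate names `x1` and `x2` are hypothetical values chosen only so that `y = 1` is rare. The sketch fits an ordinary logistic regression with `glm()` and reads odds ratios off the exponentiated coefficients. It also shows that `predict()` can return a probability only after every covariate has been given a value.

```r
# Minimal sketch: odds, odds ratios and predicted probabilities
# from an ordinary logistic regression. All data are simulated;
# the coefficients below are hypothetical.
set.seed(42)

n  <- 5000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)

# Hypothetical true model with a low intercept, so y = 1 is rare
eta <- -4 + 0.8 * x1 + 0.5 * x2
y   <- rbinom(n, 1, plogis(eta))

fit <- glm(y ~ x1 + x2, family = binomial())

# Odds: Omega(X) = p / (1 - p)
odds <- function(p) p / (1 - p)

# Odds ratios: exponentiated slopes (intercept dropped). Each compares
# the odds at x + 1 with the odds at x, holding the other covariate fixed.
odds_ratios <- exp(coef(fit))[-1]

# Predicted probabilities need a value for every covariate
new_x <- data.frame(x1 = c(0, 1), x2 = c(1, 1))
p_hat <- predict(fit, newdata = new_x, type = "response")

# The ratio of the two predicted odds reproduces the odds ratio for x1
or_from_probs <- unname(odds(p_hat[2]) / odds(p_hat[1]))

print(odds_ratios)
print(p_hat)
print(or_from_probs)
```

Because this is a main-effects model, changing the `x2` values in `new_x` changes both predicted probabilities but leaves their odds ratio equal to `exp(coef(fit)["x1"])`. That is why an odds ratio can be reported without choosing a full covariate profile, while a predicted probability cannot.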