Rare events are often of interest in statistics and machine learning. Mortality caused by a prescription drug may be uncommon but of great concern to patients, providers, and manufacturers. Predictive models in finance may be focused on forecasting when equities move substantially, something quite rare relative to the more quotidian shifts in prices. Logistic-type models (logit models in econometrics, neural nets with sigmoidal activation functions) will tend to underestimate the probability of these events occurring.

Logistic regression produces result that are typically interpreted in one of two ways:
Predicted probabilities Odds ratios Odds are the ratio of the probability that something happens to the probabilty it doesn’t happen.
\[ \Omega(X) = \frac{p(y=1|X)}{1-p(y=1|X)} \] An odds ratio is the ratio of two odds, each calculated at a different score for \(X\).
There are strengths and weaknesses to either choice.
Predictored probabilities are intuitive, but require assuming a value for every covariate.

The purpose of this blog post is to review the derivation of the logit estimator and the interpretation of model estimates. Logit models are commonly used in statistics to test hypotheses related to binary outcomes, and the logistic classifier is commonly used as a pedagogic tool in machine learning courses as a jumping off point for developing more sophisticated predictive models. A secondary goal is to clarify some of the terminology related to logistic models, which - as should already be clear given the interchanging usage of “logit” and “logistic” - may be confusing.