# Understanding Bootstrap Confidence Interval Output from the R boot Package

Jeremy Albright


## Nuances of Bootstrapping

Most applied statisticians and data scientists understand that bootstrapping is a method that mimics repeated sampling by drawing some number of new samples (with replacement) from the original sample in order to perform inference. However, it can be difficult to understand output from the software that carries out the bootstrapping without a more nuanced understanding of how uncertainty is quantified from bootstrap samples.

To demonstrate the possible sources of confusion, start with the data described in Efron and Tibshirani’s (1993) text on bootstrapping (page 19). We have 15 paired observations of student LSAT scores and GPAs. We want to estimate the correlation between LSAT and GPA scores. The data are the following:

| student | lsat | gpa  |
|--------:|-----:|-----:|
| 1       | 576  | 3.39 |
| 2       | 635  | 3.30 |
| 3       | 558  | 2.81 |
| 4       | 578  | 3.03 |
| 5       | 666  | 3.44 |
| 6       | 580  | 3.07 |
| 7       | 555  | 3.00 |
| 8       | 661  | 3.43 |
| 9       | 651  | 3.36 |
| 10      | 605  | 3.13 |
| 11      | 653  | 3.12 |
| 12      | 575  | 2.74 |
| 13      | 545  | 2.76 |
| 14      | 572  | 2.88 |
| 15      | 594  | 2.96 |
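To follow along, the data can be entered directly as a data frame. The name `tbl` matches the object used in the `boot` call later in this post:

```r
# LSAT and GPA scores for 15 students (Efron and Tibshirani, 1993, p. 19)
tbl <- data.frame(
  lsat = c(576, 635, 558, 578, 666, 580, 555, 661, 651,
           605, 653, 575, 545, 572, 594),
  gpa  = c(3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36,
           3.13, 3.12, 2.74, 2.76, 2.88, 2.96)
)

cor(tbl$lsat, tbl$gpa)  # 0.776 to three decimal places
```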

The correlation turns out to be 0.776. For reasons we’ll explore, we want to use the nonparametric bootstrap to get a confidence interval around our estimate of $$r$$. We do so using the boot package in R. This requires the following steps:

1. Define a function that returns the statistic we want.
2. Use the boot function to draw R bootstrap replicates of the statistic, where R is the number of resamples.
3. Use the boot.ci function to get the confidence intervals.

For step 1, the following function is created:

```r
get_r <- function(data, indices, x, y) {
  # boot supplies `indices`, the resampled row numbers for each replicate
  d <- data[indices, ]
  # correlation between the two named columns, rounded to three decimals
  round(as.numeric(cor(d[x], d[y])), 3)
}
```

Steps 2 and 3 are performed as follows:

```r
library(boot)

set.seed(12345)

boot_out <- boot(
  tbl,
  x = "lsat",
  y = "gpa",
  R = 500,
  statistic = get_r
)

boot.ci(boot_out)
```
```
## Warning in boot.ci(boot_out): bootstrap variances needed for studentized
## intervals
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 500 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot_out)
##
## Intervals :
## Level      Normal              Basic
## 95%   ( 0.5247,  1.0368 )   ( 0.5900,  1.0911 )
##
## Level     Percentile            BCa
## 95%   ( 0.4609,  0.9620 )   ( 0.3948,  0.9443 )
## Calculations and Intervals on Original Scale
## Some BCa intervals may be unstable
```

Looking at the boot.ci output, the following questions come up:

1. Why are there multiple CIs? How are they calculated?
2. What are the bootstrap variances needed for studentized intervals?
3. What does it mean that the calculations and intervals are on the original scale?
4. Why are some BCa intervals unstable?

To understand this output, let’s review statistical inference, confidence intervals, and the bootstrap.

## Statistical Inference

The usual test statistic for determining if $$r \neq 0$$ is:

$t = \frac{r}{SE_r}$

where

$SE_r = \sqrt{\frac{1-r^2}{n-2}}$

In our case:

$SE_r = \sqrt{\frac{1-r^2}{n-2}} = \sqrt{\frac{1-0.776^2}{15-2}} = 0.175$

Dividing $$r$$ by $$SE_r$$ yields our $$t$$ statistic:

$t = \frac{r}{SE_r} = \frac{0.776}{0.175} = 4.434$

We compare this to a $$t$$ distribution with $$n-2 = 13$$ degrees of freedom and easily find it to be significant.

In words: If the null hypothesis were true, and we repeatedly draw samples of size $$n$$, and we calculate $$r$$ each time, then the probability that we would observe an estimate of $$|r| = 0.776$$ or larger is less than 5%.
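These calculations are straightforward to reproduce in R. (Note that the value 4.434 above comes from rounding $$SE_r$$ to 0.175 before dividing; without intermediate rounding, $$t$$ is closer to 4.436.)

```r
r <- 0.776
n <- 15

se_r   <- sqrt((1 - r^2) / (n - 2))  # approximately 0.175
t_stat <- r / se_r                   # approximately 4.44

# Two-sided p-value against a t distribution with n - 2 = 13 df
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
```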

An important caveat: the above formula for the standard error is only correct when $$r = 0$$. The closer $$r$$ gets to $$\pm 1$$, the less accurate it becomes.

## Confidence Intervals

We can see why the standard error formula above becomes less correct the further we get from zero by considering the 95% confidence interval for our estimate. The usual formula you see for a confidence interval is the estimate plus or minus the 97.5th percentile of the normal or $$t$$ distribution times the standard error. In this case, the $$t$$-based formula would be:

$\text{95% CI} = r \pm t_{df = 13} SE_r$

If we were to sample 15 students repeatedly from the population and calculate this confidence interval each time, the interval should include the true population value 95% of the time. So what happens if we use the standard formula for the confidence interval?

\begin{align} \text{95% CI} &= r \pm t_{df = 13}SE_r \\ &= 0.776 \pm 2.16\times 0.175 \\ &= [0.398, 1.154] \end{align}

Recall that correlations are bounded in the range $$[-1, +1]$$, but our 95% confidence interval contains values greater than one!
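Reproducing the naive interval in R makes the problem concrete:

```r
r <- 0.776
n <- 15
se_r <- sqrt((1 - r^2) / (n - 2))

# Naive t-based 95% confidence interval
t_crit <- qt(0.975, df = n - 2)      # approximately 2.16
ci <- r + c(-1, 1) * t_crit * se_r
ci  # upper bound exceeds 1, an impossible value for a correlation
```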

Alternatives:

• Use Fisher’s $$z$$-transformation. This is what your software will usually do, but it doesn’t work for most other statistics.
• Use the bootstrap. While not necessary for the correlation coefficient, its advantage is that it can be used for almost any statistic.
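For reference, the Fisher $$z$$ approach transforms $$r$$ with $$z = \text{atanh}(r)$$, builds a normal-theory interval on the $$z$$ scale with standard error $$1/\sqrt{n-3}$$, and back-transforms the endpoints with $$\tanh$$. This is the method behind the confidence interval reported by R's `cor.test`:

```r
r <- 0.776
n <- 15

z    <- atanh(r)                 # Fisher z-transform of r
se_z <- 1 / sqrt(n - 3)          # standard error on the z scale
ci_z <- z + c(-1, 1) * qnorm(0.975) * se_z
ci_r <- tanh(ci_z)               # back-transform; endpoints stay inside [-1, 1]
```

Unlike the naive interval, both endpoints are guaranteed to fall in $$[-1, 1]$$.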

The next sections review the nonparametric and parametric bootstrap.

## Nonparametric Bootstrap

We do not know the true population distribution of LSAT and GPA scores. What we have instead is our sample. Just like we can use our sample mean as an estimate of the population mean, we can use our sample distribution as an estimate of the population distribution.

In the absence of supplementary information about the population (e.g. that it follows a specific distribution like bivariate normal), the empirical distribution from our sample contains as much information about the population distribution as we can get. If statistical inference is typically defined by repeated sampling from a population, and our sample provides a good estimate of the population distribution, we can conduct inferential tasks by repeatedly sampling from our sample.

(Nonparametric) bootstrapping thus works as follows for a sample of size $$n$$:

1. Draw a random sample of size $$n$$ with replacement from our sample, which is the first bootstrap sample.
2. Estimate the statistic of interest using the bootstrap sample.
3. Draw a new random sample of size $$n$$ with replacement, which is the second bootstrap sample.
4. Estimate the statistic of interest using the new bootstrap sample.
5. Repeat $$k$$ times.
6. Use the distribution of estimates across the $$k$$ bootstrap samples as the sampling distribution.
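The steps above can be sketched in a few lines of base R, without the boot package. This is a minimal illustration of the algorithm, not a substitute for boot's more careful machinery:

```r
# Data from earlier in the post
tbl <- data.frame(
  lsat = c(576, 635, 558, 578, 666, 580, 555, 661, 651,
           605, 653, 575, 545, 572, 594),
  gpa  = c(3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36,
           3.13, 3.12, 2.74, 2.76, 2.88, 2.96)
)

set.seed(12345)
k <- 500
n <- nrow(tbl)

boot_r <- replicate(k, {
  idx <- sample(n, size = n, replace = TRUE)  # resample rows with replacement
  cor(tbl$lsat[idx], tbl$gpa[idx])            # recompute r on the resample
})

sd(boot_r)                           # bootstrap estimate of the standard error
quantile(boot_r, c(0.025, 0.975))    # simple percentile interval
```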

Note that the sampling is done with replacement. As an aside, most results from traditional statistics are based on the assumption of random sampling with replacement. Usually, the population we sample from is large enough that we do not bother noting the “with replacement” part. If the sample is large relative to the population, and sampling without replacement is used, we would typically be advised to use a finite population correction. This is just to say that the “with replacement” requirement is a standard part of the definition of random sampling.

Let’s take our data as an example. We will draw 500 bootstrap samples, each of size $$n = 15$$ chosen with replacement from our original data. The distribution across repeated samples is:

## Parametric Bootstrap

The prior section noted that, in the absence of supplementary information about the population, the empirical distribution from our sample contains as much information about the population distribution as we can get.

An example of supplementary information that may improve our estimates would be that we know the LSAT and GPA scores are distributed bivariate normal. If we are willing to make this assumption, we can use our sample to estimate the distribution parameters. Based on our sample, we find:

$\begin{pmatrix} \text{LSAT} \\ \text{GPA} \end{pmatrix}\sim N\left(\begin{pmatrix} 600.27 \\ 3.09 \end{pmatrix},\begin{pmatrix} 1746.78 & 7.90 \\ 7.90 & 0.06 \end{pmatrix}\right).$

The distribution looks like the following:
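A parametric bootstrap replicate then consists of drawing $$n = 15$$ new observations from this fitted bivariate normal and recomputing the correlation. A sketch using `MASS::mvrnorm`, with the means and covariance matrix taken from the sample estimates above:

```r
library(MASS)

set.seed(12345)

mu    <- c(600.27, 3.09)                                  # sample means
Sigma <- matrix(c(1746.78, 7.90,
                  7.90,    0.06), nrow = 2)               # sample covariance matrix

boot_r <- replicate(500, {
  sim <- mvrnorm(15, mu = mu, Sigma = Sigma)  # one parametric bootstrap sample
  cor(sim[, 1], sim[, 2])                     # correlation in the simulated data
})

quantile(boot_r, c(0.025, 0.975))
```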