
Bootstrap Confidence Intervals

(This article was first published on Statistical Research » R, and kindly contributed to R-bloggers)

Here is an example of nonparametric bootstrapping. It’s a powerful technique that is similar to the jackknife. Where the jackknife leaves out one observation at a time, the bootstrap re-samples the observed data with replacement. It’s generally less efficient than a parametric approach when the parametric model is correct, but it gets the job done without those distributional assumptions. It can be used in a variety of situations ranging from variance estimation to model selection. John Tukey, as the story goes, suggested the name “the shotgun” because you can blow the head off any statistical problem.
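To make the comparison with the jackknife concrete, here is a minimal sketch that estimates the standard error of a sample mean both ways. The sample `x`, the seed, and the number of resamples `B` are assumptions chosen for illustration and are not part of the original example.

```r
# Sketch: jackknife vs. bootstrap standard error of the sample mean.
set.seed(42)                       # arbitrary seed, for reproducibility only
x <- rnorm(200, mean = 0, sd = 1)  # hypothetical sample
n <- length(x)

# Jackknife: n deterministic leave-one-out estimates
jack.est <- sapply(seq_len(n), function(j) mean(x[-j]))
jack.se  <- sqrt((n - 1) / n * sum((jack.est - mean(jack.est))^2))

# Bootstrap: B random resamples of size n, drawn with replacement
B <- 2000
boot.est <- replicate(B, mean(sample(x, n, replace = TRUE)))
boot.se  <- sd(boot.est)

# Compare both with the textbook standard error of the mean
c(jackknife = jack.se, bootstrap = boot.se, classical = sd(x) / sqrt(n))
```

For the mean, the jackknife standard error matches sd(x)/sqrt(n) exactly, while the bootstrap value varies slightly from run to run because the resamples are random.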

The code below is for illustrative purposes and compares a couple of different approaches to bootstrapping. The mean shows a very nice distribution, but something like a median is not so symmetrical. The code can easily be changed to allow for any single statistic (e.g. any percentile). With a little alteration, bivariate statistics (e.g. correlation) can be bootstrapped as well. Obtaining the confidence interval directly is quite simple: with nboot=10000 (or any other number that divides easily), the interval is found by merely taking the alpha/2 and (1-alpha/2) percentiles of the sorted bootstrap estimates; in the case below, with alpha=.01, these are positions 50 and 9950.

library(boot)
nboot <- 10000 # Number of simulations
alpha <- .01 # alpha level
n <- 1000 # sample size
# Statistic functions in the form boot() expects: data x, resampled indices i
bootThetaQuantile <- function(x,i) {
  quantile(x[i], probs=.5)
}

bootThetaMean <- function(x,i) {
  mean(x[i])
}

raw <- rnorm(n,0, 1) # raw data
( theta.boot.median <- boot(raw, bootThetaQuantile, R=nboot) )
boot.ci(theta.boot.median, conf=(1-alpha))

( theta.boot.mean <- boot(raw, bootThetaMean, R=nboot) )
boot.ci(theta.boot.mean, conf=(1-alpha))

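# Manual bootstrap: each column of my.replicate is one resample of raw, drawn with replacement
my.replicate <- replicate(nboot, raw[sample(1:length(raw), n, replace=TRUE)])

# Apply each statistic to every resample (i is left missing, so x[i] is the whole column)
theta.median <- apply(my.replicate, 2, bootThetaQuantile)
theta.mean <- apply(my.replicate, 2, bootThetaMean)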

# Distributions of the bootstrap estimates
hist(theta.median, xlim=c(-.2,.2), nclass=50, col=3, main="Histogram of Bootstrap Medians")
hist(theta.mean, xlim=c(-.2,.2), nclass=50, col=3, main="Histogram of Bootstrap Means")

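# Percentile interval: order statistics at positions nboot*alpha/2 and nboot*(1-alpha/2)
# round() guards against floating-point products that land just below a whole number
sort(theta.median)[round(nboot*alpha/2)]
sort(theta.median)[round(nboot*(1-alpha/2))]

sort(theta.mean)[round(nboot*alpha/2)]
sort(theta.mean)[round(nboot*(1-alpha/2))]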

### Randomly generated data: nboot fresh samples drawn directly from N(0,1), for comparison with the bootstrap
my.replicate <- replicate(nboot, rnorm(n,0,1))

theta.rand.median <- apply(my.replicate, 2, bootThetaQuantile)
theta.rand.mean <- apply(my.replicate, 2, bootThetaMean)

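# Normal-theory interval for the mean, centred on the average simulated mean, with the standard error taken from raw
ci.u <- mean(theta.rand.mean)+qnorm(1-alpha/2)*sd(raw)/sqrt(n)
ci.l <- mean(theta.rand.mean)-qnorm(1-alpha/2)*sd(raw)/sqrt(n)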

hist(theta.rand.median, xlim=c(-.2,.2), nclass=100, col=3, main="Histogram of Randomly Generated Data for Medians")

hist(theta.rand.mean, xlim=c(-.2,.2), nclass=50, col=3, main="Histogram of Randomly Generated Data for Means")
abline(v=c(ci.u,ci.l))
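
The earlier remark that bivariate statistics such as correlation can be bootstrapped with a little alteration might look like the sketch below. The simulated pair `x`, `y`, the 0.6 coefficient, the seed, the function name `bootThetaCor`, and the choice of R=2000 are all assumptions made for illustration; only `boot()` and `boot.ci()` come from the code above.

```r
library(boot)

set.seed(123)                      # arbitrary seed, for reproducibility only
n <- 500
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)  # hypothetical pair with true correlation 0.6
xy <- data.frame(x = x, y = y)

# For a data frame, boot() resamples rows, so i indexes (x, y) pairs together
bootThetaCor <- function(d, i) {
  cor(d$x[i], d$y[i])
}

theta.boot.cor <- boot(xy, bootThetaCor, R = 2000)

# Percentile interval at the 99% level; columns 4 and 5 hold the endpoints
cor.ci <- boot.ci(theta.boot.cor, conf = 0.99, type = "perc")
cor.ci$percent[4:5]
```

The key change from the univariate case is that whole rows are resampled, which keeps each x value attached to its y value.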

[Figure: Histogram for Bootstrap Median]

[Figure: bootstrap_mean]

[Figure: bootstrap_mean_CI]
