Channel: R-bloggers » Wesley

True Significance of a T Statistic


(This article was first published on Statistical Research » R, and kindly contributed to R-bloggers)

This example is a statistical exercise that estimates the true significance level of a two-sample t test and plots the density curve of the t statistic, using simulated random normal data. The code can be changed to generate data with a different mean and standard deviation, or from a different distribution altogether.

This extends the idea of estimating pi by simulation: here, random normal data are generated to determine the actual significance level. First, there is a simple function to calculate a pooled t statistic. That function is then called N times on freshly simulated samples, and the proportion of rejections is the estimated true significance level. The second part of the R code produces a density graph of the t statistic (scaled so that its peak equals 1). By changing the mean and standard deviation one can see what happens to the true significance level and the density graph. Keep that in mind the next time someone assumes a standard normal distribution for hypothesis testing and is set on alpha = 0.05.
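For reference, the quantities computed by the code below can be written out as follows. The symbols s_x^2, s_y^2 (sample variances), s_p (pooled standard deviation) and R (number of rejections, n.reject in the code) are notation chosen here for illustration, not the author's.

```latex
s_p = \sqrt{\frac{(m-1)\,s_x^2 + (n-1)\,s_y^2}{m+n-2}},
\qquad
t = \frac{\bar{x}-\bar{y}}{s_p\sqrt{\dfrac{1}{m}+\dfrac{1}{n}}}

\text{Reject when } |t| > t_{1-\alpha/2,\; m+n-2},
\qquad
\hat{\alpha} = \frac{R}{N}
```

When both samples really are normal with equal variances, t follows a t distribution with m+n-2 degrees of freedom, so the estimated level should land close to alpha.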


###
#Function to calculate the pooled two-sample t statistic
###
tstatfunc=function(x,y){
  m=length(x)
  n=length(y)
  #Pooled standard deviation, assuming equal variances in both groups
  spooled=sqrt(((m-1)*var(x)+(n-1)*var(y))/(m+n-2))
  tstat=(mean(x)-mean(y))/(spooled*sqrt(1/m+1/n))
  return(tstat)
}
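
As a sanity check on the formula, the hand-computed statistic can be compared with R's built-in t.test() using var.equal = TRUE, which also pools the variances. This is a minimal sketch: the name pooled.t, the seed, and the sample sizes are assumptions made here so the block runs on its own.

```r
# Cross-check a hand-rolled pooled t statistic against stats::t.test().
# pooled.t is a stand-alone copy of the formula, named differently so it
# does not overwrite tstatfunc; the seed is arbitrary.
set.seed(1)
x <- rnorm(15, mean = 0, sd = 1)
y <- rnorm(15, mean = 0, sd = 1)

pooled.t <- function(x, y) {
  m <- length(x)
  n <- length(y)
  sp <- sqrt(((m - 1) * var(x) + (n - 1) * var(y)) / (m + n - 2))
  (mean(x) - mean(y)) / (sp * sqrt(1 / m + 1 / n))
}

by.hand  <- pooled.t(x, y)
# t.test() returns a named statistic; unname() drops the "t" label
built.in <- unname(t.test(x, y, var.equal = TRUE)$statistic)

# Compares the two values up to numerical tolerance
all.equal(by.hand, built.in)
```

If the two values agree, the simulation below is testing the same statistic that t.test() would report for an equal-variance test.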

###
#Set some constants
###
alpha=.05
m=15
n=15
N=20000
n.reject=0

###
#Iterate N times
###
t.crit=qt(1-alpha/2,n+m-2) #two-sided critical value, computed once
for (i in 1:N){
  x=rnorm(m,mean=0,sd=1)
  y=rnorm(n,mean=0,sd=1)
  t.stat=tstatfunc(x,y)
  if (abs(t.stat)>t.crit){
    n.reject=n.reject+1
  }
}
true.sig.level=n.reject/N
true.sig.level

###
#Function to simulate t-statistic vector for graphing
###
tsim=function(){
  tstatfunc(rnorm(m,mean=0,sd=1), rnorm(n,mean=0,sd=1))
}

###
#Set up the values to graph
###
tstat.vec=replicate(10000, tsim())
dens = density(tstat.vec) #estimate the density once and reuse it
dens.y = dens$y/max(dens$y) #rescale so the peak equals 1
dens.x = dens$x

###
#Graph the density for each
###
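plot(
  NULL, NULL, xlim=c(-5,5),ylim=c(0,1), main="Simulated and True Density",
  xlab="t statistic", ylab="Density (scaled to a peak of 1)"
)
#Simulated density: thick green line
lines(dens.x,dens.y, lty=1, col=3, lwd=3)

#Theoretical t density with n+m-2 degrees of freedom, scaled the same way: thin black line
curve(
  dt(x,df=n+m-2)/max(dt(x,df=n+m-2)),add=TRUE
)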
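
To try the "change the standard deviation" experiment without editing the script by hand, the simulation can be wrapped in a function. This is only a sketch: the function name sim.sig.level, its arguments, the seed, and the second scenario (unequal sample sizes with the larger standard deviation in the smaller group) are all assumptions introduced here, not part of the original post. Both population means stay at 0 so the null hypothesis is true and the rejection rate is a significance level rather than power.

```r
# Hypothetical helper: estimate the true significance level of the pooled
# two-sample t test when both population means are 0 (null is true).
sim.sig.level <- function(N = 5000, m = 15, n = 15,
                          sd.x = 1, sd.y = 1, alpha = 0.05) {
  # Two-sided critical value from the t distribution with m+n-2 df
  crit <- qt(1 - alpha / 2, df = m + n - 2)
  t.vec <- replicate(N, {
    x <- rnorm(m, mean = 0, sd = sd.x)
    y <- rnorm(n, mean = 0, sd = sd.y)
    sp <- sqrt(((m - 1) * var(x) + (n - 1) * var(y)) / (m + n - 2))
    (mean(x) - mean(y)) / (sp * sqrt(1 / m + 1 / n))
  })
  # Proportion of simulated statistics that fall in the rejection region
  mean(abs(t.vec) > crit)
}

set.seed(42)  # arbitrary seed, only for reproducibility
level.equal   <- sim.sig.level()                          # assumptions hold
level.unequal <- sim.sig.level(m = 10, n = 40, sd.x = 3)  # assumptions violated
level.equal
level.unequal
```

With equal variances the estimate should sit near alpha. In the second scenario the pooled variance underestimates the variability of the mean difference, so the rejection rate should come out well above the nominal 0.05.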

