Inference in the balanced one-way random effect ANOVA
2015-03-12
The one-way random effect ANOVA model is the simplest linear Gaussian model with random effects: one random factor, and no other factor or covariate. However, the methods of inference for this model (point estimates, confidence intervals for various parameters such as the variance components and the coefficient of variation, prediction intervals, tolerance intervals) are not so well known. This is what we provide in this article, in the simplest situation: the case of balanced data (at the end we give a possible way to handle the unbalanced situation).
Recall that, denoting by \(y_{ij}\) the \(j\)-th response in group \(i\), this model assumes that the responses \(y_{ij}\) are generated in two steps. First, the group means \(\mu_i\) are independently generated according to a Gaussian distribution \({\cal N}(\mu, \sigma^2_b)\), where \(\mu\) is the overall mean and \(\sigma^2_b\) is the so-called between variance. Second, the responses \(y_{ij}\), \(j =1,\ldots,J\), for each group \(i\), are independently distributed according to a Gaussian distribution \({\cal N}(\mu_i, \sigma^2_w)\) with within variance \(\sigma^2_w\) and mean \(\mu_i\). In short, the model can be written: \[\begin{cases} \mu_i \sim_{\text{iid}} {\cal N}(\mu, \sigma^2_b) & i=1,\ldots,I \\ (y_{ij} \mid \mu_i) \sim_{\text{iid}} {\cal N}(\mu_i, \sigma^2_w) & j=1,\ldots,J \end{cases}.\]
If you like the moment of inertia representation of the variance on this picture (I do), see my article about it.
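Equivalently, marginalizing out the group means \(\mu_i\), all responses share the same Gaussian distribution, responses within a group are positively correlated, and responses from different groups are independent: \[ y_{ij} \sim {\cal N}(\mu, \sigma^2_b + \sigma^2_w), \qquad \operatorname{Cov}(y_{ij}, y_{ij'}) = \sigma^2_b \quad \text{for } j \neq j'. \] The sum \(\sigma^2_b+\sigma^2_w\) is the total variance appearing throughout the article.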
Throughout the article, we will use the following function to get the within sum-of-squares \(SS_w\) and the between sum-of-squares \(SS_b\):
```r
SOS <- function(y, group){
  require(dplyr)
  return(
    data.frame(y=y, group=group) %>%
      mutate(mean=mean(y)) %>%        # overall mean
      group_by(group) %>%
      mutate(groupmean=mean(y)) %>%   # group means
      ungroup %>%
      do(summarise(.,
                   SSw = crossprod(y - groupmean),          # within sum of squares
                   SSb = crossprod(groupmean - mean))) %>%  # between sum of squares
      as.list
  )
}
```
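As a quick sanity check of `SOS` on throwaway data (the toy values below are mine, not from the article), the two sums of squares add up to the total sum of squares:

```r
# toy data: two groups of three observations
y <- c(1, 2, 3, 7, 8, 9)
g <- rep(c("A", "B"), each = 3)
ss <- SOS(y, g)
c(SSw = ss$SSw, SSb = ss$SSb, total = sum((y - mean(y))^2))
##   SSw   SSb total
##     4    54    58
```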
Bayesian posterior distributions
All Bayesian methods we provide are based on the posterior distributions generated by this code. I'm using a `data.table` to store the simulations.

```r
ranova <- function(y, group, nsims=50000){
  require(data.table)
  group <- factor(group)
  I <- length(levels(group))
  if(!all(I*as.numeric(table(group)) == length(y))){
    stop("balanced only!")
  }
  J <- length(y)/I
  SS <- SOS(y, group)  # sums of squares SSb, SSw
  SSb <- SS$SSb; SSw <- SS$SSw
  omean <- mean(y)  # overall mean
  k <- tot <- 0
  z <- rnorm(nsims)
  ETAb <- ETAw <- rep(NA, nsims)
  # sample (eta_b, eta_w) from the two inverse-Gamma distributions,
  # keeping only the draws with eta_b > eta_w
  while(k < nsims){
    tot <- tot + 1
    etab <- SSb/rchisq(1, I-1)
    etaw <- SSw/rchisq(1, I*(J-1))
    if(etab > etaw){
      k <- k+1
      ETAb[k] <- etab
      ETAw[k] <- etaw
    }
  }
  out <- data.table(mu = omean + sqrt(ETAb/I/J)*z,
                    sigma2w = ETAw,
                    sigma2b = (ETAb-ETAw)/J)
  reject <- 100*(tot-nsims)/tot
  cat("Rejection percentage: ", round(reject,1), "%")
  setattr(out, "reject", reject)
  return(out)
}
```
These posterior distributions can be found in Krishnamoorthy & Mathew's book. I believe they are derived in Box & Tiao's book.
As we can see in the body of the function, the algorithm first generates simulations of a couple \((\eta_w, \eta_b)\) whose joint distribution is given by two independent inverse-Gamma distributions, \({\cal IG}\left(\frac{I(J-1)}{2},\frac{SS_w}{2}\right)\) for \(\eta_w\) and \({\cal IG}\left(\frac{I-1}{2},\frac{SS_b}{2}\right)\) for \(\eta_b\), conditioned on \(\eta_b > \eta_w\); it then sets \(\sigma^2_w=\eta_w\) and \(\sigma^2_b=\frac{\eta_b-\eta_w}{J}\). The `ranova` function also prints the percentage of rejected simulations (those for which `etab < etaw`) and returns it as an attribute of the `data.table` containing the simulations.

Note that \(\eta_b = J\tau^2\) where \(\tau^2=\sigma_b^2+\frac{\sigma_w^2}{J}\) is the variance of a group mean \(\bar y_{i\bullet}\). Given the sums of squares \(SS_w\) and \(SS_b\), the probability of rejection is, with abusive notation, \[\begin{align} R(SS_w,SS_b) & = \Pr\left(\frac{\chi^2_{I(J-1)}}{\chi^2_{I-1}} < \frac{SS_w}{SS_b}\right) \\ & = \Pr\left({\cal Beta}'\left(\frac{I(J-1)}{2}, \frac{I-1}{2}\right) < \frac{SS_w}{ SS_b} \right) \\ & = \Pr\left({\cal Beta}\left(\frac{I(J-1)}{2}, \frac{I-1}{2}\right) < \frac{SS_w}{SS_w + SS_b} \right). \end{align}\]
Moreover, as is well known (or as shown here), the sums of squares \(SS_w\) and \(SS_b\) are independent, with \(SS_w \sim \sigma^2_w\chi^2_{I(J-1)}\) and \(SS_b \sim J\tau^2\chi^2_{I-1}\).
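These two distributional facts are easy to check by simulation; here is a minimal sketch (the parameter values are arbitrary choices of mine):

```r
# check that SSb/(J*tau^2) behaves like a chisq(I-1) variable
set.seed(42)
I <- 3; J <- 6; mu <- 10; sigmab <- 2; sigmaw <- 1
tau2 <- sigmab^2 + sigmaw^2/J
ssb <- replicate(1e4, {
  y <- rnorm(I*J, mean = rep(rnorm(I, mu, sigmab), each = J), sd = sigmaw)
  g <- rep(seq_len(I), each = J)
  J * sum((tapply(y, g, mean) - mean(y))^2)  # between sum of squares
})
mean(ssb/(J*tau2))  # should be close to E[chisq(I-1)] = I-1 = 2
```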
Therefore, over repeated sampling, the rejection probability is \[ R = \Pr\left(\frac{{\cal Beta}'_1\left(\frac{I(J-1)}{2}, \frac{I-1}{2}\right)}{{\cal Beta}'_2\left(\frac{I(J-1)}{2}, \frac{I-1}{2}\right)} < \frac{\sigma^2_w}{J\tau^2} \right), \] the ratio involving two independent Beta prime variables. Let \(\rho:=\dfrac{\sigma^2_w}{\sigma^2_w+\sigma^2_b}\) be the ratio of the within variance to the total variance, and note that \(\dfrac{\sigma^2_w}{J\tau^2} = \dfrac{\rho}{\rho+J(1-\rho)}\), an increasing function of \(\rho\). There is no elementary cumulative distribution function for the ratio of two independent Beta prime distributions, but we can easily plot \(R\) as a function of \(\rho\) with the help of simulations. We use Javascript in order to include interactivity for the choice of \(I\) and \(J\), and the Javascript flot library to render the plot.
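If you cannot run the interactive plot, \(R\) is also easy to estimate in R itself; a minimal sketch (the function name `Rejection` is mine):

```r
# Monte Carlo estimate of the rejection probability R for given I, J, rho
Rejection <- function(I, J, rho, nsims = 1e5){
  # two independent Beta' draws, obtained as X/(1-X) with X ~ Beta
  B1 <- rbeta(nsims, I*(J-1)/2, (I-1)/2)
  B2 <- rbeta(nsims, I*(J-1)/2, (I-1)/2)
  mean((B1/(1-B1))/(B2/(1-B2)) < rho/(rho + J*(1-rho)))
}
Rejection(I = 3, J = 6, rho = 0.5)
```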
As an exercise, you can check why the probability of rejection is always \(50\%\) for \(\rho=1\).
Simulated data
Throughout the article, we will illustrate the methods on two datasets, `dat1` and `dat2`. I load the `magrittr` package to use the pipe operator `%>%`.

```r
library(magrittr)
set.seed(314)
I <- 3
J <- 6
mu <- 10
sigmab <- 2
sigmaw <- 1
dat1 <- data.frame(sapply(rnorm(I, mu, sigmab), function(mu) rnorm(J, mu, sigmaw))) %>%
  setNames(LETTERS[1:I]) %>% stack %>% setNames(c("y", "group"))
sigmab <- 1
dat2 <- data.frame(sapply(rnorm(I, mu, sigmab), function(mu) rnorm(J, mu, sigmaw))) %>%
  setNames(LETTERS[1:I]) %>% stack %>% setNames(c("y", "group"))
# stacked datasets
dat <- rbind(cbind(data="dat1", dat1), cbind(data="dat2", dat2))
```
The between-variance \(\sigma_b^2\) is the only difference between the parameters used to generate our two datasets (\(4\) vs \(1\)). We can already store the posterior simulations of the Bayesian method:
```r
sims1 <- with(dat1, ranova(y, group, nsims=1e6))
## Rejection percentage:  0 %
sims2 <- with(dat2, ranova(y, group, nsims=1e6))
## Rejection percentage:  19.2 %
# stacked simulations in a data.table
sims <- rbind(sims1[, "data":="data1"], sims2[, "data":="data2"])
setkey(sims, data)
```
Note that the theoretical rejection percentages can be calculated with our formula for \(R(SS_w,SS_b)\) above:
```r
by(dat, dat$data, FUN=function(data){
  with(data,
       with(SOS(y, group),
            round(100*pbeta(SSw/(SSw+SSb), I*(J-1)/2, (I-1)/2), 1)))
})
## dat$data: dat1
## [1] 0
## --------------------------------------------------------
## dat$data: dat2
## [1] 19.1
```
Because there is no rejection for the first dataset, the posterior distribution of \(\eta_b\) is \({\cal IG}\left(\frac{I-1}{2},\frac{SS_b}{2}\right)\), as confirmed by the density plot below (see this stats.stackexchange discussion if you wonder why I use the `logspline` package):

```r
with(dat1, {
  ssb <- J*crossprod(aggregate(y~group, FUN=mean)$y - mean(y))  # SSb
  # density of 1/eta_b when eta_b ~ IG((I-1)/2, SSb/2), i.e. Gamma((I-1)/2, rate SSb/2)
  curve(dgamma(x, (I-1)/2, ssb/2), to=0.4, lwd=2, xlab=NA, ylab=NA, axes=FALSE)
})
axis(1)
library(logspline)
# posterior draws of 1/eta_b = 1/(J*sigma2b + sigma2w)
x <- 1/with(sims1[sample(nrow(sims1), 1e4)], J*sigma2b + sigma2w)
plot(logspline(x), add = TRUE, col = "red", lwd = 2)
```
But this is not the case for the second dataset:
```r
with(dat2, {
  ssb <- J*crossprod(aggregate(y~group, FUN=mean)$y - mean(y))
  curve(dgamma(x, (I-1)/2, ssb/2), to=1, ylim=c(0,2.6), lwd=2,
        xlab=NA, ylab=NA, axes=FALSE)
})
axis(1)
lines(density(1/with(sims2, J*sigma2b + sigma2w)), col="red", lwd=2)
```
Confidence interval about the overall mean
Frequentist method
There's a straightforward way to get a confidence interval about the overall mean \(\mu\): just take the group means and compute the usual Gaussian confidence interval about their mean:
```r
by(dat, dat$data, function(data){
  aggregate(y~group, data=data, FUN=mean) %>% lm(y~1, .) %>% confint
})
## dat$data: dat1
##                2.5 %   97.5 %
## (Intercept) 4.304683 13.03132
## --------------------------------------------------------
## dat$data: dat2
##                2.5 %   97.5 %
## (Intercept) 8.419673 11.37769
```
Bayesian method
Throughout the article we always use equi-tailed credible intervals.
```r
quantile(sims["data1"]$mu, c(2.5, 97.5)/100)
##     2.5%    97.5%
##  4.29467 13.03949
quantile(sims["data2"]$mu, c(2.5, 97.5)/100)
##     2.5%    97.5%
## 8.242078 11.563506
```
For the first dataset, the Bayesian interval is close to the frequentist interval. We can see why: when \(R(SS_w,SS_b)=0\), the posterior distribution of \(\eta_b\) is \({\cal IG}\left(\frac{I-1}{2},\frac{SS_b}{2}\right)\), and then the posterior distribution of \(\mu\) is a non-standardized Student distribution with \(I-1\) degrees of freedom, location parameter \(\bar y_{\bullet\bullet}\), and scale \(\sqrt{\frac{SS_b}{IJ(I-1)}}\). Therefore the Bayesian interval exactly coincides with the frequentist interval when \(R(SS_w,SS_b)=0\), and this is the case for the first dataset. Otherwise, when \(R(SS_w,SS_b)>0\), the Bayesian interval is wider.
Yes, probabilists, I know \(R(SS_w,SS_b)\) is never exactly \(0\)!
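A quick numerical check of this coincidence on the first dataset (a minimal sketch; `I` and `J` hold the values set above):

```r
# non-standardized Student interval: location = overall mean,
# scale = sqrt(SSb/(I*J*(I-1))), I-1 degrees of freedom
with(dat1, {
  ssb <- c(SOS(y, group)$SSb)  # between sum of squares, as a plain number
  mean(y) + qt(c(.025, .975), I-1) * sqrt(ssb/(I*J*(I-1)))
})
# agrees with the frequentist interval: 4.304683 13.03132
```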
Confidence interval about variance components
Frequentist method
The \(\chi^2\) distribution of \(SS_w\) makes it easy to derive a confidence interval about the within variance \(\sigma^2_w\); incidentally, it is the same interval as in the fixed effects model.
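For instance, here is a minimal sketch of this interval (the helper name `ci_sigma2w` is mine; it matches the `within` row of the `confintVC` output below):

```r
# 95% CI for sigma2w, from SSw ~ sigma2w * chisq(I*(J-1))
ci_sigma2w <- function(SSw, I, J, alpha = 5/100){
  DFw <- I*(J-1)
  c(estimate = SSw/DFw,
    lwr = SSw/qchisq(1 - alpha/2, DFw),
    upr = SSw/qchisq(alpha/2, DFw))
}
with(dat1, ci_sigma2w(c(SOS(y, group)$SSw), I, J))
```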
Things are more complicated for the between variance \(\sigma^2_b\) and the total variance \(\sigma^2_w+\sigma^2_b\). The function `confintVC` below provides confidence intervals following the method given by Burdick & al, based on Graybill & Wang's confidence intervals for linear combinations of \(\chi^2\) distributions.

```r
confintVC <- function(y, group, alpha=5/100){
  group <- factor(group)
  I <- length(levels(group))
  if(!all(I*as.numeric(table(group)) == length(y))){
    stop("balanced only!")
  }
  J <- length(y)/I
  DFb <- I-1      # between degrees of freedom
  DFw <- I*(J-1)  # within degrees of freedom
  SS <- SOS(y, group)  # sums of squares
  SSb <- SS$SSb; SSw <- SS$SSw
  MSSb <- SSb/DFb; MSSw <- SSw/DFw  # mean sums of squares
  # Confidence intervals
  a <- alpha/2
  ## Within variance confidence interval
  sigma2w <- MSSw  # estimate
  withinLCB <- sigma2w/qf(1-a, DFw, Inf)  # within lwr
  withinUCB <- sigma2w/qf(a, DFw, Inf)    # within upr
  ## Between variance confidence interval
  sigma2b <- (MSSb-MSSw)/J  # estimate
  G1 <- 1-(1/qf(1-a, DFb, Inf))
  G2 <- 1-(1/qf(1-a, DFw, Inf))
  H1 <- (1/qf(a, DFb, Inf))-1
  H2 <- (1/qf(a, DFw, Inf))-1
  G12 <- ((qf(1-a, DFb, DFw)-1)^2 - (G1^2*qf(1-a, DFb, DFw)^2) - (H2^2))/qf(1-a, DFb, DFw)
  H12 <- ((1-qf(a, DFb, DFw))^2 - H1^2*qf(a, DFb, DFw)^2 - G2^2)/qf(a, DFb, DFw)
  Vu <- H1^2*MSSb^2 + G2^2*MSSw^2 + H12*MSSb*MSSw
  Vl <- G1^2*MSSb^2 + H2^2*MSSw^2 + G12*MSSw*MSSb
  betweenLCB <- (MSSb-MSSw-sqrt(Vl))/J  # between lwr
  betweenUCB <- (MSSb-MSSw+sqrt(Vu))/J  # between upr
  ## Total variance confidence interval
  sigma2tot <- sigma2w + sigma2b  # estimate
  totalLCB <- sigma2tot - (sqrt(G1^2*MSSb^2 + G2^2*(J-1)^2*MSSw^2)/J)  # total lwr
  totalUCB <- sigma2tot + (sqrt(H1^2*MSSb^2 + H2^2*(J-1)^2*MSSw^2)/J)  # total upr
  # Output
  out <- data.frame(
    estimate = c(sigma2w, sigma2b, sigma2tot),
    lwr = c(withinLCB, betweenLCB, totalLCB),
    upr = c(withinUCB, betweenUCB, totalUCB))
  rownames(out) <- c("within", "between", "total")
  return(out)
}
```
```r
with(dat1, confintVC(y, group))
##          estimate       lwr        upr
## within  0.9678172 0.5281232   2.318259
## between 2.9238944 0.6600113 121.680179
## total   3.8917116 1.6132104 122.670629
with(dat2, confintVC(y, group))
##          estimate        lwr       upr
## within  1.1487720  0.6268675  2.751709
## between 0.1630157 -0.2117912 13.789360
## total   1.3117877  0.8059042 15.023651
```
The same estimates are provided by `lmer(y ~ (1|group))` with the `lme4` package for mixed models; a quick check is sketched below.
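A minimal check with `lme4` (assuming it is installed); for balanced data with a non-negative between estimate, the REML variance components should coincide with the estimates above:

```r
library(lme4)
fit <- lmer(y ~ (1 | group), data = dat1)
# "group" row = between variance, "Residual" row = within variance
as.data.frame(VarCorr(fit))[, c("grp", "vcov")]
```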
Bayesian method
First, let us construct the simulations of the total variance:
sims[, "sigma2tot":=sigma2w+sigma2b]
Let's see the estimates before looking at the credible intervals:
```r
sapply(sims["data1", c("sigma2w", "sigma2b", "sigma2tot"), with=FALSE],
       function(x) c(mean=mean(x), median=median(x)))
##         sigma2w   sigma2b sigma2tot
## mean   1.116632 40.565856 41.682488
## median 1.012645  4.268724  5.408561
sapply(sims["data2", c("sigma2w", "sigma2b", "sigma2tot"), with=FALSE],
       function(x) c(mean=mean(x), median=median(x)))
##         sigma2w   sigma2b sigma2tot
## mean   1.248280 6.1369897  7.385270
## median 1.144814 0.4706216  1.794074
```
Clearly, the posterior distributions of the variance components have heavy right tails, which inflate the means: the median is a better point estimate than the mean. Now let's have a look at the credible intervals.
For the first dataset, the credible interval about the within variance is the same as the frequentist confidence interval, as expected, and it is interesting to see that the credible intervals are close to the frequentist confidence intervals for the other variances:
```r
sapply(sims["data1", c("sigma2w", "sigma2b", "sigma2tot"), with=FALSE],
       function(x) quantile(x, c(2.5, 97.5)/100))
##         sigma2w      sigma2b  sigma2tot
## 2.5%  0.5282047   0.6471158   1.636684
## 97.5% 2.3164339 121.5854008 122.691925
```
For the second dataset, we do not expect results close to the frequentist ones. We get a shorter interval for the within variance, and wider intervals for the other variances:
```r
sapply(sims["data2", c("sigma2w", "sigma2b", "sigma2tot"), with=FALSE],
       function(x) quantile(x, c(2.5, 97.5)/100))
##         sigma2w     sigma2b sigma2tot
## 2.5%  0.6109852  0.01482579  0.810674
## 97.5% 2.4954109 17.13539103 18.486391
```
Confidence interval about coefficient of total variation
Frequentist method
A confidence interval about the coefficient of total variation \(\dfrac{\sqrt{\sigma^2_w+\sigma^2_b}}{\mu}\) is provided by the function `ranovaCV` below. The method is based on the generalized \(p\)-value approach, using a generalized pivotal quantity. The procedure performed by this function is an original one: I did it myself, though I don't really know how: I naively followed a procedure given in Krishnamoorthy & Mathew's book for another parameter of interest.

```r
ranovaCV <- function(y, group, level=0.95, nsims=100000){
  group <- factor(group)
  I <- length(levels(group))
  if(!all(I*as.numeric(table(group)) == length(y))){
    stop("balanced only!")
  }
  J <- length(y)/I
  SS <- SOS(y, group)  # sums of squares
  SSb <- SS$SSb; SSw <- SS$SSw
  Z <- rnorm(nsims)
  Ub <- rchisq(nsims, I-1)
  Ue <- rchisq(nsims, I*(J-1))  # ! J*(I-1) in K & M's book is an error
  mu.gpq <- mean(y) - Z/sqrt(Ub)*sqrt(SSb/I/J)           # GPQ for mu
  sigmatot.gpq <- 1/sqrt(J)*sqrt(SSb/Ub + (J-1)*SSw/Ue)  # GPQ for sqrt(sigma2tot)
  G <- sigmatot.gpq/mu.gpq
  low <- as.numeric(quantile(G, (1-level)/2))       # lower bound
  up <- as.numeric(quantile(G, level+(1-level)/2))  # upper bound
  return(list(lwr=low, upr=up))
}
```
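Explicitly, the generalized pivotal quantity simulated in the body of the function is \[ G = \dfrac{\sqrt{\dfrac{1}{J}\left(\dfrac{SS_b}{U_b} + (J-1)\dfrac{SS_w}{U_e}\right)}}{\bar y_{\bullet\bullet} - \dfrac{Z}{\sqrt{U_b}}\sqrt{\dfrac{SS_b}{IJ}}}, \qquad Z \sim {\cal N}(0,1), \quad U_b \sim \chi^2_{I-1}, \quad U_e \sim \chi^2_{I(J-1)}, \] with \(Z\), \(U_b\) and \(U_e\) independent; the confidence bounds are the quantiles of the simulated values of \(G\).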
```r
with(dat1, ranovaCV(y, group))  # true value: sqrt(5)/10 ≈ 0.224
## $lwr
## [1] 0.1403256
##
## $upr
## [1] 1.488213
with(dat2, ranovaCV(y, group))  # true value: sqrt(2)/10 ≈ 0.1414
## $lwr
## [1] 0.08956391
##
## $upr
## [1] 0.3931898
```
Bayesian method
As for any other parameter of interest, one first constructs the simulations of the coefficient of total variation:
sims[, "CV":=sqrt(sigma2tot)/mu]
and then one takes the quantiles:
```r
sims[, list(lwr=quantile(CV, 2.5/100), upr=quantile(CV, 97.5/100)), by="data"]
##      data        lwr       upr
## 1: data1 0.14091359 1.4463289
## 2: data2 0.09033098 0.4414735
```
Prediction interval
Frequentist method
This method is given in Lin & Liao. Denote by \(y^{\text{new}}\) a new observation. Note that \[ y^{\text{new}} - \bar{y}_{\bullet\bullet} \sim {\cal N}\left(0, \sigma^2_{\text{tot}} + \frac{\tau^2}{I}\right),\] and the variance of \(y^{\text{new}} - \bar{y}_{\bullet\bullet}\) can be written \[ \left(1+\frac{1}{I}\right)\tau^2 + \left(1-\frac{1}{J}\right)\sigma^2_w. \] Then, knowing that \(SS_w \sim \sigma^2_w\chi^2_{I(J-1)}\) and \(SS_b \sim J\tau^2\chi^2_{I-1}\), an unbiased estimate of the variance of \(y^{\text{new}} - \bar{y}_{\bullet\bullet}\) is \[ \left(1+\frac{1}{I}\right)\frac{SS_b}{J(I-1)} + \left(1-\frac{1}{J}\right)\frac{SS_w}{I(J-1)} = \frac{I+1}{I(I-1)}\frac{SS_b}{J} + \frac{SS_w}{IJ}. \] This estimate does not follow a Chi-square distribution, but we approximate its distribution by a scaled Chi-square whose degrees of freedom \(\nu\) are given by the Satterthwaite formula \[ \nu = \frac{(a\,SS_b + b\,SS_w)^2}{\dfrac{(a\,SS_b)^2}{I-1} + \dfrac{(b\,SS_w)^2}{I(J-1)}}, \qquad a = \frac{I+1}{I(I-1)J}, \quad b = \frac{1}{IJ}. \] In addition, since we know (or as shown here) that \(SS_b\), \(SS_w\) and \(y^{\text{new}} - \bar{y}_{\bullet\bullet}\) are independent, \[ \dfrac{y^{\text{new}} - \bar{y}_{\bullet\bullet}}{\sqrt{\dfrac{I+1}{I(I-1)}\dfrac{SS_b}{J} + \dfrac{SS_w}{IJ}}} \approx \mathrm{t}_\nu, \] from which we get a prediction interval about \(y^{\text{new}}\).

```r
ranovapred <- function(y, group, conf=0.95){
  group <- factor(group)
  I <- length(levels(group))
  if(!all(I*as.numeric(table(group)) == length(y))){
    stop("balanced only!")
  }
  J <- length(y)/I
  SS <- SOS(y, group)  # sums of squares
  SSb <- SS$SSb; SSw <- SS$SSw
  a <- (1/J*(1+1/I))/(I-1)
  b <- 1/I/J
  v <- a*SSb + b*SSw  # estimates the variance of (Ynew - Ybar)
  nu <- v^2/((a*SSb)^2/(I-1) + (b*SSw)^2/I/(J-1))  # Satterthwaite degrees of freedom
  alpha <- 1-conf
  bounds <- mean(y) + c(-1,1)*sqrt(v)*qt(1-alpha/2, nu)
  return(list(bounds=bounds, std.error=sqrt(v)))
}
```
```r
by(dat, dat$data, function(data){
  with(data, ranovapred(y, group)$bounds)
})
## dat$data: dat1
## [1]  1.388652 15.947355
## --------------------------------------------------------
## dat$data: dat2
## [1]  7.289183 12.508177
```
Bayesian method
The Bayesian prediction interval is obtained by taking the quantiles of the posterior predictive distribution. One first simulates this distribution:
```r
sims[, ynew := rnorm(nrow(sims), mu, sqrt(sigma2tot))]
```
One takes the quantiles:
```r
sims[, list(lwr=quantile(ynew, 2.5/100), upr=quantile(ynew, 97.5/100)), by="data"]
##      data        lwr      upr
## 1: data1 -0.2360107 17.55792
## 2: data2  6.0875652 13.72911
```
For both datasets, the Bayesian prediction interval is wider than the frequentist one.
One-sided tolerance interval
We provide methods to get lower and upper tolerance limits. Recall that a one-sided tolerance limit is nothing but a lower/upper confidence bound about a quantile; as an example, we take a lower confidence bound about the \(10\%\) quantile and an upper confidence bound about the \(90\%\) quantile. The theoretical values of these quantiles are:
```r
# first dataset
qnorm(c(10, 90)/100, 10, sqrt(5))
## [1]  7.134364 12.865636
# second dataset
qnorm(c(10, 90)/100, 10, sqrt(2))
## [1]  8.187612 11.812388
```
Frequentist method
The `ranovatol` function provides Krishnamoorthy & Mathew's tolerance limits, as given in their book.

```r
ranovatol <- function(y, group, p=0.9, conf=0.95, nsims=50000){
  group <- factor(group)
  I <- length(levels(group))
  if(!all(I*as.numeric(table(group)) == length(y))){
    stop("balanced only!")
  }
  J <- length(y)/I
  SS <- SOS(y, group)  # sums of squares
  SSb <- SS$SSb; SSw <- SS$SSw
  Z <- rnorm(nsims)
  Ub <- rchisq(nsims, I-1)
  Ue <- rchisq(nsims, I*(J-1))  # ! J*(I-1) in K & M's book is an error
  # lower tolerance limit
  G <- mean(y) - Z/sqrt(Ub)*sqrt(SSb/I/J) - qnorm(p)/sqrt(J)*sqrt(SSb/Ub + (J-1)*SSw/Ue)
  low <- as.numeric(quantile(G, 1-conf))
  # upper tolerance limit
  G <- mean(y) + Z/sqrt(Ub)*sqrt(SSb/I/J) + qnorm(p)/sqrt(J)*sqrt(SSb/Ub + (J-1)*SSw/Ue)
  up <- as.numeric(quantile(G, conf))
  return(c(lower=low, upper=up))
}
```
```r
by(dat, dat$data, function(data){
  with(data, ranovatol(y, group))
})
## dat$data: dat1
##     lower     upper
## -2.355055 19.691062
## --------------------------------------------------------
## dat$data: dat2
##     lower     upper
##  5.870706 13.926654
```
Bayesian method
We proceed as for any parameter of interest. First:
sims[, c("tol_lower","tol_upper"):=list(qnorm(.1, mu, sqrt(sigma2tot)), qnorm(.9, mu, sqrt(sigma2tot)))]
then:
```r
sims[, list(lower=quantile(tol_lower, 5/100), upper=quantile(tol_upper, 95/100)), by="data"]
##      data     lower    upper
## 1: data1 -2.242143 19.58247
## 2: data2  5.534809 14.28729
```
What about the unbalanced situation?
I have never tried it, but replacing the sums of squares \(SS_w\) and \(SS_b\) everywhere with the unweighted sums of squares should give good results in the unbalanced case. This is known to work well for variance components (see Burdick & al); a sketch follows.
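To make this suggestion concrete, here is a hedged sketch of such unweighted sums of squares; the function name `uSOS` and the use of the harmonic mean of the group sizes in place of \(J\) are my assumptions (untested, in the spirit of the unweighted-means analysis in Burdick & al), and the sketch reduces to `SOS` in the balanced case:

```r
# unweighted sums of squares for unbalanced data -- an untested sketch
uSOS <- function(y, group){
  group <- factor(group)
  ni <- as.numeric(table(group))    # group sizes
  gmeans <- tapply(y, group, mean)  # unweighted group means
  Jh <- length(ni)/sum(1/ni)        # harmonic mean of group sizes, replacing J
  list(SSw = sum((y - gmeans[group])^2),
       SSb = Jh * sum((gmeans - mean(gmeans))^2),
       Jh  = Jh)
}
```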