  Equivalence testing based on exceedance probabilities

    2015-01-06

    Equivalence testing is usually done by assessing the deviation between two Gaussian means \mu_1 and \mu_2.
    An acceptable deviation \Delta between \mu_1 and \mu_2 has to be set by the user, and equivalence is then accepted if there is statistically significant evidence that the deviation between \mu_1 and \mu_2 does not exceed \Delta.
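    For the record, here is a minimal sketch of this mean-based approach (the two one-sided tests procedure): equivalence is accepted at level \alpha when the 100(1-2\alpha)\%-confidence interval of \mu_1-\mu_2 lies entirely inside [-\Delta, \Delta]. The data and the margin \Delta below are made up for illustration.

    > # mean-based equivalence testing via a confidence interval (illustrative sketch)
    > set.seed(314)
    > x1 <- rnorm(30, mean=10.0, sd=1) # made-up data
    > x2 <- rnorm(30, mean=10.2, sd=1)
    > Delta <- 0.5 # acceptable deviation, set by the user
    > ci <- t.test(x1, x2, conf.level=0.90)$conf.int # 90% CI for mu1-mu2, i.e. alpha=0.05
    > ci[1] > -Delta && ci[2] < Delta # TRUE means equivalence is accepted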

    [Figure]

    In fields of research such as psychology or agronomy, setting the acceptable deviation \Delta usually causes no problem for the expert: the psychologist has his or her own opinion about which difference between the mean scores of two series of psychological tests is considered meaningful, and similarly the agronomist has his or her own opinion about which difference between the mean weights of two populations of fruits is considered meaningful.

    For industrial processes, the mean is not always the most relevant characteristic. For instance, an upper specification limit (USL) may be given, and the relevant characteristic of the process is then the out-of-specification probability (OOS), that is, the probability that the process generates data beyond USL.
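    For a Gaussian process with mean \mu and standard deviation \sigma, this probability is OOS = 1-\Phi\left(\frac{USL-\mu}{\sigma}\right). For instance, with the made-up values \mu=2, \sigma=1 and USL=4:

    > # OOS probability of a Gaussian process (illustrative values)
    > 1 - pnorm(4, mean=2, sd=1) # = 1 - pnorm((4-2)/1), about 2.3%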

    [Figure]

    Some guidelines recommend setting the acceptable deviation \Delta between the means by thinking in terms of specification limits. This requires taking into account the variability of the two processes under study. Once USL is given, it is much more natural to set an acceptable deviation between the exceedance probabilities OOS_1 and OOS_2 rather than between the means \mu_1 and \mu_2. We develop this approach in this article by providing a confidence interval for the difference OOS_1-OOS_2 between the two exceedance probabilities. We give a Bayesian method for the case of independent samples with equal variances, which applies similarly to the case of unequal variances and to the case of paired samples. Our method is easily modified to handle the case of a lower specification limit as well as the case of a two-sided specification interval.

    Jeffreys posterior distribution for the ANOVA model

    We assume two independent Gaussian sampling distributions sharing the same variance, that is, a one-way ANOVA model with two groups. We will perform statistical inference based on a noninformative Bayesian approach. The banova function below generates the Jeffreys posterior distribution of the parameters.

    > banova <- function(y, group, nsims=100000){
    +   group <- factor(group)
    +   means <- aggregate(y~group, FUN=mean)$y # group means
    +   sizes <- table(group) # group sizes
    +   # residual sum of squares (assumes y is ordered by group, as in the example below)
    +   ssq <- c(crossprod(y-rep(means, times=sizes)))
    +   sigma <- sqrt(ssq/rchisq(nsims, df=length(y)))
    +   mu <- lapply(structure(1:nlevels(group), names=levels(group)),
    +                function(i) rnorm(nsims, means[i], sigma/sqrt(sizes[i])))
    +   return(data.frame(mu, sigma=sigma))
    + }

    The model has three parameters \mu_1, \mu_2 and \sigma and their posterior simulations are stored in a data frame. Let us see an example with a simulated balanced dataset with 20 observations in each group:

    > I <- 2 # number of groups
    > J <- 20 # number of replicates per group
    > mu <- c(2, 2.5) # group means
    > sigma <- 1 # standard deviation within each group
    > group <- gl(I, J, labels=sprintf("group%s", 1:I)) # groups
    > set.seed(666)
    > y <- c(sapply(mu, function(mu) rnorm(J, mu, sigma))) # response

    The theoretical model is given by this picture:

    [Figure: the theoretical model]
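    The sketch below (not part of the original code) simply redraws the two theoretical Gaussian densities from the settings above:

    > # redraw the two theoretical densities (illustrative sketch, not the original figure)
    > curve(dnorm(x, mu[1], sigma), from=-2, to=6, xlab="y", ylab="density")
    > curve(dnorm(x, mu[2], sigma), add=TRUE, lty=2)
    > legend("topright", legend=levels(group), lty=c(1, 2))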

    The banova function takes a short time to generate 10^6 samples of the posterior distribution:

    > system.time(sims <- banova(y, group, nsims=1e+06))
       user  system elapsed
      0.602   0.025   0.626
    > head(sims, 3)
      group1   group2    sigma
    1.905582 2.146182 1.099777
    1.836148 2.097849 1.259851
    1.773983 2.618615 1.103276

    Frequentist matching property

    An appealing property of the Jeffreys posterior distribution is the so-called frequentist matching property: the posterior 100(1-\alpha)\%-credibility intervals of a parameter of interest approximately have a 100(1-\alpha)\% frequentist coverage. This does not hold for every model or every parameter of interest, but it is expected to hold, at least approximately, in many standard situations.
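    As a rough empirical check of this property, one can repeatedly simulate data from the settings above, compute the 95%-credibility interval of \mu_1 with banova each time, and count how often it covers the true value. This is only an illustrative sketch, with small simulation sizes to keep it fast.

    > # rough frequentist coverage check of the 95%-credibility interval of mu1 (illustrative sketch)
    > nrep <- 200 # number of simulated datasets, kept small for speed
    > covered <- logical(nrep)
    > for(k in 1:nrep){
    +   yk <- c(sapply(mu, function(m) rnorm(J, m, sigma))) # new dataset from the true model
    +   simsk <- banova(yk, group, nsims=2000)
    +   ci <- quantile(simsk[, "group1"], c(2.5, 97.5)/100) # 95%-credibility interval of mu1
    +   covered[k] <- ci[1] < mu[1] && mu[1] < ci[2]
    + }
    > mean(covered) # expected to be close to 0.95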

    Here, for example, this is true for the group means \mu_1 and \mu_2. They have a tractable posterior distribution: each of them follows a scaled Student distribution centered at its least-squares estimate. Thus, the posterior means coincide with the least-squares estimates:

    > names(sims)[1:2] <- c("µ1", "µ2")
    > sapply(sims, mean)
          µ1       µ2    sigma
    2.001408 2.246180 1.197806
    > fit <- lm(y~0+group)
    > coef(fit)
    groupgroup1 groupgroup2
       2.001438    2.246584

    Equi-tailed posterior credibility intervals are obtained by taking the quantiles of the posterior distributions:

    > sapply(sims, function(x) quantile(x, c(2.5, 97.5)/100))
                µ1       µ2     sigma
    2.5%  1.470561 1.715085 0.9649394
    97.5% 2.532203 2.776631 1.5032790

    We can also get them without using the simulations, with the help of the Student quantiles:

    > S2 <- c(crossprod(residuals(fit))) # residual sum of squares
    > freqs <- sapply(unique(group), function(label) length(which(group==label))) # group sizes
    > n <- sum(freqs)
    > indexes <- setNames(1:I, paste0("µ", 1:I))
    > vapply(indexes, function(i) coef(fit)[i] + sqrt(S2/(n*freqs[i]))*qt(c(2.5, 97.5)/100, n), FUN.VALUE=numeric(2))
           µ1       µ2
     1.470352 1.715497
     2.532525 2.777670

    This formula for the interval limits is close to the one giving the classical confidence interval (provided by confint(fit)):

    > vapply(indexes, function(i) coef(fit)[i] + sqrt(S2/(freqs[i]*(n-I)))*qt(c(2.5, 97.5)/100, n-I), FUN.VALUE=numeric(2))
           µ1       µ2
     1.455661 1.700806
     2.547216 2.792362

    The frequentist interval is a bit wider, and the difference vanishes as n increases.
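    To see how quickly the difference vanishes, one can compare the half-widths of the two intervals; their ratio depends only on n (and on the number of groups I), not on the data. The ratio function below is an illustrative helper, not part of the original code.

    > # ratio of the frequentist to the Bayesian interval half-width, for I = 2 groups
    > ratio <- function(n, I=2) sqrt(n/(n-I)) * qt(97.5/100, n-I) / qt(97.5/100, n)
    > sapply(c(10, 40, 100, 1000), ratio) # approaches 1 as n grows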

    Comparing the OOS

    Something very cool with the posterior simulations is the straightforward way to get posterior simulations of any quantity of interest \theta=f(\mu_1, \mu_2, \sigma): simply apply the function f to the posterior simulations of (\mu_1, \mu_2, \sigma). Here we are interested in the difference between OOS_1 and OOS_2, the probabilities of getting an observation above USL in the two groups, for a given upper specification limit USL. The first one is 1-\Phi\left(\frac{USL-\mu_1}{\sigma}\right) and the second one is 1-\Phi\left(\frac{USL-\mu_2}{\sigma}\right); thus we get the credibility intervals as follows, taking USL=4:

    > library(dplyr)
    > library(knitr) # for kable()
    > USL <- 4
    > sims1 <- mutate(sims, OOS1 = 100*(1-pnorm(USL, µ1, sigma)), OOS2 = 100*(1-pnorm(USL, µ2, sigma)), Delta=OOS2-OOS1)
    > kable(sims1 %>% reshape2::melt() %>% group_by(variable) %>% summarise(estimate=mean(value), lwr=quantile(value, 2.5/100), upr=quantile(value, 97.5/100)), digits=2, caption="Jeffreys estimates and $95\\%$-credibility intervals")

    variable   estimate     lwr     upr
    µ1             2.00    1.47    2.53
    µ2             2.25    1.72    2.78
    sigma          1.20    0.96    1.50
    OOS1           5.24    1.16   13.14
    OOS2           7.65    2.11   17.32
    Delta          2.41   -5.33   11.36

    Note that the intervals around OOS_1, OOS_2 and \Delta are quite wide. This shows that these estimates are far from precise.

    (R point) The pipe operator %>%. As you can see, I used the %>% operator in the previous code block. This operator is provided by the magrittr package, which is automatically loaded here when the dplyr package is loaded. In case you don't know this operator, I will explain it on a shorter example in the next section. Here we use it in order to avoid nesting too many functions: summarise(group_by(reshape2::melt(sims1, ....

    Looking at the deviation as a function of USL

    It is interesting to look at the OOS deviation as a function of USL. For our example, here is the theoretical picture:

    [Figure: theoretical OOS deviation as a function of USL]

    The maximal value of the OOS deviation \Delta is called the Kolmogorov distance between the two distributions. In our case of two Gaussian distributions with equal variances, it is attained at USL=\dfrac{\mu_1+\mu_2}{2} (see the derivation in the Appendix), and is shown by the orange line on the figure.
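    As a quick sanity check, this location can also be found numerically with optimize, using the true parameter values of our example (f below is a small helper introduced for illustration):

    > f <- function(t) pnorm(t, mu[1], sigma) - pnorm(t, mu[2], sigma) # theoretical OOS deviation
    > optimize(f, interval=c(-5, 10), maximum=TRUE)$maximum # close to (mu[1]+mu[2])/2 = 2.25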

    Thus, we could derive an estimate and a credibility interval around the Kolmogorov distance, that is to say the maximal deviation between OOS_1 and OOS_2 over all possible values of USL:

    > Kdist <- function(µ1, µ2, sigma) ((µ1+µ2)/2) %>% {pnorm(., µ1, sigma) - pnorm(., µ2, sigma)} %>% abs
    > Kdist_sims <- with(sims, 100*Kdist(µ1, µ2, sigma))
    > mean(Kdist_sims); quantile(Kdist_sims, probs = c(2.5, 97.5)/100)
    [1] 11.54142
         2.5%     97.5%
    0.4700835 31.1350492

    But this is not really relevant in the context of OOS. It is clear in our example, where \sigma=1, that we are not interested in the deviation \Delta for a value of USL such as USL=\dfrac{\mu_1+\mu_2}{2}. In practice, the value of USL is usually located in the upper tails of the two Gaussian distributions.

    (R point) The pipe operator %>%. As promised, I detail an example of the use of the %>% operator. Without this operator, I would have defined the Kdist function as follows:

    > Kdist <- function(µ1, µ2, sigma) abs(pnorm((µ1+µ2)/2, µ1, sigma) - pnorm((µ1+µ2)/2, µ2, sigma))

    or, in order to avoid computing (µ1+µ2)/2 twice:

    > Kdist <- function(µ1, µ2, sigma){
    +   x <- (µ1+µ2)/2
    +   abs(pnorm(x, µ1, sigma) - pnorm(x, µ2, sigma))
    + }

    Here, the pipe operator %>% first allows us to map (µ1+µ2)/2 conveniently:

    > ((µ1+µ2)/2) %>% {pnorm(., µ1, sigma) - pnorm(., µ2, sigma)}

    Then the result is piped to the abs function by adding %>% abs (or equivalently %>% abs(.)), instead of wrapping the whole expression in abs(...).

    Now we calculate and plot the estimate of \Delta and its credibility interval as a function of USL.

    > library(tidyr) # to use spread()
    > sims <- reshape2::melt(data.frame(id=1:nrow(sims), sims), id.vars=c("id", "sigma"), value.name="µ", variable.name="group")
    > levels(sims$group) <- c("group1", "group2")
    > head(sims, 3)
     id    sigma  group        µ
      1 1.099777 group1 1.905582
      2 1.259851 group1 1.836148
      3 1.103276 group1 1.773983
    > usl <- seq(0, 5, by=0.25)
    > sims <- cbind(USL=usl,
    +   data.frame(sims[, c("id","group")], setNames(lapply(usl, function(USL) 100*(1-pnorm(USL, sims$µ, sims$sigma))), paste0("OOS_", usl))) %>%
    +     reshape2::melt(id.vars=c("group","id")) %>% spread(group, value) %>%
    +     mutate(Delta=group2-group1) %>%
    +     group_by(variable) %>%
    +     summarise(estimate=mean(Delta),
    +               lwr=quantile(Delta, 2.5/100),
    +               upr=quantile(Delta, 97.5/100)))[,-2]
    > head(sims, 3)
     USL estimate       lwr       upr
    0.00 1.741791 -3.967600  8.705849
    0.25 2.421491 -5.345896 11.412120
    0.50 3.244611 -6.983556 14.542463
    > library(ggplot2)
    > ggplot(sims, aes(x=USL, y=estimate)) + geom_line() +
    +   geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=0.1) +
    +   ylab(expression(Delta)) +
    +   stat_function(fun=function(USL) 100*(pnorm(USL, mu[1], sigma)-pnorm(USL, mu[2], sigma)), color="red")

    [Figure: estimated deviation and credibility interval as a function of USL]

    The theoretical deviation is shown by the red curve and the estimated deviation by the black curve. We again note how wide the credibility intervals are, especially around the maximal value of the deviation.

    Appendix: Kolmogorov distance between two Gaussian distributions

    Our purpose is to give an expression for the Kolmogorov distance K(\mu_1,\sigma_1,\mu_2,\sigma_2) between the two Gaussian distributions {\cal N}(\mu_1, \sigma_1^2) and {\cal N}(\mu_2, \sigma_2^2), defined by K(\mu_1,\sigma_1,\mu_2,\sigma_2) = \sup_{t \in \mathbb{R}} \bigl|\Pr(X \leq t) - \Pr(Y \leq t) \bigr|

    with X \sim {\cal N}(\mu_1, \sigma_1^2) and Y \sim {\cal N}(\mu_2, \sigma_2^2).

    Writing \Pr(X \leq t) - \Pr(Y \leq t) = \Pr\left(\frac{X-\mu_1}{\sigma_1} \leq \frac{t-\mu_1}{\sigma_1}\right) - \Pr\left(\frac{Y-\mu_2}{\sigma_2} \leq \frac{t-\mu_2}{\sigma_2}\right),

    the expression we are looking for is obtained from the expression of \sup_{z \in \mathbb{R}} \bigl|\Pr(Z_1 \leq z) - \Pr(Z_2 \leq az+b) \bigr|
    with Z_1, Z_2 \sim {\cal N}(0, 1) and a>0, by setting a=\frac{\sigma_1}{\sigma_2} and b=\frac{\mu_1-\mu_2}{\sigma_2}. Thus K(\mu_1,\sigma_1,\mu_2,\sigma_2) = \sup_{z \in \mathbb{R}} \bigl|\Psi_{a,b}(z) \bigr|
    where \Psi_{a,b} is the function defined by \Psi_{a,b}(z)= \Phi(z) - \Phi(az+b)
    for the values of a and b given above. By symmetry one can also take a=\frac{\sigma_2}{\sigma_1} and b=\frac{\mu_2-\mu_1}{\sigma_1}, and then it suffices to focus on the case b \geq 0.

    By elementary calculus, the derivative of \Psi_{a,b} is given by {\Psi'}_{a,b}(z) =\frac{e^{-\frac{1}{2}z^2}-a e^{-\frac{1}{2} (b+a z)^2}}{\sqrt{2 \pi }}.

    Discarding the trivial case a=1 and b=0: if a=1 (and b \neq 0) then {\Psi_{1,b}}'(z) has a unique root at z=-\frac{b}{2}, and if a\neq 1 then {\Psi_{a,b}}'(z) has two roots at z_i= \frac{a b + {(-1)}^i \sqrt{b^2 + 2 (a^2-1) \log(a)}}{1-a^2}, \qquad i=1,2,
    which we also write z_i = M + {(-1)}^i \Delta
    where M=\frac{a b}{1-a^2} and \Delta=\frac{\sqrt{b^2 + 2 (a^2-1) \log(a)}}{1-a^2}.
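    These roots are easy to check numerically; the sketch below evaluates {\Psi'}_{a,b} at z_1 and z_2 for arbitrary illustrative values of a and b:

    # numerical check that z_1 and z_2 are roots of the derivative of Psi_{a,b}
    a <- 0.7; b <- 1 # arbitrary illustrative values
    dPsi <- function(z) (exp(-z^2/2) - a*exp(-(b + a*z)^2/2))/sqrt(2*pi)
    M <- a*b/(1 - a^2)
    Delta <- sqrt(b^2 + 2*(a^2 - 1)*log(a))/(1 - a^2)
    dPsi(c(M - Delta, M + Delta)) # both values are numerically zero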

    Now we assume 0 < a < 1 and b \geq 0, hence one has 0 \leq M \leq \frac{b}{1-a} and \Delta>0. Obviously z_2>0 and it is not hard to see that z_1<0.
    Furthermore, note that (az_i+b) -z_i = \frac{b}{1+a} + {(-1)}^{i-1} (1-a)\Delta,

    hence (az_1+b) -z_1 >0, and note that \bigl((az_1+b) -z_1\bigr)\bigl((az_2+b) -z_2\bigr)=\frac{2 (1-a)\log(a)}{1+a} < 0,
    therefore (az_2+b) - z_2 < 0. Finally, in the case when 0 < a < 1 and b \geq 0, one has z_1 < 0 \leq M \leq \frac{b}{1-a} < z_2 and we will now consider the two situations shown by these figures:

    [Figure: case 1] [Figure: case 2]

    We will show that the supremum of \bigl|\Psi_{a,b}\bigr| is attained at z_1 in both cases. Note that \bigl|\Psi_{a,b}(z_1)\bigr|=-\Psi_{a,b}(z_1) is the standard Gaussian measure of the interval I_1=[z_1, az_1+b] and \bigl|\Psi_{a,b}(z_2)\bigr|=\Psi_{a,b}(z_2) is the standard Gaussian measure of the interval I_2=[az_2+b, z_2]. The length of I_1 is always larger than the length of I_2. If az_1+b \geq 0 (case 1), then I_1 contains 0, and it is then clear that the standard Gaussian measure of I_1 is larger than that of I_2. If az_1+b < 0 (case 2), then one has |az_1+b| < az_2+b because of the equality (az_1+b)+(az_2+b)=\frac{2b}{1-a^2} \geq 0,

    which is easy to obtain by elementary algebra. Therefore the standard Gaussian measure of I_1 is larger than that of I_2 in case 2 as well.

    By similar considerations, the case a > 1 and b \geq 0 is summarized by the two figures below, and the same approach shows that the supremum of \bigl|\Psi_{a,b}\bigr| is attained at z_1.

    [Figure: case 1, a > 1] [Figure: case 2, a > 1]

    The R function Kdist below returns the Kolmogorov distance:

    Kdist00 <- function(a, b){
      # location z1 of the supremum on the standardized scale (see the derivation above)
      z <- (a * b - (sign(b) + (b==0)) * sqrt(b^2 + 2 * (a^2 - 1) * log(a)))/(1 - a^2)
      out <- pnorm(a*z + b) - pnorm(z)
      attr(out, "where") <- z
      return(out)
    }
    Kdist0 <- function(mu1, sigma1, mu2, sigma2){
      b <- (mu1 - mu2)/sigma2
      a <- sigma1/sigma2
      if(b >= 0){
        out <- Kdist00(a, b)
        # back to the original scale
        attr(out, "where") <- mu1 + sigma1*attr(out, "where")
        return(out)
      }else{
        return(Kdist0(mu2, sigma2, mu1, sigma1))
      }
    }
    Kdist <- function(mu1, sigma1, mu2, sigma2){
      if(sigma1 == sigma2){
        # equal variances: the supremum is attained at (mu1+mu2)/2
        z <- -(mu1 - mu2)/sigma2/2 # standardized location of the supremum
        out <- abs(pnorm(z) - pnorm(-z))
        attr(out, "where") <- (mu1 + mu2)/2 # location on the original scale
        return(out)
      }
      return(Kdist0(mu1, sigma1, mu2, sigma2))
    }
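    As a quick usage example, Kdist can be checked against a brute-force maximization of the CDF difference over a fine grid; the brute function below is an illustrative helper, not part of the original code:

    # brute-force computation of the supremum over a fine grid (illustrative helper)
    brute <- function(mu1, sigma1, mu2, sigma2){
      t <- seq(min(mu1, mu2) - 6*max(sigma1, sigma2),
               max(mu1, mu2) + 6*max(sigma1, sigma2), length.out=1e5)
      max(abs(pnorm(t, mu1, sigma1) - pnorm(t, mu2, sigma2)))
    }
    c(Kdist(2, 1, 2.5, 1), brute(2, 1, 2.5, 1)) # equal variances
    c(Kdist(2, 1, 3, 1.7), brute(2, 1, 3, 1.7)) # unequal variances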