Confidence intervals for comparisons of independent binomial or Poisson rates
SCAS and other asymptotic score methods
For comparison of two groups when the outcome is a binomial proportion, you may wish to quantify the effect size using the difference (, where ) or the ratio of the two proportions (). Another option is the odds ratio (), which is less easy to interpret, but is widely used due to its useful properties for logistic regression modelling or case-control studies.
To calculate a confidence interval (CI) for any of these contrasts, the skewness-corrected asymptotic score (SCAS) method is recommended, as one that succeeds, on average, at containing the true parameter with the appropriate nominal probability (e.g. 95%), and has evenly distributed tail probabilities (Laud 2017). It is a modified version of the Miettinen-Nurminen (MN) asymptotic score method (Miettinen and Nurminen 1985). MN and SCAS each represent a family of methods encompassing analysis of RD and RR for binomial proportions or Poisson rates (e.g. exposure-adjusted incidence rate), and binomial OR.
The plots below illustrate the one-sided and two-sided interval coverage probabilities1 achieved by SCAS compared to some other popular methods2 with , with the second row of each plot using moving average smoothing. A selection of coverage probability plots for other sample size combinations and other contrasts can be found in the “plots” folder of the ratesci GitHub repository.
Differences in two-sided coverage can be more subtle:
The skewness correction, introduced by (Gart and Nam 1988) should be considered essential for analysis of relative risk, but can also have a substantial effect for RD as seen above. The one-sided coverage of MN for RR can be poor even in large samples with equal-sized groups, which means that the confidence interval tends to be located too close to 1:
[Note: (Morten W. Fagerland, Lydersen, and Laake 2011) dismissed the MN method, saying that it can be somewhat liberal. In fact, based on reconstruction of their results, it appears that their evaluation was of the closely-related but inferior Mee method, which omits the N/(N-1) variance bias correction. This error was corrected in their 2017 book. Essentially, it seems they reject the MN method on the basis of some very small regions of the parameter space where coverage is slightly below nominal. These dips in coverage do occur, but fluctuate above and below the nominal level in a pattern of ridges and furrows aligned along the diagonal of a fixed value of . The most severe dips tend to occur when sample sizes are multiples of 10 (e.g. ), which might be relatively rare in practice. The moving average smoothing in the above surface plots is designed to even out these fluctuations in coverage. The ‘N-1’ adjustment in the MN method elevates coverage probabilities slightly, and the skewness correction gives a further improvement overall, as well as substantially improving one-sided coverage when group sizes are unequal.]
scoreci()
is used as follows, for example to obtain a
SCAS 95% CI for the difference 5/56 - 0/29
:
out <- scoreci(x1 = 5, n1 = 56, x2 = 0, n = 29)
out$estimates
#> lower est upper level x1 n1 x2 n2 p1hat p2hat p1mle p2mle
#> [1,] -0.0186 0.0917 0.187 0.95 5 56 0 29 0.0893 0 0.0917 0
The underlying z-statistic is used to obtain a two-sided hypothesis
test for association (pval2sided
). Note that under certain
conditions
(i.e. )
this is equivalent to the Egon Pearson ‘N-1’ chi-squared test (Campbell 2007).
The facility is also provided for a custom one-sided test against any
specified null hypothesis value
,
e.g. for non-inferiority testing (pval_left
and
pval_right
). See the tests
vignette for more details.
out$pval
#> chisq pval2sided theta0 scorenull pval_left pval_right
#> [1,] 3.02 0.082 0 1.74 0.959 0.041
The scoreci()
function provides options to omit the
skewness correction (skew = FALSE
for the MN method) and
the variance bias correction (bcf = FALSE
for the Mee
method, which would be consistent with the Karl Pearson unadjusted
chi-squared test). To keep things simple when choosing the SCAS method,
you may instead use the scasci()
wrapper function instead
of scoreci()
:
scasci(x1 = 5, n1 = 56, x2 = 0, n = 29)$estimates
#> lower est upper level x1 n1 x2 n2 p1hat p2hat p1mle p2mle
#> [1,] -0.0186 0.0917 0.187 0.95 5 56 0 29 0.0893 0 0.0917 0
For analysis of relative risk, use:
scasci(x1 = 5, n1 = 56, x2 = 0, n = 29, contrast = "RR")$estimates
#> lower est upper level x1 n1 x2 n2 p1hat p2hat p1mle p2mle
#> [1,] 0.77 16.4 Inf 0.95 5 56 0 29 0.0893 0 0.0868 0.0053
And for odds ratio:
scasci(x1 = 5, n1 = 56, x2 = 0, n = 29, contrast = "OR")$estimates
#> lower est upper level x1 n1 x2 n2 p1hat p2hat p1mle p2mle
#> [1,] 0.759 17.7 Inf 0.95 5 56 0 29 0.0893 0 0.0865 0.00533
For analysis of Poisson incidence rates, for example
exposure-adjusted adverse event rates, the same methodology is adapted
for the Poisson distribution, with the distrib
argument.
So, for example if the denominators 56 and 29 represent the number of
patient-years at risk in each group, instead of the number of
patients:
scasci(x1 = 5, n1 = 56, x2 = 0, n = 29, contrast = "RR", distrib = "poi")$estimates
#> lower est upper level x1 n1 x2 n2 p1hat p2hat p1mle p2mle
#> [1,] 0.726 16.1 Inf 0.95 5 56 0 29 0.0893 0 0.0865 0.00539
MOVER methods
An alternative family of methods to produce confidence intervals for the same set of binomial and Poisson contrasts is the Method of Variance Estimates Recovery (MOVER), also known as Square-and-Add (For technical details see chapter 7 of Newcombe 2012). Originally labelled as a “Score” method (Newcombe 1998) due to the involvement of the Wilson Score interval, this involves a combination of intervals calculated separately for and . Newcombe based his method on Wilson intervals, but coverage can be slightly improved by using equal-tailed Jeffreys intervals instead (“MOVER-J”) (Laud 2017)3. Coverage properties remain generally inferior to SCAS - with larger sample sizes, MOVER-J has two-sided coverage close to (but slightly below) the nominal level, but central location is not achieved (except for the RR contrast).
There is no corresponding hypothesis test for the MOVER methods.
A MOVER-J interval for binomial RD from the same observed data as above would be obtained using:
moverci(x1 = 5, n1 = 56, x2 = 0, n = 29)
#> $estimates
#> lower est upper level x1 n1 x2 n2 p1hat p2hat
#> [1,] -0.00973 0.084 0.177 0.95 5 56 0 29 0.0918 0.00775
#>
#> $call
#> distrib contrast level type adj cc a1 b1
#> "bin" "RD" "0.95" "jeff" "FALSE" "FALSE" "0.5" "0.5"
#> a2 b2
#> "0.5" "0.5"
For comparison, Newcombe’s version using Wilson intervals is:
moverci(x1 = 5, n1 = 56, x2 = 0, n = 29, type = "wilson")
#> $estimates
#> lower est upper level x1 n1 x2 n2 p1hat p2hat
#> [1,] -0.0381 0.0893 0.193 0.95 5 56 0 29 0.0893 0
#>
#> $call
#> distrib contrast level type adj cc a1 b1
#> "bin" "RD" "0.95" "wilson" "FALSE" "FALSE" "0.5" "0.5"
#> a2 b2
#> "0.5" "0.5"
The MOVER approach may be further modified by adapting the Jeffreys interval (which uses a non-informative prior for the group rates) to incorporate prior information about and for an approximate Bayesian interval. For example, given data from a pilot study with 1/10 events in the first group and 0/10 in the second group, the non-informative prior distributions for and might be updated as follows:
moverbci(x1 = 5, n1 = 56, x2 = 0, n = 29, a1 = 1.5, b1 = 9.5, a2 = 0.5, b2 = 10.5)
#> $estimates
#> lower est upper level x1 n1 x2 n2 p1hat p2hat
#> [1,] 0.00917 0.0872 0.172 0.95 5 56 0 29 0.093 0.00578
#>
#> $call
#> distrib contrast level type adj cc a1 b1
#> "bin" "RD" "0.95" "jeff" "FALSE" "0" "1.5" "9.5"
#> a2 b2
#> "0.5" "10.5"
WARNING: Note that confidence intervals for ratio measures should be equivariant, in the sense that:
- interchanging the groups produces reciprocal intervals for RR or OR, i.e. the lower and upper confidence limits for vs are the reciprocal of the upper and lower limits for vs
- in addition for OR, interchanging events and non-events produces reciprocal intervals, i.e. the lower and upper confidence limits for the odds ratio of vs are the reciprocal of the upper and lower limits for vs .
Unfortunately, the MOVER method for contrast = “OR”, as described in (Morten W. Fagerland and Newcombe 2012), does not satisfy the second of these requirements.
Technical details
(Miettinen and Nurminen 1985) took the concept of the Wilson score interval for a single proportion and applied it to all contrasts of two proportions (not just RD) and Poisson rates. The procedure involves finding, at any possible value of the contrast parameter , the maximum likelihood estimates of the two proportions, restricted to the given value of . The resulting estimates are then included in the score statistic, and the confidence interval found by an iterative root-finding algorithm. Miettinen and Nurminen’s statistic was defined to follow a distribution, but the SCAS formula uses an equivalent statistic with a N(0, 1) distribution, to facilitate one-sided tests.
Gart and Nam, in a series of papers between 1985 and 1990, proposed similar confidence interval methods that included a skewness correction in the score statistic (and also a bias correction for OR). Their formulae omitted the ‘N-1’ bias correction used by Miettinen and Nurminen.
Farrington and Manning published an article on non-inferiority tests for RD and RR, using the same restricted maximum likelihood methodology as Miettinen and Nurminen (but omitting the ‘N-1’ correction).
The SCAS method combines the features of all the above methods, to give a comprehensive family of asymptotic score methods with optimised performance for universal use for confidence intervals for a wide range of sample sizes, with coherent tests for association or for non-inferiority. For further details, see (Laud 2017).