Generalized fiducial inference for logistic regression
The main function of the 'GFIlogisticRegression' package is `fidSampleLR`. It simulates the fiducial distribution of the parameters of a logistic regression model.
Example
As an illustration, we will consider a logistic dose-response model for inference on the median lethal dose. The median lethal dose (LD50) is the amount of a substance, such as a drug, that is expected to kill half of the subjects exposed to it.
The results of LD50 experiments can be modeled using the relation
\[\textrm{logit}(p_i) = \beta_1(x_i - \mu)\]
where $p_i$ is the probability of death at the administered dose $x_i$, and $\mu$ is the median lethal dose, i.e. the dose at which the probability of death is $0.5$. The $x_i$ are known, while $\beta_1$ and $\mu$ are unknown fixed effects.
This relation can be written in the form
\[\textrm{logit}(p_i) = \beta_0 + \beta_1 x_i\]
with $\mu = -\beta_0 / \beta_1$.
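For readers who want the intermediate step, the reparametrization above can be checked by expanding the first relation and matching coefficients. This is only a worked restatement of the two formulas already given, under the assumption that $\beta_1 \neq 0$:
\[\textrm{logit}(p_i) = \beta_1(x_i - \mu) = -\beta_1\mu + \beta_1 x_i \quad\Longrightarrow\quad \beta_0 = -\beta_1\mu \quad\Longrightarrow\quad \mu = -\frac{\beta_0}{\beta_1}\]
Setting $x_i = \mu$ gives $\textrm{logit}(p_i) = 0$, that is $p_i = 0.5$, which agrees with the definition of the median lethal dose.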
We will perform the fiducial inference in this model with the following data:
using DataFrames
data = DataFrame(
x = [
-2, -2, -2, -2, -2,
-1, -1, -1, -1, -1,
0, 0, 0, 0, 0,
1, 1, 1, 1, 1,
2, 2, 2, 2, 2
],
y = [
1, 0, 0, 0, 0,
1, 1, 1, 0, 0,
1, 1, 0, 0, 0,
1, 1, 1, 1, 0,
1, 1, 1, 1, 1
]
)
Row | x | y |
---|---|---|
 | Int64 | Int64 |
1 | -2 | 1 |
2 | -2 | 0 |
3 | -2 | 0 |
4 | -2 | 0 |
5 | -2 | 0 |
6 | -1 | 1 |
7 | -1 | 1 |
8 | -1 | 1 |
9 | -1 | 0 |
10 | -1 | 0 |
11 | 0 | 1 |
12 | 0 | 1 |
13 | 0 | 0 |
14 | 0 | 0 |
15 | 0 | 0 |
16 | 1 | 1 |
17 | 1 | 1 |
18 | 1 | 1 |
19 | 1 | 1 |
20 | 1 | 0 |
21 | 2 | 1 |
22 | 2 | 1 |
23 | 2 | 1 |
24 | 2 | 1 |
25 | 2 | 1 |
Let's go with $20000$ fiducial simulations:
using StatsModels, GFIlogisticRegression
fidsamples = fidSampleLR(@formula(y ~ x), data, 20000)
(Beta = 20000×2 DataFrame
   Row │ (Intercept)  x
       │ Float64      Float64
───────┼───────────────────────
     1 │  1.03364     3.04606
     2 │  1.25911     1.87752
     3 │  1.77293     1.56713
     4 │  0.841713    1.70577
     5 │  0.799326    0.794347
     6 │  0.380466    1.75835
     7 │  0.180954    1.27897
     8 │  1.66278     0.764177
   ⋮   │      ⋮          ⋮
 19994 │ -0.095358    0.647854
 19995 │ -0.599387    0.274277
 19996 │  1.06014     0.666367
 19997 │  0.267742    0.657362
 19998 │  0.603148    0.650022
 19999 │  1.88123     0.966845
 20000 │  1.31249     1.34249
             19985 rows omitted, Weights = [3.62769350871564e-6, 2.9731452355707228e-5, 3.2719150286775974e-5, 3.4544450402466984e-5, 3.948621495561722e-5, 8.111139070357383e-5, 5.022869913859303e-5, 4.6475037536676876e-5, 8.413937445591667e-5, 2.171685957238218e-5 … 6.839878587355508e-5, 2.8355736963279443e-5, 5.70496814853827e-5, 7.342000115900292e-5, 1.628168889657282e-5, 1.7314750095805775e-5, 1.833238871160176e-5, 2.249293442223631e-5, 2.2035005337609222e-5, 2.2459093573934344e-5])
Here are the fiducial estimates and $95\%$-confidence intervals of the parameters $\beta_0$ and $\beta_1$:
fidSummary(fidsamples)
Row | variable | mean | median | lwr | upr |
---|---|---|---|---|---|
 | String | Float64 | Float64 | Float64 | Float64 |
1 | (Intercept) | 0.566678 | 0.542482 | -0.454585 | 1.7196 |
2 | x | 0.946116 | 0.907732 | 0.163599 | 1.95445 |
Now let us compute the fiducial $95\%$-confidence interval for our parameter of interest $\mu$:
fidConfInt("-:\"(Intercept)\" ./ :x", fidsamples, 0.95)
(lower = -3.230485923227428, upper = 0.73469846184404)
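Since the output of `fidSampleLR` carries both simulated parameter values (`Beta`) and `Weights`, one plausible way to read such an interval is as a pair of weighted quantiles of the derived quantity $-\beta_0/\beta_1$. The sketch below illustrates that idea on made-up numbers. The helper `weighted_quantile` and the toy vectors are hypothetical, and this is not claimed to be how `fidConfInt` is implemented.

```julia
# Hypothetical helper, not part of GFIlogisticRegression:
# weighted empirical quantile, i.e. the smallest value whose
# normalized cumulative weight reaches the level p.
function weighted_quantile(values::AbstractVector{<:Real},
                           weights::AbstractVector{<:Real}, p::Real)
    length(values) == length(weights) || throw(ArgumentError("length mismatch"))
    0 <= p <= 1 || throw(ArgumentError("p must lie in [0, 1]"))
    order = sortperm(values)                        # indices that sort the values
    cumw = cumsum(weights[order]) ./ sum(weights)   # normalized cumulative weights
    idx = findfirst(>=(p), cumw)
    # Fall back to the largest value if round-off keeps the last cumulative weight below p.
    return values[order[something(idx, length(order))]]
end

# Toy weighted draws of (β0, β1); made-up numbers, not package output.
beta0 = [1.0, -0.5, 0.3, 2.0]
beta1 = [2.0, 1.0, 1.5, 1.0]
w     = [0.4, 0.1, 0.3, 0.2]

mu = -beta0 ./ beta1                      # one value of μ per draw
lower  = weighted_quantile(mu, w, 0.025)  # lower bound of a 95% interval
upper  = weighted_quantile(mu, w, 0.975)  # upper bound of a 95% interval
middle = weighted_quantile(mu, w, 0.5)    # weighted median
```

With only four draws the bounds simply land on the extreme values; the point is the mechanics (transform each draw, then take weighted quantiles), not the numbers.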
Member functions
GFIlogisticRegression.fidConfInt — Function
fidConfInt(parameter, fidsamples, conf)
Fiducial confidence interval of a parameter of interest.
Arguments
- `parameter`: an expression of the parameter of interest given as a string; see the example
- `fidsamples`: an output of `fidSampleLR`
- `conf`: confidence level
Example
using GFIlogisticRegression, DataFrames, StatsModels
data = DataFrame(
y = [0, 0, 1, 1, 1, 1],
group = ["A", "A", "A", "B", "B", "B"]
)
fidsamples = fidSampleLR(@formula(y ~ 0 + group), data, 3000)
fidConfInt(":\"group: A\" - :\"group: B\"", fidsamples, 0.95)
GFIlogisticRegression.fidProb — Method
fidProb(parameter, fidsamples, q)
Fiducial non-exceedance probability of a parameter of interest.
Arguments
- `parameter`: an expression of the parameter of interest given as a string; see the example
- `fidsamples`: fiducial simulations, an output of `fidSampleLR`
- `q`: the non-exceedance threshold
Example
using GFIlogisticRegression, DataFrames, StatsModels
data = DataFrame(
y = [0, 0, 1, 1, 1],
x = [-2, -1, 0, 1, 2]
)
fidsamples = fidSampleLR(@formula(y ~ x), data, 3000)
fidProb("map(exp, :x)", fidsamples, 1) # this is Pr(exp(x) <= 1)
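To make the notion of a non-exceedance probability concrete: with weighted draws, a natural estimate of $\Pr(\theta \leq q)$ is the total weight of the draws whose value does not exceed $q$. The sketch below shows this on made-up numbers. The helper `weighted_prob_leq` and the toy vectors are hypothetical, and the package's own `fidProb` may compute it differently.

```julia
# Hypothetical helper, not part of GFIlogisticRegression:
# weighted proportion of draws whose value does not exceed q.
function weighted_prob_leq(values::AbstractVector{<:Real},
                           weights::AbstractVector{<:Real}, q::Real)
    length(values) == length(weights) || throw(ArgumentError("length mismatch"))
    return sum(weights[values .<= q]) / sum(weights)
end

# Toy weighted draws of the slope coefficient; made-up numbers, not package output.
slope = [-0.4, 0.3, 1.2, 0.9]
w     = [0.1, 0.2, 0.3, 0.4]

# Estimate of Pr(exp(slope) <= 1): only the negative draw satisfies exp(slope) <= 1.
prob = weighted_prob_leq(map(exp, slope), w, 1)
```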
GFIlogisticRegression.fidQuantile — Method
fidQuantile(parameter, fidsamples, p)
Fiducial quantile of a parameter of interest.
Arguments
- `parameter`: an expression of the parameter of interest given as a string; see the example
- `fidsamples`: an output of `fidSampleLR`
- `p`: quantile level, between 0 and 1
Example
using GFIlogisticRegression, DataFrames, StatsModels
data = DataFrame(
y = [0, 0, 1, 1, 1, 1],
group = ["A", "A", "A", "B", "B", "B"]
)
fidsamples = fidSampleLR(@formula(y ~ 0 + group), data, 3000)
fidQuantile(":\"group: A\" ./ :\"group: B\"", fidsamples, 0.5)
GFIlogisticRegression.fidSampleLR — Function
fidSampleLR(formula, data, N[, gmp][, thresh])
Fiducial sampling of the parameters of the logistic regression model.
Arguments
- `formula`: a formula describing the model
- `data`: data frame in which the variables of the model can be found
- `N`: number of simulations
- `gmp`: whether to use exact arithmetic in the algorithm
- `thresh`: the threshold used in the sequential sampler; the default `N/2` should not be changed
Example
using GFIlogisticRegression, DataFrames, StatsModels
data = DataFrame(
y = [0, 0, 1, 1, 1],
x = [-2, -1, 0, 1, 2]
)
fidsamples = fidSampleLR(@formula(y ~ x), data, 3000)
GFIlogisticRegression.fidSummary — Method
fidSummary(fidsamples)
Summary of the fiducial simulations.
Argument
- `fidsamples`: an output of `fidSampleLR`
Example
using GFIlogisticRegression, DataFrames, StatsModels
data = DataFrame(
y = [0, 0, 1, 1, 1],
x = [-2, -1, 0, 1, 2]
)
fidsamples = fidSampleLR(@formula(y ~ x), data, 3000)
fidSummary(fidsamples)