Generalized fiducial inference for logistic regression

The main function of the 'GFIlogisticRegression' package is fidSampleLR. It simulates the fiducial distribution of the parameters of a logistic regression model.

Example

To illustrate it, we consider a logistic dose-response model for inference on the median lethal dose. The median lethal dose (LD50) is the dose of a substance, such as a drug, that is expected to kill half of the subjects exposed to it.

The results of LD50 experiments can be modeled using the relation

\[\textrm{logit}(p_i) = \beta_1(x_i - \mu)\]

where $p_i$ is the probability of death at the administered dose $x_i$, and $\mu$ is the median lethal dose, i.e. the dose at which the probability of death is $0.5$. The $x_i$ are known, while $\beta_1$ and $\mu$ are unknown fixed effects.

This relation can be written in the form

\[\textrm{logit}(p_i) = \beta_0 + \beta_1 x_i\]

with $\beta_0 = -\beta_1 \mu$, that is, $\mu = -\beta_0 / \beta_1$.
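
As a quick numerical check of this reparameterization, the following minimal sketch in base Julia evaluates the dose-response curve at $\mu = -\beta_0 / \beta_1$. The helper names `logistic` and `death_probability` and the parameter values are hypothetical, chosen only for illustration; they are not part of the package.

```julia
# Logistic dose-response curve in the (β0, β1) parameterization.
logistic(t) = 1 / (1 + exp(-t))
death_probability(x, β0, β1) = logistic(β0 + β1 * x)

# Hypothetical parameter values, chosen only for illustration.
β0, β1 = 0.5, 1.0
μ = -β0 / β1                      # median lethal dose

# At x = μ the linear predictor is zero, so the probability is one half.
p_at_μ = death_probability(μ, β0, β1)
println("μ = ", μ, ", p(μ) = ", p_at_μ)
```

Whatever the values of $\beta_0$ and $\beta_1 \neq 0$, the linear predictor vanishes at $x = \mu$, which is why the probability of death there is $0.5$.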

We will perform the fiducial inference in this model with the following data:

using DataFrames
data = DataFrame(
  x = [
    -2, -2, -2, -2, -2,
    -1, -1, -1, -1, -1,
     0,  0,  0,  0,  0,
     1,  1,  1,  1,  1,
     2,  2,  2,  2,  2
  ],
  y = [
    1, 0, 0, 0, 0,
    1, 1, 1, 0, 0,
    1, 1, 0, 0, 0,
    1, 1, 1, 1, 0,
    1, 1, 1, 1, 1
  ]
)

25 rows × 2 columns

	x	y
	Int64	Int64
1	-2	1
2	-2	0
3	-2	0
4	-2	0
5	-2	0
6	-1	1
7	-1	1
8	-1	1
9	-1	0
10	-1	0
11	0	1
12	0	1
13	0	0
14	0	0
15	0	0
16	1	1
17	1	1
18	1	1
19	1	1
20	1	0
21	2	1
22	2	1
23	2	1
24	2	1
25	2	1
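
Before running the sampler, it can help to look at the observed proportion of deaths at each dose. The sketch below uses base Julia only and re-enters the same data as plain vectors instead of a data frame; the variable names `doses` and `proportions` are introduced here for illustration.

```julia
# Same data as above, written as plain vectors (5 subjects per dose).
x = repeat(-2:2, inner = 5)
y = [1, 0, 0, 0, 0,
     1, 1, 1, 0, 0,
     1, 1, 0, 0, 0,
     1, 1, 1, 1, 0,
     1, 1, 1, 1, 1]

# Observed proportion of deaths at each distinct dose.
doses = sort(unique(x))
proportions = [sum(y[x .== d]) / count(x .== d) for d in doses]

for (d, p) in zip(doses, proportions)
    println("dose ", d, ": ", p)
end
```

The proportions rise with the dose overall, though not monotonically, which is consistent with a positive slope $\beta_1$.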

Let's go with $20000$ fiducial simulations:

using StatsModels, GFIlogisticRegression
fidsamples = fidSampleLR(@formula(y ~ x), data, 20000)

(Beta = 20000×2 DataFrame
   Row │ (Intercept)  x
       │ Float64      Float64
───────┼───────────────────────
     1 │    1.03364   3.04606
     2 │    1.25911   1.87752
     3 │    1.77293   1.56713
     4 │    0.841713  1.70577
     5 │    0.799326  0.794347
     6 │    0.380466  1.75835
     7 │    0.180954  1.27897
     8 │    1.66278   0.764177
   ⋮   │      ⋮          ⋮
 19994 │   -0.095358  0.647854
 19995 │   -0.599387  0.274277
 19996 │    1.06014   0.666367
 19997 │    0.267742  0.657362
 19998 │    0.603148  0.650022
 19999 │    1.88123   0.966845
 20000 │    1.31249   1.34249
             19985 rows omitted, Weights = [3.62769350871564e-6, 2.9731452355707228e-5, 3.2719150286775974e-5, 3.4544450402466984e-5, 3.948621495561722e-5, 8.111139070357383e-5, 5.022869913859303e-5, 4.6475037536676876e-5, 8.413937445591667e-5, 2.171685957238218e-5  …  6.839878587355508e-5, 2.8355736963279443e-5, 5.70496814853827e-5, 7.342000115900292e-5, 1.628168889657282e-5, 1.7314750095805775e-5, 1.833238871160176e-5, 2.249293442223631e-5, 2.2035005337609222e-5, 2.2459093573934344e-5])

Here are the fiducial estimates and $95\%$-confidence intervals of the parameters $\beta_0$ and $\beta_1$:

fidSummary(fidsamples)

2 rows × 5 columns

	variable	mean	median	lwr	upr
	String	Float64	Float64	Float64	Float64
1	(Intercept)	0.566678	0.542482	-0.454585	1.7196
2	x	0.946116	0.907732	0.163599	1.95445
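
The output of fidSampleLR shown earlier contains both the draws (`Beta`) and a vector of `Weights`, so summaries of the fiducial distribution have to take the weights into account. The following sketch shows a weighted mean on made-up numbers. It assumes the weights are normalized to sum to one and is only an illustration of the idea, not the code used by fidSummary.

```julia
# Toy weighted draws standing in for one column of `Beta` and for `Weights`.
# These numbers are made up for illustration.
draws   = [0.2, 0.9, 1.4, 0.6]
weights = [0.1, 0.4, 0.2, 0.3]    # assumed to be normalized (sum to 1)

# Weighted mean: each draw contributes in proportion to its weight.
weighted_mean = sum(weights .* draws)

println(weighted_mean)
```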

Now let us compute the fiducial $95\%$-confidence interval for our parameter of interest $\mu = -\beta_0 / \beta_1$:

fidConfInt("-:\"(Intercept)\" ./ :x", fidsamples, 0.95)

(lower = -3.230485923227428, upper = 0.73469846184404)
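
To make the computation behind such an interval concrete, here is a minimal sketch in base Julia. It evaluates the derived parameter on each draw and takes weighted quantiles at the two tail levels. The helper `weighted_quantile`, the toy draws, and the choice of an equal-tailed interval are all assumptions made for illustration; the package may compute the interval differently.

```julia
# Weighted quantile: smallest value whose cumulative weight reaches `p`.
function weighted_quantile(vals, weights, p)
    order = sortperm(vals)
    cumulative = cumsum(weights[order]) ./ sum(weights)
    idx = findfirst(>=(p), cumulative)
    idx === nothing && (idx = length(vals))   # guard against rounding in the last sum
    return vals[order][idx]
end

# Toy draws of (β0, β1) with weights; made-up numbers for illustration.
β0 = [0.5, 1.0, -0.2, 0.8, 0.3]
β1 = [1.0, 0.5,  0.5, 2.0, 1.0]
w  = [0.2, 0.2,  0.2, 0.2, 0.2]

μ = -β0 ./ β1                     # derived parameter, one value per draw

# Equal-tailed interval at confidence level `conf`.
conf = 0.95
α = 1 - conf
lower = weighted_quantile(μ, w, α / 2)
upper = weighted_quantile(μ, w, 1 - α / 2)
println((lower = lower, upper = upper))
```

With only five toy draws the interval simply spans the smallest and largest values of $\mu$; with thousands of draws the tails are trimmed as expected.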

Member functions

GFIlogisticRegression.fidConfInt — Function

fidConfInt(parameter, fidsamples, conf)

Fiducial confidence interval of a parameter of interest.

Arguments

parameter: an expression of the parameter of interest given as a string; see the example
fidsamples: an output of fidSampleLR
conf: confidence level

Example

using GFIlogisticRegression, DataFrames, StatsModels
data = DataFrame(
  y = [0, 0, 1, 1, 1, 1],
  group = ["A", "A", "A", "B", "B", "B"]
)
fidsamples = fidSampleLR(@formula(y ~ 0 + group), data, 3000)
fidConfInt(":\"group: A\" - :\"group: B\"", fidsamples, 0.95)

GFIlogisticRegression.fidProb — Method

fidProb(parameter, fidsamples, q)

Fiducial non-exceedance probability of a parameter of interest.

Arguments

parameter: an expression of the parameter of interest given as a string; see the example
fidsamples: fiducial simulations, an output of fidSampleLR
q: the non-exceedance threshold
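
As an illustration of what a non-exceedance probability means for weighted draws, the sketch below sums the weights of the draws whose transformed value does not exceed `q`. The numbers are made up and the weights are assumed to be normalized; this is a sketch of the concept, not the package's implementation.

```julia
# Toy weighted draws of a parameter; made-up numbers for illustration.
draws   = [-0.3, 0.4, 1.2, 0.1]
weights = [0.1, 0.4, 0.2, 0.3]    # assumed to be normalized (sum to 1)

q = 1.0
transformed = exp.(draws)         # parameter of interest: exp of the draws

# Non-exceedance probability: total weight of the draws with value <= q.
prob = sum(weights[transformed .<= q])
println(prob)
```

Only the first draw is negative, so only its exponential is below 1 and the probability equals that draw's weight.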

Example

using GFIlogisticRegression, DataFrames, StatsModels
data = DataFrame(
  y = [0, 0, 1, 1, 1],
  x = [-2, -1, 0, 1, 2]
)
fidsamples = fidSampleLR(@formula(y ~ x), data, 3000)
fidProb("map(exp, :x)", fidsamples, 1) # this is Pr(exp(x) <= 1)

GFIlogisticRegression.fidQuantile — Method

fidQuantile(parameter, fidsamples, p)

Fiducial quantile of a parameter of interest.

Arguments

parameter: an expression of the parameter of interest given as a string; see the example
fidsamples: an output of fidSampleLR
p: quantile level, between 0 and 1

Example

using GFIlogisticRegression, DataFrames, StatsModels
data = DataFrame(
  y = [0, 0, 1, 1, 1, 1],
  group = ["A", "A", "A", "B", "B", "B"]
)
fidsamples = fidSampleLR(@formula(y ~ 0 + group), data, 3000)
fidQuantile(":\"group: A\" ./ :\"group: B\"", fidsamples, 0.5)

GFIlogisticRegression.fidSampleLR — Function

fidSampleLR(formula, data, N[, gmp][, thresh])

Fiducial sampling of the parameters of the logistic regression model.

Arguments

formula: a formula describing the model
data: data frame in which the variables of the model can be found
N: number of simulations
gmp: whether to use exact arithmetic in the algorithm
thresh: the threshold used in the sequential sampler; the default N/2 should not be changed
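
Sequential importance samplers commonly monitor the effective sample size (ESS) of the weights and resample when it drops below a threshold such as N/2. Whether thresh plays exactly this role in fidSampleLR is an assumption here; the sketch below only illustrates the general technique on made-up weights, and `effective_sample_size` is a hypothetical helper.

```julia
# Effective sample size of normalized importance weights.
effective_sample_size(w) = 1 / sum(abs2, w)

N = 4
uniform_weights    = fill(1 / N, N)            # all draws equally weighted
degenerate_weights = [0.97, 0.01, 0.01, 0.01]  # one draw dominates

thresh = N / 2   # mirrors the documented default N/2

for w in (uniform_weights, degenerate_weights)
    ess = effective_sample_size(w)
    println("ESS = ", ess, ", resample: ", ess < thresh)
end
```

Equal weights give an ESS equal to N, while weights concentrated on one draw push the ESS toward 1 and would trigger resampling under this rule.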

Example

using GFIlogisticRegression, DataFrames, StatsModels
data = DataFrame(
  y = [0, 0, 1, 1, 1],
  x = [-2, -1, 0, 1, 2]
)
fidsamples = fidSampleLR(@formula(y ~ x), data, 3000)

GFIlogisticRegression.fidSummary — Method

fidSummary(fidsamples)

Summary of the fiducial simulations.

Argument

fidsamples: an output of fidSampleLR

Example

using GFIlogisticRegression, DataFrames, StatsModels
data = DataFrame(
  y = [0, 0, 1, 1, 1],
  x = [-2, -1, 0, 1, 2]
)
fidsamples = fidSampleLR(@formula(y ~ x), data, 3000)
fidSummary(fidsamples)
