| Title: | Ebrahim-Farrington Goodness-of-Fit Test for Logistic Regression |
|---|---|
| Description: | Implements the Ebrahim-Farrington goodness-of-fit test for logistic regression models, particularly effective for sparse data and binary outcomes. This test provides an improved alternative to the traditional Hosmer-Lemeshow test by using a modified Pearson chi-square statistic with data-dependent grouping. The test is based on Farrington (1996) theoretical framework but simplified for practical implementation with binary data. Includes functions for both the original Farrington test (for grouped data) and the new Ebrahim-Farrington test (for binary data with automatic grouping), the Directed Ebrahim-Farrington (DEF) test that targets calibration-shape departures, and an ensemble that combines the DEF bases via the Cauchy combination test. For more details see Hosmer (1980) <doi:10.1080/03610928008827941> and Farrington (1996) <doi:10.1111/j.2517-6161.1996.tb02086.x>. |
| Authors: | Ebrahim Khaled Ebrahim [aut, cre] (ORCID: <https://orcid.org/0009-0006-7839-8778>) |
| Maintainer: | Ebrahim Khaled Ebrahim <[email protected]> |
| License: | GPL-3 |
| Version: | 2.0.0 |
| Built: | 2026-06-09 21:53:14 UTC |
| Source: | https://github.com/ebrahimkhaled/ebrahim.gof |
Combines the three Directed Ebrahim-Farrington (DEF) basis tests
("poly2", "poly3", "stukel") into a single goodness-of-fit
decision, so the user does not have to choose a basis. By default the p-values
are combined with the Cauchy Combination Test (CCT), which controls the error
rate under the strong dependence between tests computed on the same fitted
model. The omnibus EF test can optionally be added to the vote.
    def.ensemble.gof(
      object,
      predicted_probs = NULL,
      X = NULL,
      components = c("poly2", "poly3", "stukel"),
      add_ef = FALSE,
      combine = c("cct", "minp", "fisher"),
      G = 10,
      extra_pvalues = NULL
    )
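| Argument | Description |
|---|---|
| object | A fitted binary logistic glm. |
| predicted_probs | Numeric predicted probabilities; required when object is not a fitted glm. |
| X | Optional design matrix, threaded to the component DEF tests. |
| components | Character vector, a subset of c("poly2", "poly3", "stukel"). |
| add_ef | Logical; if TRUE, the omnibus EF test is added to the vote. Default FALSE. |
| combine | One of "cct" (default), "minp", or "fisher". |
| G | Integer number of groups passed to the component tests (default 10). |
| extra_pvalues | Optional named numeric vector of additional p-values to include (e.g. a Tsiatis test computed elsewhere). Default NULL. |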
Because the component tests are computed on the same fit, their p-values are
strongly dependent. The CCT (combine = "cct") has an asymptotic
standard-Cauchy null whose tail is robust to this dependence, so it needs no
calibration. The "minp" (Sidak) and "fisher" rules assume
independence and are offered for comparison only; under positive dependence
"minp" is conservative and "fisher" is anti-conservative, so they
should be calibrated by simulation before use (not done here).
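The three combining rules can be sketched in base R. This is an illustrative sketch, not the package's internal code: the helper name `combine_pvalues` and the p-values in `p_demo` are made up, and the CCT is shown in its equal-weight form.

```r
# Illustrative combiners for k p-values (hypothetical helper, not package code).
combine_pvalues <- function(p, rule = c("cct", "minp", "fisher")) {
  rule <- match.arg(rule)
  stopifnot(is.numeric(p), length(p) >= 1, all(p > 0 & p < 1))
  k <- length(p)
  switch(rule,
    cct = {
      # Equal-weight Cauchy combination: average the Cauchy-transformed
      # p-values and refer the average to a standard Cauchy upper tail.
      t_cct <- mean(tan((0.5 - p) * pi))
      pcauchy(t_cct, lower.tail = FALSE)
    },
    # Sidak-adjusted minimum p-value (assumes independence).
    minp = 1 - (1 - min(p))^k,
    # Fisher's method: -2 * sum(log p) ~ chi-square with 2k df (assumes independence).
    fisher = pchisq(-2 * sum(log(p)), df = 2 * k, lower.tail = FALSE)
  )
}

# Made-up component p-values, for illustration only.
p_demo <- c(poly2 = 0.04, poly3 = 0.10, stukel = 0.30)
sapply(c("cct", "minp", "fisher"), function(r) combine_pvalues(p_demo, r))
```

With a single p-value all three rules return that p-value unchanged, which is a quick sanity check on the transformations.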
A one-row data.frame with columns Test, Combiner,
Components, k, and p_value.
Ebrahim Khaled Ebrahim [email protected]
Liu, Y. and Xie, J. (2020). Cauchy combination test. JASA, 115(529), 393-402.
    set.seed(1)
    n <- 500
    x <- runif(n, -3, 3)
    y <- rbinom(n, 1, 1 / (1 + exp(-(0.6 * x))))
    fit <- glm(y ~ x, family = binomial())
    def.ensemble.gof(fit)                 # CCT of the three DEF bases
    def.ensemble.gof(fit, add_ef = TRUE)  # add the omnibus EF
Performs the Directed Ebrahim-Farrington (DEF) goodness-of-fit test for a fitted binary logistic regression model. DEF concentrates its power on a small set of calibration-curve "shape" directions by projecting the grouped standardized residuals onto a low-dimensional basis and testing the squared length of that projection.
    def.gof(
      object,
      predicted_probs = NULL,
      X = NULL,
      G = 10,
      basis = c("poly3", "poly2", "stukel", "ensemble"),
      method = c("satterthwaite", "imhof")
    )
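| Argument | Description |
|---|---|
| object | A fitted binary logistic glm. |
| predicted_probs | Numeric predicted probabilities; required when object is not a fitted glm. |
| X | Optional design matrix, used only when object is not a fitted glm. |
| G | Integer number of equal-frequency groups (default 10; must be >= 3). |
| basis | One of "poly3" (default), "poly2", "stukel", or "ensemble". |
| method | One of "satterthwaite" (default) or "imhof". |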
The observations are sorted by predicted probability and split into G
equal-frequency groups. The vector of standardized grouped residuals is
projected onto a basis matrix of smooth shapes, and the test statistic is the
squared length of that projection. Its null distribution is a weighted sum of
chi-square variables, with weights equal to the eigenvalues of a matrix formed
from the projection and the estimation-adjusted covariance of the grouped
residuals. The p-value uses a Satterthwaite scaled chi-square approximation
(default) or Imhof's method (if the CompQuadForm package is installed).
Bases: "poly2", "poly3" (default), "stukel"; "ensemble" runs all three and
combines them via def.ensemble.gof.
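The Satterthwaite step can be sketched as a two-moment match: a weighted sum of chi-square(1) variables is approximated by a scaled chi-square with the same mean and variance. The helper name `satterthwaite_pvalue`, the weights in `lambda_demo`, and the statistic value 6.2 are hypothetical; in DEF the weights would be the eigenvalues described above.

```r
# Satterthwaite approximation for Q ~ sum_i lambda_i * chisq_1 (illustrative).
# Q has mean sum(lambda) and variance 2 * sum(lambda^2); a scaled chi-square
# scale * chisq_df has mean scale * df and variance 2 * scale^2 * df.
satterthwaite_pvalue <- function(q, lambda) {
  stopifnot(is.numeric(q), length(q) == 1, q >= 0, all(lambda > 0))
  scale <- sum(lambda^2) / sum(lambda)    # solves the two moment equations
  df    <- sum(lambda)^2 / sum(lambda^2)  # so that scale * df = sum(lambda)
  c(scale = scale, df = df,
    p_value = pchisq(q / scale, df = df, lower.tail = FALSE))
}

# Hypothetical eigenvalue weights and observed statistic.
lambda_demo <- c(1.4, 0.9, 0.3)
satterthwaite_pvalue(q = 6.2, lambda = lambda_demo)

# With equal unit weights the approximation is an exact chi-square with 3 df.
satterthwaite_pvalue(q = 6.2, lambda = c(1, 1, 1))
```

Imhof's method instead inverts the characteristic function numerically, which is why it is tied to the optional CompQuadForm dependency.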
A one-row data.frame with columns Test, Basis,
Test_Statistic (the DEF statistic), df, Method, and
p_value. When basis = "ensemble", the return is that of
def.ensemble.gof.
Ebrahim Khaled Ebrahim [email protected]
Ebrahim, K. E. and El-Kotory, A. Omnibus versus Directed Goodness-of-Fit Tests for Sparse Data in Binary Logistic Regression (companion paper).
    set.seed(1)
    n <- 500
    x <- runif(n, -3, 3)
    y <- rbinom(n, 1, 1 / (1 + exp(-(0.6 * x))))
    fit <- glm(y ~ x, family = binomial())
    def.gof(fit)                      # default poly3 basis
    def.gof(fit, basis = "stukel")    # tail-shape basis
    def.gof(fit, basis = "ensemble")  # combine all three (CCT)
Performs the Ebrahim-Farrington goodness-of-fit test for logistic regression models. This test is particularly effective for binary data and sparse datasets, providing an improved alternative to the traditional Hosmer-Lemeshow test.
    ef.gof(
      y,
      predicted_probs = NULL,
      model = NULL,
      m = NULL,
      G = 10,
      method = c("chisq", "normal")
    )
| Argument | Description |
|---|---|
| y | A fitted binary logistic glm, or the response vector (as in the examples). |
| predicted_probs | Numeric vector of predicted probabilities from the logistic regression model. Must be the same length as y. |
| model | Optional fitted glm object; required for the original Farrington test on grouped data. |
| m | Optional numeric vector of trial counts for each observation (for grouped data). If NULL, data is assumed to be binary. |
| G | Optional integer specifying the number of groups for binary data grouping. Default is 10. If NULL, no grouping is performed. |
| method | Reference distribution for the grouped EF statistic: "chisq" (default) or "normal". |
The Ebrahim-Farrington test is based on Farrington's (1996) theoretical framework but simplified for practical implementation with binary data. The test uses a modified Pearson chi-square statistic with data-dependent grouping, where observations are grouped by their predicted probabilities.
For binary data (when G is specified), the test automatically groups
observations into G groups based on predicted probabilities and applies
the simplified Ebrahim-Farrington statistic

$$Z_{EF} = \frac{T_{EF} - (G - 2)}{\sqrt{2(G - 2)}},$$

where $T_{EF}$ is the modified Pearson chi-square statistic and $G$
is the number of groups.
For grouped data (when m is provided), the test applies the original
Farrington test with full variance calculations.
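The grouping step for binary data can be sketched in base R. This is a minimal illustration under stated assumptions: it forms rank-based equal-frequency groups and computes a plain grouped Pearson statistic, not the modified statistic or the variance corrections that ef.gof applies. All object names are hypothetical.

```r
set.seed(123)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1.2 * x))
p_hat <- fitted(glm(y ~ x, family = binomial()))

G <- 10
# Rank the predicted probabilities and cut the ranks into G equal blocks.
grp <- ceiling(rank(p_hat, ties.method = "first") * G / n)

obs  <- tapply(y, grp, sum)                    # observed successes per group
expd <- tapply(p_hat, grp, sum)                # expected successes per group
v    <- tapply(p_hat * (1 - p_hat), grp, sum)  # binomial variance per group

# Plain grouped Pearson statistic (illustration only, not the EF statistic).
pearson_grouped <- sum((obs - expd)^2 / v)

table(grp)        # group sizes
pearson_grouped
```

Because n is a multiple of G here, every group holds the same number of observations; with other sample sizes the rank-based cut gives group sizes that differ by at most one.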
A data frame with the following columns:

| Column | Description |
|---|---|
| Test | Character string identifying the test performed |
| Test_Statistic | Numeric value of the standardized test statistic |
| p_value | Numeric p-value for the test |
For binary data with automatic grouping (G specified): Use the
Ebrahim-Farrington test which is computationally efficient and doesn't
require the model specification.
For grouped data (m provided): Use the original Farrington test
which requires the fitted model object.
The test statistic follows a standard normal distribution under the null hypothesis of adequate model fit.
For binary data with m=1 for all observations and no grouping,
the test is not applicable and will return a p-value of 1.
Ebrahim Khaled Ebrahim [email protected]
Farrington, C. P. (1996). On Assessing Goodness of Fit of Generalized Linear Models to Sparse Data. *Journal of the Royal Statistical Society. Series B (Methodological)*, 58(2), 349-360.

Ebrahim, K. E. (2025). Goodness-of-Fits Tests and Calibration Machine Learning Algorithms for Logistic Regression Model with Sparse Data. *Master's Thesis*, Alexandria University.

Hosmer, D. W., & Lemeshow, S. (1980). A goodness-of-fit test for the multiple logistic regression model. *Communications in Statistics - Theory and Methods*, 9(10), 1043–1069. https://doi.org/10.1080/03610928008827941
hoslem.test for the Hosmer-Lemeshow test
    # Example 1: Binary data with automatic grouping (Ebrahim-Farrington test)
    set.seed(123)
    n <- 500
    x <- rnorm(n)
    linpred <- 0.5 + 1.2 * x
    prob <- 1 / (1 + exp(-linpred))
    y <- rbinom(n, 1, prob)

    # Fit logistic regression
    model <- glm(y ~ x, family = binomial())
    predicted_probs <- fitted(model)

    # Perform Ebrahim-Farrington test with 10 groups
    result <- ef.gof(y, predicted_probs, G = 10)
    print(result)

    # Example 2: Compare with different number of groups
    result_4 <- ef.gof(y, predicted_probs, G = 4)
    result_20 <- ef.gof(y, predicted_probs, G = 20)

    # Example 3: Grouped data (original Farrington test)
    # Note: This requires actual grouped data with trials > 1
    ## Not run:
    # Simulated grouped data
    n_groups <- 50
    m_trials <- sample(5:20, n_groups, replace = TRUE)
    x_grouped <- rnorm(n_groups)
    linpred_grouped <- -0.5 + 1.0 * x_grouped
    prob_grouped <- 1 / (1 + exp(-linpred_grouped))
    y_grouped <- rbinom(n_groups, m_trials, prob_grouped)

    # Fit model for grouped data
    data_grouped <- data.frame(successes = y_grouped, trials = m_trials, x = x_grouped)
    model_grouped <- glm(cbind(successes, trials - successes) ~ x,
                         data = data_grouped, family = binomial())
    predicted_probs_grouped <- fitted(model_grouped)

    # Original Farrington test
    result_grouped <- ef.gof(y_grouped, predicted_probs_grouped,
                             model = model_grouped, m = m_trials)
    print(result_grouped)
    ## End(Not run)
Runs several goodness-of-fit tests for a binary logistic regression in one
call and returns one tidy data.frame, one row per test. Pass a fitted
glm to run the whole battery; pass (y, predicted_probs) to run
the tests that need only predictions. Each test is wrapped so that a failure of
one test never aborts the whole run.
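The failure isolation described above can be sketched with tryCatch. This is a hypothetical wrapper, not the package's code: the helper `safe_test`, the toy test functions, and the reduced column set are assumptions made for illustration.

```r
# Hypothetical helper: run one test function and always return a one-row
# data.frame, turning an error into NA values plus a note.
safe_test <- function(name, fun, ...) {
  tryCatch(
    {
      out <- fun(...)
      data.frame(Test = name, Statistic = out$statistic,
                 p_value = out$p_value, Note = "", stringsAsFactors = FALSE)
    },
    error = function(e) {
      data.frame(Test = name, Statistic = NA_real_, p_value = NA_real_,
                 Note = conditionMessage(e), stringsAsFactors = FALSE)
    }
  )
}

# Two toy "tests": one succeeds, one fails.
ok_test  <- function() list(statistic = 1.23, p_value = 0.27)
bad_test <- function() stop("needs a fitted model")

battery <- rbind(safe_test("ok", ok_test), safe_test("bad", bad_test))
battery
```

The failing test still contributes a row, so the result keeps one row per requested test and the error message stays visible in the Note column.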
    run.all.gof(
      object,
      predicted_probs = NULL,
      X = NULL,
      tests = "all",
      G = 10,
      include_slow = FALSE,
      control = list()
    )
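| Argument | Description |
|---|---|
| object | A fitted binary logistic glm, or a response vector y when predicted_probs is supplied. |
| predicted_probs | Numeric predicted probabilities; required when object is a response vector. |
| X | Optional design matrix; lets the directed (DEF) tests run from the (y, predicted_probs) interface. |
| tests | Either "all" (default) or a character vector of test names, e.g. c("EF", "DEF.poly3", "HL"). |
| G | Integer number of groups passed to the grouping tests (default 10). |
| include_slow | Logical; when TRUE, the opt-in slow tests are also run. Default FALSE. |
| control | Optional named list of per-test options (reserved). |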
The currently bundled tests are:

- Global / standardized: Pearson, Deviance, Osius-Rojek, Copas-RSS, and Information-Matrix (the White/Orme test).
- Partition: HL (Hosmer-Lemeshow deciles), HL-equalwidth, and Pigeon-Heyse.
- Omnibus: EF and EF-normal (the omnibus Ebrahim-Farrington test with the chi-square and normal references; the normal form reproduces the thesis simulation).
- Directed: DEF.poly2/poly3/stukel and Stukel.
- Covariate-space: Tsiatis, Xie, and Pulkstenis-Robinson.
- Ensemble: the two ensemble rows (Ensemble.Vote(3DEF) and Ensemble.Univ(3DEF+EF)) from the Cauchy combination test.
- Opt-in slow tests, run only when include_slow = TRUE:
  - le-Cessie-van Houwelingen smoothing.
  - The GAM-based tests HL-GAM, PR-GAM, and Xie-GAM (need mgcv; fit an overfit GAM for grouping).
  - Stute-Zhu (a cumulative-residual parametric-bootstrap test; set the number of reps with control = list("Stute-Zhu" = list(B = ...))).
  - eHL (the e-value Hosmer-Lemeshow test, reported as p = min(1, 1/e)).
  - BAGofT (the binary-adaptive GOF test; needs the BAGofT package, control = list(BAGofT = list(nsim = ...))).
  - Lai-Liu-HL (Lai & Liu's standardized-power procedure for the Hosmer-Lemeshow test, which has no p-value: it reports the standardized power as the statistic and a randomized accept/reject decision in the Note; target size via control = list("Lai-Liu-HL" = list(n0 = ..., k = ...))).
Notes: Tsiatis and Xie cluster the covariate space with k-means
(a fixed internal seed, so results are reproducible and the caller's RNG is
left untouched). Xie uses the corrected degrees of freedom, which depend
on the number of predictors. Pulkstenis-Robinson
auto-detects the categorical covariate (any factor/character/logical, or a
numeric with at most getOption("ebrahim.gof.pr.maxlev", 6) distinct
values); it returns NA with a note when none is present.
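The detection rule for the categorical covariate can be sketched as follows. This is an illustration of the rule as described, not the package's implementation: the helper name `is_categorical`, the handling of missing values, and the toy data frame are assumptions.

```r
# Illustrative rule: factor/character/logical columns, or numeric columns
# with at most `maxlev` distinct non-missing values, count as categorical.
is_categorical <- function(v, maxlev = getOption("ebrahim.gof.pr.maxlev", 6)) {
  is.factor(v) || is.character(v) || is.logical(v) ||
    (is.numeric(v) && length(unique(v[!is.na(v)])) <= maxlev)
}

set.seed(1)
d <- data.frame(
  age   = rnorm(100, 50, 10),                            # continuous
  dose  = sample(c(1, 2, 3), 100, replace = TRUE),       # numeric, 3 levels
  sex   = factor(sample(c("F", "M"), 100, replace = TRUE)),
  smoke = sample(c(TRUE, FALSE), 100, replace = TRUE)
)
vapply(d, is_categorical, logical(1))
```

Here only `age` is treated as continuous. Raising or lowering the ebrahim.gof.pr.maxlev option shifts the cutoff for numeric columns such as `dose`.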
Every bundled test reproduces the implementation used in the original thesis
simulation: Osius-Rojek and Stukel follow LogisticDx's
gof.glm (Stukel via statmod::glm.scoretest when statmod is
installed), Copas-RSS follows rms's gof residual, HL follows
ResourceSelection::hoslem.test, and the others match their standalone
reference functions; all were checked to agree numerically.
A data.frame with columns Test, Family,
Statistic, df, p_value, and Note.
Ebrahim Khaled Ebrahim [email protected]
ef.gof, def.gof, def.ensemble.gof.
    set.seed(1)
    n <- 500
    x <- runif(n, -3, 3)
    y <- rbinom(n, 1, 1 / (1 + exp(-(0.6 * x))))
    fit <- glm(y ~ x, family = binomial())
    run.all.gof(fit)                                      # the whole battery + ensemble rows
    run.all.gof(fit, tests = c("EF", "DEF.poly3", "HL"))
    run.all.gof(y, fitted(fit))                           # prediction-only tests