Compares multiple classification models pairwise using various statistical tests to assess differences in performance metrics. It supports both paired and unpaired comparisons.
Arguments
- dx_list
A list of dx objects representing the models to be compared. Each dx object should be the result of a call to dx().
- paired
Logical, indicating whether the comparisons should be treated as paired. Paired comparisons are appropriate when models are evaluated on the same set of instances (e.g., cross-validation or repeated measures).
Value
A dx_compare object containing a list of dx objects and a data frame of
pairwise comparison results for each test conducted.
Details
This function is a utility to perform a comprehensive comparison between
multiple classification models. Based on the value of paired, it will
perform the tests appropriate to that design. The resulting object can be
passed to further functions such as dx_plot_rocs.
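To illustrate what a paired comparison means in practice, the following is a minimal sketch using only base R. It does not show how dx_compare is implemented. The vectors truth, model_a and model_b are hypothetical data invented for this illustration, and mcnemar.test() from the stats package stands in for the package's own test functions. Both models are scored on the same ten cases, so the test looks only at the cases where exactly one of the two models is correct.

```r
# Hypothetical data: two models evaluated on the same 10 cases (paired design).
truth   <- c(1, 1, 1, 0, 0, 1, 0, 0, 1, 0)
model_a <- c(1, 1, 0, 0, 0, 1, 0, 1, 1, 0)
model_b <- c(1, 0, 0, 0, 1, 1, 0, 1, 0, 0)

# Per-case correctness of each model. Fixing the factor levels guarantees
# a 2 x 2 table even if one model were right (or wrong) on every case.
correct_a <- factor(model_a == truth, levels = c(FALSE, TRUE))
correct_b <- factor(model_b == truth, levels = c(FALSE, TRUE))

# Cross-tabulate agreement; the off-diagonal cells hold the discordant pairs.
tab <- table(model_a_correct = correct_a, model_b_correct = correct_b)

# McNemar's test (with continuity correction by default) on the paired table.
result <- mcnemar.test(tab)
print(tab)
print(result)
```

With unpaired data there is no per-case pairing to tabulate, which is why the choice of paired changes which tests are applicable.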
See also
dx_delong(), dx_z_test(), dx_mcnemars()
for more details on the tests used for comparisons.
Examples
dx_glm <- dx(data = dx_heart_failure, true_varname = "truth", pred_varname = "predicted")
dx_rf <- dx(data = dx_heart_failure, true_varname = "truth", pred_varname = "predicted_rf")
dx_list <- list(dx_glm, dx_rf)
dx_comp <- dx_compare(dx_list, paired = TRUE)
print(dx_comp$tests)
#> # A tibble: 2 × 9
#> models test summary p_value estimate conf_low conf_high statistic notes
#> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 Model 1 vs.… DeLo… 0.04 (… 5.89e-4 "0.0413… 0.0178 0.0649 3.44 ""
#> 2 Model 1 vs.… McNe… p=0.02 1.80e-2 "" NA NA 5.6 ""
