Conducts a Chi-square test of independence on a 2x2 confusion matrix derived from binary classification results, assessing whether the observed cell frequencies differ from those expected if the predicted and actual classes were independent.

Usage

dx_chi_square(cm, detail = "full")

Arguments

cm

A dx_cm object created by dx_cm().

detail

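Character specifying the level of detail in the output: "simple" returns only the p-value; "full" returns a data frame with the p-value, a summary, and a method note. The confidence interval columns are present in the "full" output but are NA for this test.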

Value

Depending on the detail parameter:

- if "simple": a single numeric value representing the p-value of the Chi-square test.
- if "full": a data frame with the Chi-square test result, including the p-value and method note.

Details

The Chi-square test is used to determine whether there is a significant association between the predicted and actual binary classifications. It compares the observed frequency in each cell of the table with the frequency expected if the rows and columns were independent. A low p-value is evidence against independence, suggesting an association between the actual and predicted classifications; it does not by itself measure how strong that association is.

The function uses Pearson's Chi-squared test with Yates' continuity correction by default. The correction makes the test more conservative, which guards against overstating significance in small samples. The test is most appropriate when each cell in the 2x2 table has an expected frequency of 5 or more.
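
As a rough illustration of the test described above, the following base R sketch runs Pearson's Chi-squared test with Yates' continuity correction directly through stats::chisq.test(). The counts in `tab` are made up for illustration, and it is only an assumption that dx_chi_square() performs an equivalent call internally.

```r
# Hypothetical 2x2 confusion matrix (illustrative counts, not package data)
tab <- matrix(
  c(40, 10,
    15, 35),
  nrow = 2, byrow = TRUE,
  dimnames = list(predicted = c("pos", "neg"), truth = c("pos", "neg"))
)

# Pearson's Chi-squared test; correct = TRUE applies Yates' continuity
# correction, which chisq.test() only uses for 2x2 tables
res <- chisq.test(tab, correct = TRUE)

# Expected cell counts under independence: (row total * column total) / n
res$expected

# Check the rule of thumb that every expected count is at least 5
all(res$expected >= 5)

# Test statistic and p-value
res$statistic
res$p.value
```

If any expected count falls below 5, the Chi-square approximation becomes less reliable, and the p-value should be read with more caution.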

See also

dx_cm() for creating a 'dx_cm' object.

Examples

cm <- dx_cm(dx_heart_failure$predicted, dx_heart_failure$truth,
  threshold = 0.3, poslabel = 1
)
simple <- dx_chi_square(cm, detail = "simple")
detailed <- dx_chi_square(cm)
print(simple)
#> [1] 5.450633e-21
print(detailed)
#> # A tibble: 1 × 8
#>   measure           summary estimate conf_low conf_high fraction conf_type notes
#>   <chr>             <chr>      <dbl> <lgl>    <lgl>     <chr>    <chr>     <chr>
#> 1 Pearson's Chi-sq… p<0.01  5.45e-21 NA       NA        ""       ""        Pear…