Skip to contents

Generates a Cumulative Accuracy Profile (CAP) Curve to evaluate and visualize the performance of a binary classification model by comparing its cumulative accuracy against that of perfect and random models.

Usage

dx_plot_cap(dx_obj)

Arguments

dx_obj

A dx object containing rank-based statistics, including cumulative true positives and percentiles. It's expected to have a 'rank' element with the necessary data.

Value

A ggplot object representing the CAP Curve with actual, perfect, and random model lines, allowing for further customization if desired.

Details

The CAP Curve provides a visual representation of the effectiveness of a binary classification model in ranking positive instances compared to two baseline strategies: a perfect model that captures all positives immediately and a random model that captures positives uniformly at random. The curve illustrates the cumulative percentage of captured positives (Y-axis) as a function of the cumulative percentage of instances (X-axis).

The function plots three lines:

  • Actual Model: Represents the cumulative accuracy of the given model.

  • Perfect Model: Represents the hypothetical scenario where all positive instances are ranked before any negative instances.

  • Random Model: Represents the baseline scenario of randomly guessing the class.

The CAP Curve is useful for understanding not just the accuracy but also the behavior of the model across the entire range of classifications. It's particularly insightful for imbalanced datasets where positive instances are rare or more critical.

Examples

dx_obj <- dx(
  data = dx_heart_failure,
  true_varname = "truth",
  pred_varname = "predicted",
  outcome_label = "Heart Attack",
  setthreshold = .3
)
dx_plot_cap(dx_obj)