Or: How to tell if a test is helpful or not.
TL;DR: A really good, accurate test has a ROC line that hugs the upper left corner of the graph and has an AUC very close to 1.0, and a worthless one has an AUC of 0.5.
I want to give you a simple way to tell if the scores and tests that you rely on (and many of which we publish on MDCalc) are good — and how good they are at separating patients with the disease you’re worried about from those without it.
That simple way is called the Area Under the Curve (AUC), or the c-statistic, and you get it from the Receiver Operating Characteristic (ROC) curve. We’ll talk about the ROC curves you might see in papers, but first we have to go back to diseases, testing, sensitivity, and specificity.
We all know that sensitivity and specificity are almost always at odds. For almost any disease, the test results of healthy and sick patients overlap to some degree. If we tried to make a rule for myocardial infarction based only on “Does the patient have chest pain?” we know that many patients with myocardial infarction — but not all — have chest pain. So we’re going to miss some patients with MI if they don’t have chest pain, using that simple rule.
This graph summarizes this well:
So what we really want to know is: If I’m going to use a test to determine if someone has a disease I’m worried about, is that a good test? And that’s called accuracy. Accuracy says how well a test separates people into a group with the disease and a group without the disease.
Would “Does the patient have chest pain?” be a good test for myocardial infarction? No, of course not. Because it doesn’t separate people into “Having MI” and “Not having MI” very well.
But there’s lots of other tests for myocardial infarction. How bad is “Does the patient have chest pain?” compared to other tests? And that’s where the ROC and the AUC come in. They let you compare and objectify how good or how bad two diagnostic tests are (how accurate they are).
One final issue: to use these tools, the test has to produce a range of values, not just a binary result. So “Does the patient have chest pain?” (Yes/No) actually wouldn’t work, but “How bad is your chest pain, on a scale of 0-10?” would work just fine. (One way people get around this with labs is to run the numbers with multiple cut-offs: lactate <2, lactate 2-4, or lactate >4, for example.)
The ROC curve plots the true positive rate against the false positive rate. Y axis: true positive rate (sensitivity). X axis: false positive rate (1 minus specificity). You want lots of the former and none of the latter, so if you just plot these rates at different cutoffs or levels, you get points on the graph. Connect those points, and that makes the curve. That’s it.
Let’s say you’re looking at troponin for diagnosing myocardial infarction. A cutoff of 0.01 catches nearly every true positive but also flags lots of false positives: it’s really sensitive but not very specific at all.
A cutoff of 0.5 is going to be less sensitive but more specific:
And a troponin cutoff of 25 is very specific but not very sensitive. Or: it’s really rare to have a false positive with a cutoff of 25, but it’s going to miss a lot of the true positives, too.
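The cutoff walkthrough above can be sketched in code. This is a minimal, hypothetical example (the patients, troponin values, and disease labels are invented for illustration, not real data): each cutoff produces one point on the ROC graph, and sweeping the cutoff from low to high traces out the curve.

```python
# Hypothetical troponin-style values and true disease status (1 = MI).
# These numbers are invented purely to illustrate the mechanics.
scores  = [0.01, 0.02, 0.3, 0.6, 1.2, 5.0, 12.0, 30.0]
disease = [0,    0,    0,   1,   0,   1,   1,    1]

def roc_point(cutoff):
    """Return (false positive rate, true positive rate) at a given cutoff."""
    tp = sum(1 for s, d in zip(scores, disease) if s >= cutoff and d == 1)
    fp = sum(1 for s, d in zip(scores, disease) if s >= cutoff and d == 0)
    fn = sum(1 for s, d in zip(scores, disease) if s < cutoff and d == 1)
    tn = sum(1 for s, d in zip(scores, disease) if s < cutoff and d == 0)
    return fp / (fp + tn), tp / (tp + fn)

# Sweep a few cutoffs; connecting these points draws the ROC curve.
# The low cutoff is sensitive but not specific; the high one is the reverse.
for cutoff in [0.01, 0.5, 25]:
    fpr, tpr = roc_point(cutoff)
    print(f"cutoff {cutoff}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

With this toy data, the cutoff of 0.01 lands at the top-right of the graph (catches everyone, sick or not), and the cutoff of 25 lands near the bottom-left (misses most of the sick patients but almost never cries wolf).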
Now let’s take it one step further: if you calculate how much area on the graph is under the curve, that’s the AUC (area under the curve). And the AUC lets you compare tests easily by seeing how much area each test’s curve covers on that standard graph.
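Computing that area is simple once you have the ROC points: connect them with straight lines and sum up the trapezoids underneath. A minimal sketch, using made-up (FPR, TPR) points (every real ROC curve also includes the (0, 0) and (1, 1) endpoints):

```python
# Hypothetical ROC points (FPR, TPR), sorted by FPR, endpoints included.
points = [(0.0, 0.0), (0.1, 0.55), (0.25, 0.80), (0.5, 0.92), (1.0, 1.0)]

def auc(roc_points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(roc_points, roc_points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # width * average height
    return area

print(f"AUC = {auc(points):.3f}")
# A worthless test's curve is the 45-degree diagonal, which gives exactly 0.5.
print(auc([(0.0, 0.0), (1.0, 1.0)]))
```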
Here’s a rough way of categorizing AUCs, which range from 0.5 to 1.0:
- 0.90-1.0 = Excellent Test and Accuracy
- 0.80-0.90 = Good Test and Accuracy
- 0.70-0.80 = Fair Test and Accuracy
- 0.60-0.70 = Poor Test and Accuracy
- 0.50-0.60 = Failed Test and Accuracy
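The categories above are easy to express as a small helper function. This is just an illustration of the table, not MDCalc code:

```python
def categorize_auc(auc):
    """Map an AUC (0.5-1.0) to the rough accuracy categories above."""
    if not 0.5 <= auc <= 1.0:
        raise ValueError("AUC is expected to fall between 0.5 and 1.0")
    if auc >= 0.90:
        return "Excellent"
    if auc >= 0.80:
        return "Good"
    if auc >= 0.70:
        return "Fair"
    if auc >= 0.60:
        return "Poor"
    return "Failed"

print(categorize_auc(0.84))  # → Good
```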
For you visual learners, we’ve got a chart! Let’s look at a few tests for diagnosing myocardial infarction:
- Worthless Test: “How Bad Does Your Ankle Hurt?”
- Slightly Better Test: “How Bad Does Your Chest Hurt?”
- Better Test: “How Bad Does Your Chest Hurt And Is Your EKG Concerning for Heart Attack?”
- Good Test: “Is Your EKG Concerning and What is Your Troponin Level?”
- Very Good Test: “Is Your EKG Concerning and What is Your Troponin Level and Repeat Troponin Level at 6 Hours?”
And each curve for each test:
And now, each area for each test:
Hopefully we’ve shed some light on what can often be a pretty confusing topic. Our goal is to start documenting and categorizing AUCs for tests for calculators on MDCalc, so that we can compare apples to apples when users are trying to evaluate how accurate a test on the site is.
Next up: The Problems of the Gold Standard!
Looking for more? The University of Nebraska Medical Center has a great overview of ROCs and AUCs, and Rahul Patwari has an excellent YouTube video: