DIAGNOSTIC PERFORMANCE ANALYSIS (PART 1)
The primary purpose of a screening test is to rule out a disease or clinical condition of interest. A good screening test places less demand on the healthcare system and causes less discomfort to patients, as it is usually less invasive, less dangerous, easier to use and inexpensive. The question is, how do we know that a screening test is reliable and, therefore, safely applicable in clinical settings?
The short answer is that the predictive ability of a screening test must be inspected against the reference standard used to diagnose the condition. For example, if the hypothesis is that the ECG is a reliable test to diagnose a STEMI, the ECG features must be analysed in relation to the angiogram outcome, which is considered the reference standard for detecting a complete obstruction of a coronary vessel. In other words, the ‘reference standard’ represents a test that provides authoritative and definitive proof that the condition of interest is present.

The typical analysis used to determine the reliability of a screening test is characterised below. Based on the screening result, which is typically a dichotomous outcome (i.e. a positive or negative test), and the reference standard, which confirms the presence or absence of the condition, tested subjects are assigned to one of four cells labelled A, B, C and D. From the count in each cell, the sensitivity (SN), specificity (SP), negative predictive value (NPV) and positive predictive value (PPV) can be calculated, usually expressed as percentages (see the sketch after the table).
|               | Has the condition  | Does not have the condition | Total         |
| Positive Test | A (True Positive)  | B (False Positive)          | A + B         |
| Negative Test | C (False Negative) | D (True Negative)           | C + D         |
| Total         | A + C              | B + D                       | A + B + C + D |
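As a minimal sketch, the four measures can be computed directly from the cell counts. The Python snippet below uses hypothetical counts, not data from any real study.

```python
def diagnostic_metrics(a, b, c, d):
    """a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    return {
        "Sensitivity (%)": 100 * a / (a + c),  # diseased correctly detected
        "Specificity (%)": 100 * d / (b + d),  # disease-free correctly cleared
        "PPV (%)":         100 * a / (a + b),  # truly diseased among positive tests
        "NPV (%)":         100 * d / (c + d),  # truly disease-free among negative tests
    }

# Hypothetical counts: 90 TP, 40 FP, 10 FN, 160 TN
for name, value in diagnostic_metrics(90, 40, 10, 160).items():
    print(f"{name}: {value:.1f}")
```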
Sensitivity
The ability of the test to detect a true positive reflects the test’s ability to correctly identify all people who have the condition: [A / (A + C)] × 100.
Sensitivity (SN) is defined as the probability that a screening test detects the disease of interest among all the patients who truly have the disease. Based on the contingency table, the formula for SN is [A / (A + C)] × 100, where A is the true positive count and C is the false negative count. In other words, SN is the ability of the test to detect the true positive cases. The higher the proportion of true positive cases correctly identified by the test, the higher the sensitivity. In the clinical context, higher sensitivity reflects a superior ability to rule out a disease relative to tests with lower sensitivity. In other words, the higher the SN value of a test, the lower the rate of false negative cases. A false negative means the test is negative in a patient who actually has the disease, i.e. a Type II error, which in statistical terms refers to a failure to reject the null hypothesis when it is false. Taken together, if a highly sensitive test comes back negative, one can be reassured that the disease is ruled out (a numerical illustration follows below).
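As a quick numerical sketch (the sensitivity and patient numbers here are invented for illustration), consider how many diseased patients a 95%-sensitive test would be expected to miss:

```python
# Hypothetical 95%-sensitive test applied to 200 patients
# who truly have the disease.
sensitivity = 0.95
diseased = 200

true_positives = sensitivity * diseased           # 190 correctly detected
false_negatives = (1 - sensitivity) * diseased    # 10 missed (Type II errors)

print(f"Detected: {true_positives:.0f}, missed: {false_negatives:.0f}")
```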
It is also important to note that sensitivity is not a stand-alone concept: one must take the specificity (SP) and predictive values of the test into consideration as well. It is the balance between SN and SP, alongside the negative and positive predictive values, that gives clinicians the overall picture of a test’s diagnostic ability.
Specificity
The ability of the test to detect a true negative reflects the test’s ability to correctly identify all people who do not have the condition: [D / (B + D)] × 100.
Specificity (SP) is defined as the probability that the screening test correctly identifies patients who do not have the disease of interest among all the patients who truly do not have the disease. It is the ability of the test to detect true negative cases. Based on the table above, it is the proportion of true negative cases (D) over the combination of true negative (D) and false positive (B) cases: [D / (B + D)] × 100. In other words, the higher the SP of a test, the lower the rate of false positive cases. A false positive means the test gives a positive result although, in reality, the patient does not have the disease. From a statistical perspective, a false positive is also known as a Type I error. A clinical test with a high rate of false positives is not ideal, as it can lead to unnecessary costly investigations and create false alarms for patients, which can be psychologically traumatising.
For example, suppose a test with 65% SP is used to diagnose breast cancer, and the test comes back positive. Before concluding that the patient really has cancer, the treating physician must be aware that the false positive rate of this test is relatively high: 35% of patients without the disease would still test positive. A decision must be made whether to break the news to the patient prematurely or to subject the patient to another test with a higher SP. Hypothetically speaking, if a test were 100% specific (although in reality a perfect test is almost impossible), it would detect all true negative cases and produce no false positives. From a clinical standpoint, this means that if the test comes back positive, we can be sure that the patient really has the disease. Therefore, a highly specific test is a great tool to rule in a disease (see the sketch below).
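To make the 65% SP example concrete, the sketch below (with an invented cohort size) counts the false alarms expected among patients who do not have breast cancer:

```python
# Hypothetical cohort: 1000 screened patients who do NOT have breast cancer,
# tested with the 65%-specific test from the example above.
specificity = 0.65
disease_free = 1000

true_negatives = specificity * disease_free           # 650 correctly cleared
false_positives = (1 - specificity) * disease_free    # 350 false alarms (Type I errors)

print(f"Cleared: {true_negatives:.0f}, false alarms: {false_positives:.0f}")
```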
Again, it is important to reiterate that there is no perfect test. There is always an overlap between the true negative and true positive cases, which gives rise to a grey area encompassing the false negatives and false positives, depending on the threshold level. In the figure, the blue curve reflects the true negative cases and the red curve the true positive cases. The black line denoted A is the threshold at which the test detects all the true positive cases, i.e. 100% sensitivity. However, one can appreciate the substantial area of overlap with the blue curve, which represents the proportion of false positive cases. Similarly, the line denoted B is the threshold that detects all the true negative cases, i.e. 100% specificity. Although all the true negative cases are detected at this threshold, a substantial area of the red curve is also included, representing the proportion of false negative cases. Therefore, it is essential to select an appropriate threshold for each test to balance the risk of false negatives and false positives, as the sketch below illustrates.
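As a rough sketch of this trade-off, assume (purely for illustration) that the test values in the disease-free and diseased groups follow normal distributions; moving the positivity threshold then shifts the balance between SN and SP:

```python
# Illustrative threshold trade-off with two assumed normal distributions.
from scipy.stats import norm

healthy = norm(loc=4.0, scale=1.0)    # "blue curve": patients without the disease
diseased = norm(loc=7.0, scale=1.0)   # "red curve": patients with the disease

for threshold in (4.0, 5.5, 7.0):     # the test is called positive above the threshold
    sensitivity = 1 - diseased.cdf(threshold)  # diseased correctly flagged
    specificity = healthy.cdf(threshold)       # healthy correctly cleared
    print(f"threshold {threshold}: SN = {sensitivity:.1%}, SP = {specificity:.1%}")
```

Lowering the threshold mimics line A (near-100% SN at the cost of SP), while raising it mimics line B (near-100% SP at the cost of SN).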
False positives may lead to unnecessary and costly downstream investigations or risky medical management, whereas false negatives may give patients a false sense of security, leaving medical conditions undiagnosed and, in some countries, creating a risk of medical litigation. I hope the explanation helps.