NPC Archive Item: Independent validation of QRISK supports its superiority over Framingham in the UK population

NOTE – This is an archive post from the NPC and has not been updated since first publication. Therefore, some hyperlinks may no longer be working.
MeReC Rapid Review

21st July 2009

This independent validation of the QRISK tool found that QRISK showed better calibration and better discrimination in predicting CV risk than the Framingham-based tool currently recommended by NICE. Based on this, using QRISK would lead to fewer patients being incorrectly categorised as “high risk” and would target more patients who would benefit from treatment.

Level of evidence:
Level 2 (limited quality patient-oriented evidence) according to the SORT criteria.

NICE currently recommends using a Framingham-based tool to assess CV risk. Should NICE decide to change its recommendations, appropriate training and support will be needed to ensure the new tool is used correctly. In the meantime, it is important that health professionals continue to use a validated risk assessment tool as a basis for discussion with patients, and not treat patients on the basis of individual risk factors (e.g. an isolated high blood pressure or isolated high cholesterol).

What is the background to this?
NICE guidance on cardiovascular (CV) risk assessment and lipid modification recommends that statins should be offered for the primary prevention of CV disease (CVD) in adults who have a 20% or greater 10-year risk of developing CVD. Predicted 10-year risk also influences decisions about other treatments, such as management of hypertension. However, if the prediction tool we use either overpredicts or underpredicts risk we could end up over- or under-treating lots of people.

The prediction tools currently recommended by NICE use data from the Framingham study – a population study in a mainly white, affluent population in New England measured at a time when CVD was at its peak in the USA. Data from this study are the basis of, for example, the tables printed in the BNF. However, the population from which these tools were derived may differ from the UK population. QRISK is a new CVD risk scoring system that was specifically developed for the UK. It is based on a large UK general practice database (QRESEARCH).

The two main measures by which a risk prediction tool should be judged are calibration and discrimination. Calibration relates to how close the predicted risk is to the observed risk.  More importantly, discrimination is the ability of the tool to differentiate between people who will have an event and those who will not, over a defined period of time (often five to ten years). In other words, when a risk prediction tool dichotomises people as being either “lower risk” or “higher risk”, how well does this predict what actually happens over the stated time period?
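
To make the two measures concrete, here is a minimal Python sketch using made-up data (six hypothetical people, not taken from any study). It assumes a simple ratio of mean predicted risk to observed event rate as the calibration measure, and a concordance statistic as the discrimination measure. These are illustrative choices and are not necessarily the statistics used in the validation studies discussed here.

```python
from itertools import product


def calibration_ratio(predicted, observed):
    """Mean predicted risk divided by the observed event rate.

    1.0 means predictions match outcomes on average; below 1.0 means
    underprediction, above 1.0 means overprediction.
    """
    mean_predicted = sum(predicted) / len(predicted)
    observed_rate = sum(observed) / len(observed)
    return mean_predicted / observed_rate


def concordance(predicted, observed):
    """Proportion of (event, non-event) pairs in which the person who had
    the event was given the higher predicted risk. Ties count as half."""
    events = [p for p, o in zip(predicted, observed) if o == 1]
    non_events = [p for p, o in zip(predicted, observed) if o == 0]
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical 10-year predicted risks and outcomes (1 = had a CV event).
predicted = [0.05, 0.10, 0.20, 0.30, 0.40, 0.15]
observed = [0, 0, 1, 0, 1, 0]

print("calibration ratio:", calibration_ratio(predicted, observed))
print("concordance:", concordance(predicted, observed))
```

In this toy example the tool underpredicts overall (the ratio is below 1) yet still ranks most people who had an event above those who did not, which shows that calibration and discrimination are separate properties.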

A validation study of QRISK was first published in 2007. It seemed to show that QRISK was better than Framingham at predicting CV risk in a random sample (a third) of the QRESEARCH cohort (the remaining two-thirds of the cohort had been used to derive the QRISK score).  As we have previously blogged, a second validation study, based on a different UK general practice database (THIN), also seemed to show that QRISK was better than Framingham at predicting risk in a UK population. These two studies were conducted by the originators of QRISK.  The Department of Health commissioned a validation of QRISK in the THIN population by independent, external medical statisticians, who have an extremely high reputation.

The authors calculated the CV risk of all people in the THIN database using QRISK and two versions of the Framingham equation: the one currently recommended by NICE and one which is sex-specific. These predictions were then compared with what actually happened to those people when they were followed up.

What does this study claim?
The researchers found that QRISK gives a more accurate estimate of predicted risk compared with either Framingham equation, i.e. it showed better calibration and better discrimination (see study details for more information). This is true for all age groups and both sexes except for women aged 60 to 69 years.

QRISK underpredicted CV events and the Framingham variants overpredicted risk overall (except in women aged 65 to 69 and especially in women aged 70 to 74). However, the underprediction with QRISK was smaller than the overprediction with Framingham. Framingham would have identified about 20% of the male population and 5% of the female population as being at higher risk, compared with 10% and 4% respectively with QRISK. However, the overall CV risk of people categorised by QRISK as being in the higher risk group was greater than the group categorised as such by Framingham. Taken all together, QRISK’s use would target more patients who would benefit from treatment and falsely label fewer patients as higher risk.

So what?
As we have previously blogged, the authors of QRISK have updated the QRISK tool that was used in this study, to include additional predictors. Their validation study also found it to be better than the NICE-recommended Framingham model at predicting risk. As an editorial accompanying the study discussed here notes, the independent authors were not able to use the updated QRISK tool (QRISK2), but taken all together, it seems that QRISK (or QRISK2) “has the edge” on Framingham-derived tools. NICE will be looking at these data. If it decides to change its recommendations, we shall update our materials appropriately. In the meantime, the most important thing is to use a validated tool correctly as a basis for discussion with patients and not to treat on the basis of individual risk factors (e.g. an isolated high blood pressure or isolated high cholesterol). As the NICE guidance says, “estimates of CVD risk derived from equations are not an exact science but are better than clinical judgment alone for the estimation of CVD risk”. Of course, health professionals need to take into account patient circumstances and wishes. It would be foolish to have an iron rule that (whatever tool is used) someone with a 19.9% predicted risk can never receive prophylaxis, but someone with a 20.1% risk must always receive prophylaxis.

The editorial highlights a paradox associated with any risk prediction tool. An individual person at higher risk is more likely to develop CVD than another individual at lower risk: however, there are many more people at lower risk than higher risk, so looking at a population level, most CVD develops in the lower risk group. In fact, 70% of CV events in men and 82% of CV events in women in the THIN cohort occurred in people who were lower risk as defined by QRISK. But trying to prevent more CVD by lowering the threshold (say, to 15% 10-year risk or lower) would mean more and more people would be treated for less and less incremental benefit, but with greater cost per event saved, less chance of any individual benefiting and more people being exposed to treatment and therefore risk of side effects. Thus, making treatment policy recommendations at a population level is complicated and difficult.
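
The arithmetic behind this paradox can be sketched in a few lines of Python. The proportions below are hypothetical, chosen only to illustrate the effect, and are not figures from the THIN cohort.

```python
def share_of_events_in_lower_risk(frac_higher, risk_higher, risk_lower):
    """Fraction of all events that arise in the lower risk group.

    frac_higher: proportion of the population classed as higher risk
    risk_higher: average event risk within the higher risk group
    risk_lower:  average event risk within the lower risk group
    """
    events_higher = frac_higher * risk_higher
    events_lower = (1 - frac_higher) * risk_lower
    return events_lower / (events_higher + events_lower)


# Hypothetical population: 10% classed as higher risk with an average
# 25% event risk; the remaining 90% have an average 7% event risk.
share = share_of_events_in_lower_risk(0.10, 0.25, 0.07)
print(f"{share:.0%} of events occur in the lower risk group")
```

Even though each higher risk person here is several times more likely to have an event, the lower risk group is so much larger that it still accounts for most events.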

More information on cardiovascular risk assessment can be found on the relevant section of NPC.

Study details

Collins G and Altman D. An independent external validation and evaluation of QRISK cardiovascular risk prediction: a prospective open cohort study. BMJ 2009;339:b2584, published online 7th July 2009

Patients: 1,072,800 patients, registered between 1 January 1995 and 1 April 2006, aged 35-74 years (5.4 million person years) with 43,990 CV events.

Intervention and comparison: QRISK tool compared with two Framingham-based tools: the Anderson Framingham tool (as recommended by NICE at the time) and the Cox Framingham tool, a sex-specific variant.

Outcomes and results: Discrimination and calibration statistics were better with QRISK. QRISK explained 32% of the variation in men and 37% in women, compared with 27% and 31% respectively for Anderson Framingham. The D discrimination statistic (where a higher value represents better discrimination) was higher (by more than 0.1) in both men and women for QRISK (1.39 and 1.56 respectively) than for the Anderson Framingham equation (1.26 and 1.38 respectively), indicating poorer discrimination with the Anderson Framingham equation. The Brier score was lower (that is, more accurate) for QRISK in men (0.0470) compared with the Anderson Framingham equation (0.0545). Similarly, for women, the Brier score was lower for QRISK (0.0321) compared with the Anderson Framingham equation (0.0334). QRISK underpredicted risk by 13% for men and 10% for women, whereas Anderson Framingham overpredicted risk by 32% for men and 10% for women.
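
For readers unfamiliar with the Brier score, it is the mean squared difference between each person's predicted risk and their observed outcome (1 for an event, 0 for none), so lower is better. The sketch below uses made-up data for two hypothetical tools (tool_a and tool_b). It is a simplified version that ignores censoring, which an analysis of a real cohort would need to handle.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted risk and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical outcomes for four people (1 = had a CV event).
observed = [0, 0, 1, 0]

# Hypothetical predicted risks from two tools for the same four people.
tool_a = [0.1, 0.1, 0.6, 0.2]
tool_b = [0.3, 0.3, 0.5, 0.4]

print("tool_a:", brier_score(tool_a, observed))
print("tool_b:", brier_score(tool_b, observed))
```

Here tool_a has the lower score because its predictions sit closer to what actually happened.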

Of the 1,072,800 people, 85,010 (8%) would be reclassified with QRISK: 57,199 men and 13,566 women would be moved from the higher risk to the lower risk group; and 3,548 men and 10,697 women would be moved from the lower risk to the higher risk group. However, the incidence of CV events among men designated “higher risk” by QRISK was 30.5 per 1000 person years (95% Confidence Interval [CI] 29.9 to 31.2) whereas it was 23.7 per 1000 person years (95% CI 23.2 to 24.1) with the Framingham equation. Similarly, for women identified as high risk with QRISK, the incidence rate of CV events was 26.7 per 1000 person years (95% CI 25.8 to 27.7) and was 22.2 per 1000 person years (95% CI 21.4 to 23.0) with Framingham.
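
The reclassification totals can be checked directly from the figures quoted above. The short Python sketch below does only that arithmetic; the variable names are our own.

```python
# Figures as quoted in the study details above.
total_people = 1_072_800
moved_to_lower = {"men": 57_199, "women": 13_566}   # higher -> lower risk
moved_to_higher = {"men": 3_548, "women": 10_697}   # lower -> higher risk

total_to_lower = sum(moved_to_lower.values())
total_to_higher = sum(moved_to_higher.values())
reclassified = total_to_lower + total_to_higher
percent_reclassified = 100 * reclassified / total_people

print("moved to lower risk:", total_to_lower)
print("moved to higher risk:", total_to_higher)
print("reclassified:", reclassified)
print("percent of cohort:", round(percent_reclassified, 1))
```

The four subgroup counts sum to the 85,010 quoted, and most of the reclassification is movement from the higher risk to the lower risk group.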

Sponsorship: This study was commissioned by the Department of Health. The funder had no role in the study design, analysis, or interpretation or writing of the manuscript.
