
3 Committee discussion

The diagnostics advisory committee considered evidence on artificial intelligence (AI) technologies for the opportunistic detection of vertebral fragility fractures (VFFs) from several sources. This included evidence submitted by the companies, a review of clinical and cost evidence by the external assessment group (EAG), a resource impact assessment by NICE and responses from stakeholders. Full details are available in the project documents for this guidance.

The condition

3.1

A VFF is a break in the spine that happens when bones are weaker than normal. VFFs can happen after a fall from standing height or lower (low energy trauma) or spontaneously from day-to-day activities involving very little trauma or stress. They are the most common type of fragility fracture caused by osteoporosis, which reduces bone density and strength. Osteoporotic VFFs are common in older people and particularly in women, trans men and non-binary people after menopause. But, they can also be associated with other conditions or factors, such as chronic or long-term corticosteroid or glucocorticoid use or malignancy in the vertebrae. Other risk factors include:

  • a history of falls

  • family history of hip fracture

  • low body mass index

  • smoking

  • alcohol intake and

  • secondary causes of osteoporosis, such as:

    • rheumatoid arthritis

    • inflammatory bowel disease or

    • malabsorption.

Patient experts explained that vertebral fractures can be life changing and emotionally challenging.

Current practice

3.2

VFFs can be identified when a person presents to a healthcare setting with symptoms that suggest a VFF. But, VFFs can also be detected incidentally on radiographic images that include the spine but that were taken for reasons other than a suspected VFF. This is known as opportunistic detection. Clinical experts explained that there is no clear pathway for people with VFFs and there is variation across the NHS. Where available, people are referred to fracture liaison services.

Unmet need

3.3

People with a VFF often experience deformity, height loss, immobility and pain, which leads to reduced quality of life. The risk of death is also higher. VFFs are also a strong predictor of further osteoporotic fractures, such as hip fractures. The economic cost of fractures to the NHS is substantial. There are effective pharmacological and non-pharmacological treatment options for managing symptomatic VFFs. Treatment can also reduce the risk of further fractures.

3.4

Thousands of radiographic images are taken annually for reasons other than VFF detection. These could be used to opportunistically detect VFFs. But the clinical experts noted that despite ongoing efforts to raise awareness of VFFs, most remain undiagnosed. The clinical and patient experts stressed that improving detection and treatment offers a significant opportunity to reduce the burden of VFFs and reduce the risk of further fractures. The committee concluded that there is an unmet clinical need that can be addressed by AI technologies.

Innovative aspects

3.5

The technologies use AI to detect vertebral fractures. This could improve VFF detection rates, leading to more people getting the care they need. The clinical experts highlighted that many AI technologies have been adopted across the NHS and that AI offers significant potential for improving care.

3.6

All of the identified technologies have fixed algorithms. Four companies (Aidoc Medical, Annalise.AI, Nanox AI and Avicenna.AI) have said that their technologies have settings to control the AI software's sensitivity and specificity, which are configured at set-up or during use. This can help tailor performance to a hospital or centre's needs. Some of the technologies include additional features, for example for triage or prioritisation.

Clinical effectiveness

Evidence base

3.7

There were 22 studies that met the inclusion criteria for the EAG's clinical-effectiveness review. Most studies evaluated HealthVCF (8 studies), Annalise Enterprise CXR/Annalise Container CXR (5 studies), IB Lab FLAMINGO (4 studies) or CINA-VCF Quantix (3 studies). There was 1 study each on BoneView and BriefCase-Triage. Most studies included diagnostic accuracy as an outcome and were retrospective. Ten of the studies reported the technologies' failure rates. Other relevant outcomes were reported in a minority of studies.

Diagnostic accuracy

3.8

Diagnostic accuracy evidence was available for all of the technologies except TechCare Spine and HealthOST. The majority of the diagnostic accuracy evidence compared the performance of the technologies against a reference standard. Most of the studies demonstrated high sensitivity and specificity for detecting moderate and severe vertebral fractures. Twelve studies on 4 technologies also compared the AI software's performance against standard care (most commonly the original radiology report) but most of these were not done in the UK. These studies suggest that the AI software could improve the detection rate of VFFs compared with standard care. But the committee noted that these studies may not reflect standard care in the NHS. The clinical experts commented that in their experience, the detection rate in the UK is low and that some data from UK prevalence studies and databases suggests that a lot of VFFs remain undiagnosed. The committee judged that the evidence on the diagnostic accuracy of standard care in the NHS is very limited and uncertain but that the technologies are likely to improve detection rates.
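
As a purely illustrative sketch (not part of the evidence review), the calculation below shows how sensitivity and specificity against a reference standard, and a detection rate relative to the original radiology report, are derived. All counts are hypothetical assumptions chosen for the example.

```python
# Illustrative only: how sensitivity and specificity are derived when AI output
# is compared against a reference standard (for example, radiologist re-review).
# All counts are hypothetical and are not taken from the evidence review.

true_positives = 90    # AI flags a VFF, reference standard confirms a VFF
false_negatives = 10   # AI misses a VFF confirmed by the reference standard
true_negatives = 880   # AI and reference standard both report no VFF
false_positives = 20   # AI flags a VFF not confirmed by the reference standard

sensitivity = true_positives / (true_positives + false_negatives)   # 0.90
specificity = true_negatives / (true_negatives + false_positives)   # ~0.98

# The detection-rate comparison with standard care asks a different question:
# of the reference-standard-confirmed VFFs, how many were already reported
# in the original radiology report?
reported_in_original_report = 40   # hypothetical
confirmed_vffs = true_positives + false_negatives
standard_care_detection_rate = reported_in_original_report / confirmed_vffs  # 0.40
ai_detection_rate = sensitivity                                               # 0.90

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
print(f"standard care detects {standard_care_detection_rate:.0%} of confirmed VFFs, "
      f"AI detects {ai_detection_rate:.0%}")
```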

3.9

The EAG also highlighted that the reference standards varied across the studies and that some may not reflect NHS practice. A specialist committee member explained that the reference standard in the NHS would be at least 1 radiologist reviewing the radiograph specifically looking for a VFF. The committee judged the evidence from non-UK studies versus a reference standard to still be informative for the diagnostic accuracy of the AI technologies.

3.10

Most of the evidence is retrospective. The EAG explained that a retrospective study design is appropriate for assessing a technology's diagnostic accuracy because of the risk of participation bias with prospective studies. But, prospective evidence would be better suited to show the impact of the technologies on other outcomes, such as changes to clinical management. The committee agreed that retrospective evidence in this case was appropriate for assessing the diagnostic accuracy of the technologies.

3.11

The committee concluded that overall, the evidence suggests that the AI technologies can detect additional moderate to severe vertebral fractures (as confirmed by a reference standard) that were not reported in the original radiology report, with generally high specificity. But it was uncertain how much the technologies can improve VFF detection in the NHS. So the committee agreed more evidence is needed comparing the technologies with standard care in the NHS. No studies were identified for TechCare Spine and HealthOST, so the committee concluded that the diagnostic accuracy for these 2 technologies is unknown.

Choice of technologies

3.12

TechCare Spine and BoneView both analyse X-ray images of the spine. The clinical experts explained that they thought it was less likely that a VFF would be missed on those images. This is because they would usually be taken to investigate back pain, so the spine would be thoroughly reviewed. The clinical experts added that a VFF is also less likely to be missed on a side-view chest X-ray image and so they questioned the value of Annalise Enterprise CXR/Annalise Container CXR. They also noted that although frontal-view chest X-ray images are routine practice, side-view chest X-ray images are no longer commonly done in the NHS. So, technologies that detect VFFs on side-view X-ray images may be less useful in the NHS. The company Annalise.ai noted that a side-view chest X-ray image is optional for Annalise Enterprise CXR/Annalise Container CXR, and that the technology can also analyse frontal-view chest X-ray images. But the EAG cautioned that the diagnostic accuracy studies for this technology included mostly side-view chest X-ray images, so the diagnostic accuracy of the technology using frontal-view images alone is uncertain. The committee concluded that further research is needed on the diagnostic accuracy of these technologies, using X-ray images applicable to NHS practice.

Impact on clinical management

3.13

The committee asked about the impact of false positive results from the AI technologies. The clinical experts explained that this may lead to some unnecessary dual-energy X-ray absorptiometry (DEXA) scans. But, they added that the technologies can only be used as a decision aid, and the healthcare professional reviewing the radiograph may identify some of the false positive results. They also noted that a high number of false positives would have an impact on the workforce if additional review by a radiologist was needed.
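
The scale of this workload effect can be sketched as follows. The annual image volume, VFF prevalence, specificity and proportion of false positives caught at review used below are hypothetical assumptions, not figures from the assessment.

```python
# Illustrative only: how a technology's false positive rate could translate into
# additional radiologist reviews and potentially unnecessary DEXA referrals.
# All values below are assumptions made for the sketch.

annual_images_analysed = 50_000   # hypothetical number of eligible images per year
vff_prevalence = 0.05             # hypothetical proportion of images with a true VFF
specificity = 0.95                # hypothetical specificity of the AI software

images_without_vff = annual_images_analysed * (1 - vff_prevalence)
false_positives_per_year = images_without_vff * (1 - specificity)

# Each flagged image may need an additional specialist review; only some false
# positives will be caught at that review before a DEXA referral is made.
proportion_caught_at_review = 0.5   # hypothetical
unnecessary_dexa_referrals = false_positives_per_year * (1 - proportion_caught_at_review)

print(f"false positives per year: {false_positives_per_year:,.0f}")
print(f"potentially unnecessary DEXA referrals: {unnecessary_dexa_referrals:,.0f}")
```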

3.14

The clinical experts also highlighted the lack of evidence on how many people would be referred or treated after opportunistic identification of their VFF. They also noted the variation in access to fracture liaison services in the NHS. The EAG confirmed that very little evidence was identified on how introducing the technologies affected the clinical management of VFFs, particularly in an NHS setting. The committee concluded that evidence on the impact on referral and treatment rates and radiology workload of introducing the technologies should be generated for all of the technologies.

Other outcomes

3.15

The committee noted that there was at least 1 study providing evidence on the failure rates for 6 of the technologies and that these rates differed between the technologies. The committee queried the definition of failure of the AI software to interpret a radiograph and asked if there was evidence on the causes of failure from the clinical evidence review. The EAG explained that the failure rate included both failure of the AI software to process a radiograph and failure to produce a definitive report. It also explained that the causes of failure were not reported in most cases. A specialist committee member added that the failure rate could be related to the image quality of the radiograph. The committee concluded that the failure rates and the reasons for failure represent evidence gaps, and further evidence should be generated for all of the technologies.

Cost effectiveness

Clinical parameters

3.16

The EAG developed an early economic model to explore the potential cost effectiveness of opportunistic VFF detection with assistance from AI technologies compared with current standard care (reporting radiographer without AI assistance). The committee noted that the sensitivity and specificity values used for the standard care arm in the model were from a small expert elicitation study and so were very uncertain. It recalled that there was a lack of studies comparing the use of AI technologies with standard care in the NHS (see sections 3.8 to 3.11). The committee also queried whether it is known what proportion of people whose VFF is opportunistically detected are already on bone density treatment. This is because identifying a VFF will not provide any added benefit for these people, in terms of future fracture risk reduction. The committee also queried whether the proportion of people who have already had or been referred for a DEXA scan is known. It recalled that there was a lack of evidence on the impact of the technologies on clinical management (see sections 3.13 and 3.14). The EAG said that this may be captured in the model because it assumed that only 15% of people correctly identified as having a VFF would have treatment. But, this was also based on the same expert elicitation study. The committee concluded that there was substantial uncertainty in some of the clinical parameters in the model because of the lack of data. It recommended that this should be addressed with evidence generation.
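
The effect of the treatment-uptake assumption on the modelled benefit can be illustrated with the sketch below. Apart from the 15% treatment assumption reported by the EAG, all values are hypothetical.

```python
# Illustrative only: how the modelled benefit of detection is attenuated by the
# assumption that only 15% of people correctly identified as having a VFF have
# treatment. Apart from that 15% figure, the numbers are hypothetical.

people_with_vff_on_images = 1_000   # hypothetical
ai_sensitivity = 0.90               # hypothetical
standard_care_sensitivity = 0.40    # hypothetical (the model's value came from expert elicitation)
treatment_uptake = 0.15             # EAG assumption: 15% of correctly identified people have treatment

treated_with_ai = people_with_vff_on_images * ai_sensitivity * treatment_uptake
treated_with_standard_care = people_with_vff_on_images * standard_care_sensitivity * treatment_uptake

# Only this incremental group can accrue the modelled quality-of-life gain and
# any reduction in future fracture risk.
additional_people_treated = treated_with_ai - treated_with_standard_care
print(f"additional people treated per 1,000 with a VFF: {additional_people_treated:.0f}")
```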

Quality of life

3.17

A quality-of-life benefit was modelled for people with a VFF correctly identified in the model. But the EAG noted that this was from a study in people with symptomatic vertebral fractures and that the short-term gain from treatment seemed too high. It explained that to better reflect the population in this assessment, it halved the utility value in the base case and explored even smaller utility gains in scenario analyses. These had a large impact on the cost-effectiveness results. The patient experts noted that VFFs can be extremely painful but that getting a diagnosis can be difficult because the symptoms can be mistaken for something else. There may also be pain in other areas, such as the stomach. The clinical experts agreed that people with opportunistically identified VFFs may still be experiencing symptoms. They stressed that medicines can provide relief and can therefore also improve quality of life. The committee agreed that there was substantial uncertainty around the short-term impact on quality of life of identifying VFFs opportunistically. It concluded that future evidence generation should address this evidence gap.

Cost parameters

3.18

The EAG calculated the cost per scan for each technology, which included product subscription, implementation, integration, training and maintenance costs. The company-provided costs were commercial in confidence, so the EAG also modelled a hypothetical scenario using a generic AI technology costing £7.36 per scan. The committee noted that the cost per scan of all technologies was similar to or below the cost of a generic AI technology. No cost was provided for BoneView, so the EAG added a notional £1 per scan to the cost of scanning provided by the company. But the committee noted that it was not known whether this cost reflected the true cost of the technology. So, it concluded that the potential cost effectiveness of BoneView was even more uncertain. The committee advised that trusts should consider the costs used in this assessment when implementing the technologies.
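
As an illustration only, the sketch below shows how a cost per scan can be built up from the cost components the EAG included. The component values, useful life and scan volume are hypothetical; only the £7.36 generic AI technology figure comes from the EAG's hypothetical scenario.

```python
# Illustrative only: building a cost per scan from subscription, implementation,
# integration, training and maintenance costs. All component values and the scan
# volume are hypothetical assumptions for this sketch.

annual_subscription = 60_000.0   # hypothetical, per year
one_off_costs = 30_000.0         # hypothetical: implementation + integration + training
annual_maintenance = 5_000.0     # hypothetical, per year
useful_life_years = 3            # hypothetical period over which one-off costs are spread
annual_scans_analysed = 20_000   # hypothetical

annual_cost = annual_subscription + annual_maintenance + one_off_costs / useful_life_years
cost_per_scan = annual_cost / annual_scans_analysed

generic_ai_cost_per_scan = 7.36  # EAG's hypothetical generic AI technology scenario
print(f"cost per scan: £{cost_per_scan:.2f} (generic scenario: £{generic_ai_cost_per_scan:.2f})")
```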

3.19

The committee noted that the cost for treating and managing a VFF after it has been identified may have been overestimated. This was because it was sourced from a technology appraisal on a treatment for osteoporosis, was based on people with a diagnosed VFF and included hospitalisation costs. The committee queried whether people who have been diagnosed under standard care may be experiencing more severe symptoms than those identified opportunistically. But, it recognised that, for an assumed level of clinical benefit, using a higher value for the cost of treating and managing a VFF would underestimate the value of AI technologies. This therefore served as a conservative estimate.

Model structure

3.20

The EAG's model was a decision tree with a 1-year time horizon. The committee highlighted that the short time horizon of the model was a major limitation. This is because many of the benefits of detecting a VFF earlier would occur beyond this short time horizon. But, it noted that there are also costs that would be incurred in the future. The EAG said that any longer-term modelling would have been subject to substantial uncertainty because of the lack of evidence on the impact of the technologies on clinical management (see sections 3.13 and 3.14). But it expected that including longer-term costs and benefits would likely improve the cost effectiveness of the technologies. This is because additional relevant costs and effects would be included, for example the reduced future fracture risk for people whose VFFs are identified earlier and who have treatment, and the reduced quality of life and additional costs for people whose VFF is not reported. The committee concluded that longer-term modelling would be needed in the future to reduce this uncertainty when the recommendations are reviewed after evidence generation.
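
The general shape of such a decision-tree comparison can be sketched as below. This is not the EAG's model; all parameter values are hypothetical placeholders, and the result is only intended to show how incremental costs and quality-adjusted life years (QALYs) combine into a cost-effectiveness estimate.

```python
# Illustrative only: the general shape of a 1-year decision-tree comparison of
# AI-assisted detection against standard care. All parameter values are
# hypothetical placeholders, not the EAG's inputs.

def arm(sensitivity, specificity, cost_per_scan):
    """Expected cost and QALYs per person scanned, over a 1-year horizon."""
    prevalence = 0.05          # hypothetical proportion of scanned people with a VFF
    treatment_uptake = 0.15    # proportion of correctly identified people who have treatment
    utility_gain = 0.02        # hypothetical 1-year QALY gain from treatment after detection
    treatment_cost = 500.0     # hypothetical cost of treating and managing an identified VFF
    dexa_cost = 80.0           # hypothetical cost of a follow-up DEXA scan
    baseline_qalys = 0.7       # hypothetical baseline QALYs over the year

    tp = prevalence * sensitivity              # VFF present and detected
    fp = (1 - prevalence) * (1 - specificity)  # VFF absent but flagged

    cost = (cost_per_scan
            + tp * (dexa_cost + treatment_uptake * treatment_cost)
            + fp * dexa_cost)
    qalys = baseline_qalys + tp * treatment_uptake * utility_gain
    return cost, qalys

# Hypothetical accuracy and cost values for each arm.
cost_ai, qalys_ai = arm(sensitivity=0.90, specificity=0.95, cost_per_scan=7.36)
cost_sc, qalys_sc = arm(sensitivity=0.40, specificity=0.99, cost_per_scan=0.0)

icer = (cost_ai - cost_sc) / (qalys_ai - qalys_sc)
print(f"incremental cost £{cost_ai - cost_sc:.2f}, incremental QALYs {qalys_ai - qalys_sc:.5f}")
print(f"ICER: £{icer:,.0f} per QALY gained")

# With these placeholder values the incremental QALY gain is small relative to the
# incremental cost; the result is highly sensitive to the utility gain, detection
# rates and costs, echoing the uncertainties discussed in sections 3.16 to 3.22,
# and a 1-year horizon excludes longer-term benefits such as fewer future fractures.
```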

Plausibility of cost effectiveness

3.21

The committee noted that, in the base case, all of the technologies were more expensive than standard care. But, they also led to quality-of-life gains for the people whose VFFs were identified and treated.

3.22

The committee recalled some of the key uncertainties related to parameters in the model. Among them, the committee recognised the uncertainty of the sensitivity and specificity used in the standard care arm (see section 3.16). But it noted that varying those parameters had a small impact on the results in the sensitivity analyses. That is, unless the diagnostic accuracy of standard care approached that of the AI technologies, in which case the AI technologies were unlikely to be cost effective. The committee also acknowledged the significant uncertainty of the utility gain parameter (see section 3.17). It noted that if the utility gain was smaller than the one used in the base case, the technologies were unlikely to be cost effective. But, it recalled that the EAG did not capture any longer-term benefits of the AI technologies, which is likely to have underestimated their value (see section 3.20). The committee noted the uncertainty in some of the parameters and the limitations of the model structure. But it concluded that despite this, it is plausible that the AI technologies could be cost effective if implemented in the NHS.

Risks

3.23

The committee noted the resource impact assessment. It showed that implementing the technologies in the NHS could lead to a significant increase in the number of X-ray images and CT scans that need to be reviewed by a radiologist. This is because many of the radiographic images with a VFF identified may need an additional review by a specialist radiologist, especially if the first review was done by a radiographer who has not had specialist musculoskeletal training. There would also be an increase in the number of follow-up DEXA scans that need to be done. The committee heard that using 1 of the technologies in a large NHS trust had led to a significant increase in workload. The committee noted that although immediate large-scale implementation across the NHS is unlikely, introducing these AI technologies could significantly increase the pressure on radiology services.

The committee also noted that most of the provisionally recommended technologies are only indicated for use in people over 50 years. So, the resource impact of using those technologies will be smaller than the impact of the technology that is indicated for use in all adults over 18 years. The committee recalled that age is an important risk factor for VFFs, but that there are also other risk factors independent of age that can result in VFFs in younger people (see section 3.1). But the AI technologies can only be configured to assess specific radiographs based on demographic information such as age and not on other risk factors, so they could not be targeted at those specific groups. The committee recognised that the prevalence of VFFs may overall be lower in people under 50 years and so using the technologies to analyse images in this age group may be less beneficial. The committee also highlighted that, depending on the false positive rate, the resource impact could be much greater if the technologies are used in a wider population. But it noted that the EAG did not identify any evidence to enable any subgroup analyses. So, it was uncertain whether the clinical effectiveness of the technologies would differ in people under 50 or people with another risk factor. It was also uncertain whether it would be cost effective to use the technologies to analyse images in younger age groups or what an appropriate age cut-off might be.

The committee concluded that evidence should still be generated across all age and risk groups to establish whether this would be a good use of resources. The committee added that the financial and system risks would need to be managed when generating this evidence. It noted that NICE's guideline on assessing the risk of fragility fracture in osteoporosis can help NHS trusts define groups that are at higher risk, if prioritising specific groups is considered appropriate when implementing the technologies.

3.24

The committee queried whether implementation of the AI technologies could lead to healthcare professionals becoming over-reliant on them and whether this could lead to deskilling in the longer term. The clinical experts explained that it is possible that healthcare professionals would learn from the AI's feedback and that there is no imminent risk of deskilling. The committee recalled that the technologies can only be used as a decision aid and so would still always need clinical review and judgement (see section 2.1).

Equality considerations

3.25

The committee recalled that the technologies vary in their indications and that 5 of them are only indicated for people over 50 years. But, the clinical experts emphasised that VFF risk rises significantly with age and that most VFFs are in people over 50 years. The committee also recalled that in the clinical evidence review, the mean or median ages of the study populations were generally between 65 and 80 years. The committee noted that 1 of the recommended technologies is indicated for use in people over 18 years. It highlighted that all of the technologies should be used within their indicated populations, as outlined in each technology's instructions for use. But, it added that evidence generation in younger populations could help guide future recommendations on targeting the technologies. The committee recalled that osteoporotic VFFs do happen in younger people and that there are multiple risk factors. In particular, they are more common in women, trans men and non-binary people after menopause, in whom osteoporosis is more common. But there are other risk factors for osteoporosis (see section 3.1). The committee also recalled that VFFs can be a result of chronic or long-term corticosteroid or glucocorticoid use or malignancy in the vertebrae.

3.26

The committee highlighted that a common limitation of AI technologies is the lack of transparency about the data used to train the algorithm. It thought that the technologies may perform worse for people who were underrepresented in the AI training datasets. This could include younger people, ethnic minorities, people with comorbidities or those who have had previous treatment. The EAG remarked that there are limited details about the characteristics of the patient population in the clinical evidence. The committee noted that future evidence generation should include relevant patient characteristics that would allow for analyses to investigate whether the technologies have been tested or validated in diverse patient populations. The committee reiterated that consideration should be given to the data the algorithms were trained on and whether the technologies may not work as well for all groups. It recommended that companies should be transparent in providing details on this data.

3.27

The committee heard that there are geographical inequalities with regard to access to radiology services. It is currently unknown whether implementing the AI technologies could improve or exacerbate those inequalities. The evidence generation plan specifies that ideally, future research should be done across NHS trusts with and without fracture liaison services and replicated across multiple centres.