The content on this page is not current guidance and is only for the purposes of the consultation process.

3 Approach to research

3.1 Evidence gaps and ongoing studies

Table 1 summarises the evidence gaps and ongoing studies that might address them. Information about evidence status is derived from the external assessment report. More information on the studies in the table can be found in the supporting documents.

Table 1 Evidence gaps and ongoing studies

| Evidence gap | BriefCase-Triage | CINA-VCF Quantix | HealthVCF | IB Lab FLAMINGO |
| --- | --- | --- | --- | --- |
| Health-related quality-of-life impacts of AI technologies | No evidence | No evidence | No evidence | No evidence |
| Resource use | Limited evidence | Limited evidence; ongoing study | Limited evidence; ongoing study | Limited evidence; ongoing study |
| Impact of using AI technologies on NHS care pathway | No evidence | No evidence | No evidence; ongoing study | No evidence |
| Failure rates and diagnostic accuracy of AI technologies, ideally compared with NHS standard care | Limited evidence | Limited evidence | Limited evidence; ongoing study | Limited evidence |
| Healthcare professional experience and acceptability of AI technologies | No evidence | No evidence | No evidence | No evidence |

Abbreviations: AI, artificial intelligence.

3.2 Data sources

NICE's real-world evidence framework provides detailed guidance on assessing the suitability of a real-world data source to answer a specific research question.

The Fracture Liaison Service Database (FLS-DB) could potentially support this research. This database contains patient-level data on secondary fracture prevention in England and Wales, collected as part of the Falls and Fragility Fracture Audit Programme. It includes much of the data needed to address the evidence gaps, such as individual patient outcome data items, identification of fragility fracture, and the bone therapy recommended.

The Diagnostic Imaging Dataset (DID) is a national collection of detailed information about diagnostic imaging tests done in the NHS. It could be used to address some evidence gaps, specifically around failure rates and diagnostic accuracy of the artificial intelligence (AI) technologies ideally compared with NHS standard care. The data for DID is extracted from local Radiology Information Systems and submitted monthly.

The Ionising Radiation (Medical Exposure) Regulations (IR[ME]R) dataset contains information on X-rays and CT scans done in each NHS trust. This information could be used to address the evidence gap on failure rates and diagnostic accuracy of the AI technologies, ideally compared with NHS standard care. This dataset is only available locally at individual NHS trusts.

Patient-level data from FLS-DB and DID can be linked to other datasets, such as NHS Digital's Hospital Episode Statistics. This could support the evaluation of longer-term outcomes, such as adverse events, and of resource use in the NHS, such as further hospital appointments and referral for treatment.
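As an illustration only, the linkage described above could be sketched as a join on a shared pseudonymised patient identifier. The field names (`pseudo_id`, `scan`, `event`) and the example records are hypothetical and are not taken from FLS-DB, DID or Hospital Episode Statistics. Real linkage would follow each data controller's governance and matching rules.

```python
from collections import defaultdict


def link_by_patient(imaging_records, hospital_episodes, key="pseudo_id"):
    """Attach each patient's hospital episodes to their imaging record.

    `key` is a hypothetical pseudonymised identifier shared by both datasets.
    """
    episodes_by_patient = defaultdict(list)
    for episode in hospital_episodes:
        episodes_by_patient[episode[key]].append(episode)
    # Patients with no matching episodes get an empty list rather than being dropped.
    return [
        {**record, "episodes": episodes_by_patient.get(record[key], [])}
        for record in imaging_records
    ]


# Hypothetical example records, for illustration only.
imaging = [
    {"pseudo_id": "A1", "scan": "CT"},
    {"pseudo_id": "B2", "scan": "X-ray"},
]
episodes = [
    {"pseudo_id": "A1", "event": "fracture clinic appointment"},
    {"pseudo_id": "A1", "event": "emergency department visit"},
]
linked = link_by_patient(imaging, episodes)
```

Keeping unmatched imaging records in the output makes gaps in coverage visible, which matters when judging the quality of the linked dataset.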

The quality and coverage of real-world data collections are of key importance when used in generating evidence. Active monitoring and follow up through a central coordinating point is an effective and viable approach for ensuring good-quality data with broad coverage.

3.3 Evidence collection plan

Most of the evidence gaps can be addressed through a real-world, before-and-after implementation study. A retrospective service-evaluation study using available databases is also proposed to address the evidence gap on failure rates and diagnostic accuracy of the AI technologies, ideally compared with NHS standard care.

Real-world before-and-after implementation study

This type of study can assess an intervention's impact by comparing measurements from before and after its implementation. In this instance the impact of the AI technologies on health-related quality of life, resource use and the care pathway would be assessed in both phases. Once the technologies have been implemented, data about their failure rates and diagnostic accuracy in a real-world setting can also be collected.

After an enrolment period, data collection should be long enough for sufficient follow-up. The AI technologies should then be implemented and data collected to assess their impact, after leaving a period of time to account for learning effects.

While this study could be done at a single centre, it should ideally be implemented across NHS trusts with and without fracture liaison services and replicated across multiple centres. This could show how the AI technology can be implemented across a range of settings, representative of the variety in the NHS. Outcomes may reflect other changes that occur over time in the population, unrelated to the interventions. Additional robustness can be achieved by:

  • collecting data in a centre that has not implemented an AI technology but is as similar as possible (in terms of clinical practice and patient characteristics) to one that has, or

  • ideally, using a stepped-wedge design.

This could help control for changes in diagnosis and treatment rates over time that might have occurred anyway.
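One way to use data from a comparison centre is a difference-in-differences estimate: the before-and-after change at the centre using AI, minus the change over the same period at the comparison centre. The following is a minimal sketch under that assumption. It is not a method specified in this plan, and the function name, the outcome (time to diagnosis in days) and the values are hypothetical.

```python
from statistics import mean


def difference_in_differences(pre_ai, post_ai, pre_control, post_control):
    """Change at the AI centre minus the change at the comparison centre."""
    change_ai = mean(post_ai) - mean(pre_ai)
    change_control = mean(post_control) - mean(pre_control)
    return change_ai - change_control


# Hypothetical time to diagnosis (days) for illustration only.
pre_ai = [30.0, 28.0, 32.0]        # AI centre, before implementation
post_ai = [20.0, 22.0, 18.0]       # AI centre, after implementation
pre_control = [31.0, 29.0, 30.0]   # comparison centre, same earlier period
post_control = [28.0, 27.0, 29.0]  # comparison centre, same later period

estimate = difference_in_differences(pre_ai, post_ai, pre_control, post_control)
```

A negative estimate here would suggest that time to diagnosis fell more at the AI centre than it would have through background trends alone. This relies on the assumption that both centres would otherwise have followed similar trends.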

Developers could initially do a 'silent evaluation' (see Kwong et al. 2022) before full deployment into services. This approach allows the technology to be used in a real-world setting without any influence on clinical decision making until it is fully deployed. This approach can be used to:

  • understand whether the technology can be deployed safely (including in subpopulations)

  • assess how it might have influenced decision making (for example, onward referrals)

  • collect some relevant data items (for example, failure rate or number of indeterminate findings).

To mitigate the committee's concerns about the resource impact of implementing the technologies (see section 3.23 of the draft guidance), initial uses should be on a small scale. Wider rollout may be possible within the period of evidence generation if and when it becomes clear that the resource impact of the technologies is manageable.

Retrospective study

Current NHS standard care failure rates and diagnostic accuracy could be evaluated using data from IR(ME)R, FLS-DB and DID in a retrospective study. These failure rates and diagnostic accuracy findings should then be compared with those of implemented AI technologies.
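The comparison above depends on consistent definitions of diagnostic accuracy and failure rate. A minimal sketch, assuming each scan is classified against a reference standard and that failed or inconclusive reports are counted separately, is shown below. The class name, field names and counts are hypothetical and are not results from any study.

```python
from dataclasses import dataclass


@dataclass
class ReadingCounts:
    """Counts of scan readings against a reference standard (hypothetical)."""
    true_positive: int
    false_positive: int
    true_negative: int
    false_negative: int
    failed: int = 0  # failed or inconclusive reports, excluded from accuracy


def sensitivity(c: ReadingCounts) -> float:
    """Proportion of scans with a fracture that were correctly flagged."""
    return c.true_positive / (c.true_positive + c.false_negative)


def specificity(c: ReadingCounts) -> float:
    """Proportion of scans without a fracture that were correctly cleared."""
    return c.true_negative / (c.true_negative + c.false_positive)


def failure_rate(c: ReadingCounts) -> float:
    """Proportion of all scans with a failed or inconclusive report."""
    total = (c.true_positive + c.false_positive
             + c.true_negative + c.false_negative + c.failed)
    return c.failed / total


# Hypothetical counts for illustration only.
counts = ReadingCounts(true_positive=80, false_positive=45,
                       true_negative=855, false_negative=20, failed=50)
print(sensitivity(counts), specificity(counts), failure_rate(counts))
```

Applying the same functions to standard care and to each AI technology keeps the comparison like for like. Missed fractures correspond to the false negatives.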

For AI technologies indicated for use in all adults over 18 years, data from IR(ME)R, FLS-DB and DID should be used to evaluate failure rates and diagnostic accuracy for people younger than 50 years who are at risk of vertebral fragility fracture (VFF), for example, people with long-term corticosteroid use or malignancy in the vertebrae.

3.4 Data to be collected

The following information has been identified for collection:

Retrospective study

  • Patient demographics including age, sex and ethnicity

  • Diagnostic accuracy

  • Accuracy when used by different healthcare professionals (radiologists, radiographers and other healthcare professionals)

  • Failure rate or rate of inconclusive AI reports

  • Number of missed fractures

  • Rate of missed fracture-related further injury

  • Proportion of people who need further imaging

  • Conditions that may complicate imaging (for example, obesity or scoliosis)

  • Health-related quality of life data, ideally collected with the EQ-5D-3L questionnaire

  • Details of the technology (software name, version, and configuration settings)

  • Image details (including anatomical location, projection when considering X-rays and manufacturer of CT or X-ray machine)

Real-world before-and-after implementation study

  • Patient demographics including age, sex and ethnicity

  • Conditions that may complicate imaging (for example, obesity or scoliosis)

  • Health-related quality of life data, ideally collected with the EQ-5D-3L questionnaire

  • Details of the technology (software name, version, and configuration settings)

  • Image details (including anatomical location, projection when considering X-rays and manufacturer of CT or X-ray machine)

  • Time taken to process and report image with AI assistance

  • Subsequent scanning, for example spinal X-ray and dual-energy X-ray absorptiometry (DEXA) scans

  • Time to diagnosis

  • Time to further referral or treatment

  • Number of treatments and extent of treatments

  • Number of hospital appointments, including referrals to fracture clinics and orthopaedic assessment

  • Impact of complications, such as number of hospital admissions and emergency department visits.

Data collection should follow a predefined protocol, and a quality assurance process should be put in place to ensure the integrity and consistency of data collection. See NICE's real-world evidence framework, which provides guidance on the planning, conduct, and reporting of real-world evidence studies.

Information about the technology

Information about how the technology was developed should also be reported, including:

  • the characteristics of the patient data used in the AI training datasets

  • the version tested and

  • how the effect of future updates will be monitored.

The AI training datasets should include younger people, ethnic minorities, and people with comorbidities or who have had previous treatment. This will ensure that the technologies can be analysed, tested or validated in diverse patient populations. See the NICE evidence standards framework for digital health technologies.

3.5 Evidence generation period

The evidence generation period should be 3 years. This will be enough time to set up and implement the AI technologies, collect the necessary data and analyse it.