Evidence generation plan for digital front door technologies to gather service user information for NHS Talking Therapies for anxiety and depression assessments

3 Approach to evidence generation

3.1 Evidence gaps and ongoing studies

The external assessment group identified 4 ongoing or unpublished studies, 3 for Limbic Access and 1 for Wysa Digital Referral Assistant (DRA), that may address some of the evidence gaps.

Evaluate treatment outcomes for artificial intelligence (AI)-enabled information collection tool for clinical assessments in mental healthcare (NCT05495126 study)

The trial aimed to collect data on treatment outcomes, clinical assessment reliability, waiting and assessment times, and assessment and referral dropout rates. The study compared the AI-supported information collection version of Limbic Access (Class 2a) with the non-AI-enabled version of Limbic Access (Class 1). The study ended in December 2024, but there are no publicly available results yet.

Evaluation of a conversational information collection tool to access talk therapy (NCT05678764 study)

This trial aims to collect data on waiting times from referral to assessment, recovery rate, reliable recovery rate and dropout after referral. The estimated study end date is December 2025.

Evaluation of a conversational information collection tool to access talk therapy (Surrey study)

This study aims to evaluate Limbic Access in terms of:

  • clinical effectiveness (including changes in treatment outcomes, diagnosis or waiting times for the people using the service)

  • service efficiencies (including changes in assessment times and staff wellbeing).

The study will compare the Class 2 version of Limbic Access with the Class 1 version (without AI support). There is no published study end date available.

The benefits of using digital technology (the Wysa app and AI chatbot) to support assessments, waits for therapy and treatment within NHS Talking Therapies services for patients, clinicians, services and the wider healthcare system (ISRCTN10327977 study)

This study aims to investigate the clinical effectiveness and impact of Wysa DRA. It will also evaluate user experience and establish whether adopting Wysa DRA results in any service-related efficiencies (for example, clinical or administrative time savings). Data on health-related quality of life, dropout rates and time taken to complete clinical assessments will be collected. The study will compare Wysa DRA with other referral methods. The anticipated study end date is July 2025.

Table 1 Evidence gaps and ongoing studies

Evidence gap | Limbic Access | Wysa Digital Referral Assistant
Quality of information and immediate impact on clinical assessment | Limited evidence; ongoing study | Limited evidence; ongoing study
Impact of technology on treatment and service pathways | Limited evidence; ongoing study | Limited evidence; ongoing study
Resource and service impact | Limited evidence; ongoing study | Limited evidence; ongoing study
User engagement and experience | Limited evidence | Limited evidence; ongoing study

Table 1 summarises the evidence gaps and ongoing studies that might address them. Information about evidence status is derived from the external assessment group's report. Evidence not meeting the current scope and inclusion criteria is not included. The table shows the evidence available to the committee when the guidance was published.

3.2 Data sources

The NHS Talking Therapies for anxiety and depression dataset and the Mental Health Services Data Set (MHSDS) are real-world datasets that could be used to collect information relevant to the evidence gaps. Most of the data needed to address the evidence gaps is already collected within the Talking Therapies services (an illustrative sketch of deriving such metrics follows the list below), for example the:

  • number of referrals each day

  • number of people on waiting lists

  • treatment pathways

  • proportion of self-referrals.
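As a concrete illustration of how these routinely collected metrics might be derived from a service-level extract, the Python sketch below aggregates a small, invented referrals table. The column names and values are assumptions for illustration only and do not reflect the actual Talking Therapies or MHSDS schemas.

```python
# Illustrative derivation of routinely collected service metrics from a
# hypothetical referrals extract (column names are assumptions, not the
# actual MHSDS or Talking Therapies schema).
import pandas as pd

referrals = pd.DataFrame({
    "referral_date": pd.to_datetime(
        ["2025-01-06", "2025-01-06", "2025-01-07", "2025-01-07", "2025-01-07"]),
    "source": ["self", "gp", "self", "self", "gp"],
    "on_waiting_list": [True, True, False, True, True],
})

# Number of referrals each day
print(referrals.groupby(referrals["referral_date"].dt.date).size())

# Proportion of self-referrals
print((referrals["source"] == "self").mean())

# Number of people currently on the waiting list
print(referrals["on_waiting_list"].sum())
```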

New studies will be needed to collect data on measures that are more specific to using the technologies, such as the:

  • time taken for clinical assessments

  • impact on clinical assessments

  • administrative burden

  • user preferences.

NICE's real-world evidence framework provides detailed guidance on assessing the suitability of a real-world data source to answer a specific research question. The quality and coverage of real-world data collections are of key importance when they are used to generate evidence. Active monitoring and follow-up through a central coordinating point is an effective and viable approach to ensuring good-quality data with broad coverage.

3.3 Evidence collection plan

A suggested approach to addressing the evidence gaps is a mixed-methods longitudinal parallel cohort study. This approach would follow an intervention arm and a control arm, and compare their outcomes. This design would allow assessment of the clinical impact of the technologies and the resource use associated with their implementation. Qualitative data could be generated through appropriate methods such as surveys, focus groups or interviews, as highlighted in NICE's real-world evidence framework. This could include reported outcomes (acceptability, usability and preferences) from people using the service.

If technology-derived problem descriptors are being used to support clinical diagnoses, evidence on their accuracy should be generated for future assessment. Ideally, a cross-sectional diagnostic accuracy study would compare agreement between the clinical assessor alone and the clinical assessor aided by the technology-derived problem descriptors, using an internationally recognised diagnostic interview such as the Mini-International Neuropsychiatric Interview as the 'gold standard'. This would allow accuracy to be reported, including sensitivity, specificity, positive predictive value and negative predictive value.
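To make the accuracy reporting concrete, the sketch below computes these metrics from a hypothetical 2x2 table of agreement with the gold-standard interview. The function and all counts are illustrative assumptions, not results from any study.

```python
# Illustrative sketch: accuracy metrics for a binary problem descriptor
# against a 'gold standard' diagnostic interview (e.g. the MINI).
# All counts below are hypothetical.

def accuracy_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard diagnostic accuracy metrics from a 2x2 table.

    tp: technology/assessor positive, gold standard positive
    fp: technology/assessor positive, gold standard negative
    fn: technology/assessor negative, gold standard positive
    tn: technology/assessor negative, gold standard negative
    """
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts: assessor aided by technology-derived problem
# descriptors versus the gold-standard interview.
print(accuracy_metrics(tp=80, fp=12, fn=15, tn=93))
```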

Relevant data may already exist within published studies, which are outside of the current scope but may be useful for future assessments.

The studies should enrol a representative population, that is, people who would be offered a pre-assessment, including people who have self-referred and people referred through any other method. The pre-assessment may include web- or paper-based forms, or telephone pre-assessments. The studies should compare people using digital front door technologies for pre-assessments with a similar group having standard care. Eligibility for inclusion and the point at which follow-up starts should be clearly defined and consistent across comparison groups to avoid selection bias.

Data should be collected in all groups from the point at which a person would become eligible for standard care (referral). The data from both the intervention and comparison groups should be collected at appropriate time intervals. Data from a comparable population, but with no access to the digital front door technologies, should form the comparison group. Ideally, the studies should be run across multiple centres, with the aim of recruiting centres that represent the variety of referral pathways in the NHS.

Despite consistent eligibility criteria, non-random assignment to interventions can lead to confounding bias, complicating interpretation of the treatment effect. So, approaches should be used that balance confounding factors across comparison groups, for example, using propensity score methods. To achieve this robustly, data collection will need to include prognostic factors related both to the intervention delivered and patient outcomes. These should be defined with input from clinical specialists. Incomplete records and demographically imbalanced groups can lead to bias if unaccounted for.
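As one possible implementation of this balancing step, the sketch below derives stabilised inverse-probability-of-treatment weights from a propensity score fitted by logistic regression. It assumes scikit-learn and pandas are available, and the treatment indicator and covariate names are hypothetical; it is a minimal illustration, not a full analysis plan.

```python
# Minimal sketch of propensity score weighting (IPTW), assuming
# scikit-learn and pandas are available. Column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def stabilised_iptw(df: pd.DataFrame, covariates: list[str],
                    treated_col: str = "used_digital_front_door") -> pd.Series:
    """Return stabilised inverse-probability-of-treatment weights."""
    X = df[covariates]
    t = df[treated_col].astype(int)

    # Propensity score: probability of receiving the intervention
    # given baseline prognostic factors.
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]

    # Stabilised weights: the marginal treatment probability in the
    # numerator reduces variance relative to plain 1/ps weights.
    p_treated = t.mean()
    weights = t * (p_treated / ps) + (1 - t) * ((1 - p_treated) / (1 - ps))
    return weights.rename("iptw")

# Example usage with hypothetical baseline covariates:
# weights = stabilised_iptw(cohort, ["age", "baseline_phq9", "baseline_wsas"])
# Outcome models can then be fitted with these weights to balance groups.
```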

Data collection should follow a predefined protocol. Quality assurance processes should be put in place to ensure the integrity and consistency of data collection. See NICE's real-world evidence framework, which provides guidance on planning, conducting and reporting real-world evidence studies. It also provides best-practice principles for the robust design of real-world evidence studies that assess comparative treatment effects using a prospective cohort design.

Alternative methodological approaches may be applicable; for example, a stepped-wedge trial could be a pragmatic and comprehensive approach.

3.4 Data to be collected

Study criteria

At recruitment, eligibility criteria for the suitability of using the digital technologies and inclusion in the real-world study should be reported, and should include detailed descriptions of the:

  • referral pathway

  • technologies and details such as their training needs, digital safety assurance and the specific versions used.

Baseline information and patient characteristics

These should include:

  • information about individual characteristics at baseline, for example, sex, age, ethnicity, first language, medicines and comorbidities, with other important covariates chosen with input from clinical specialists

  • measures, recorded at baseline and follow-up, of:

    • problem descriptors (the International Statistical Classification of Diseases and Related Health Problems, 10th Revision [ICD-10])

    • depression (Patient Health Questionnaire‑9 [PHQ‑9] score)

    • anxiety (Anxiety Disorder Specific Measure [ADSM])

    • the extent to which mental health problems interfere with daily life (Work and Social Adjustment Scale [WSAS] score); one way of summarising change in these paired measures is sketched after this list.
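The sketch below shows one way paired baseline and follow-up scores might be summarised, using reliable-change and caseness conventions of the kind applied in NHS Talking Therapies reporting. The thresholds (a PHQ-9 change of 6 or more treated as reliable; a score of 10 or more as above caseness) are common conventions but should be confirmed against current reporting guidance; they and the field names are stated here as assumptions.

```python
# Illustrative summary of paired baseline/follow-up PHQ-9 scores.
# Thresholds follow common NHS Talking Therapies conventions but should
# be confirmed against current reporting guidance; all values here are
# hypothetical.
PHQ9_RELIABLE_CHANGE = 6   # assumed reliable-change threshold
PHQ9_CASENESS = 10         # assumed caseness threshold

def summarise_phq9(baseline: int, follow_up: int) -> dict:
    change = baseline - follow_up  # positive change = improvement
    return {
        "reliable_improvement": change >= PHQ9_RELIABLE_CHANGE,
        "reliable_deterioration": change <= -PHQ9_RELIABLE_CHANGE,
        # 'Recovery' here means moving from above to below caseness.
        "recovery": baseline >= PHQ9_CASENESS and follow_up < PHQ9_CASENESS,
    }

print(summarise_phq9(baseline=16, follow_up=7))
# {'reliable_improvement': True, 'reliable_deterioration': False, 'recovery': True}
```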

Quality of assessments

These should include:

  • intervention completion rates

  • number of further assessment appointments and attendance rates

  • attendance rates for treatment appointments

  • proportion of people who are offered treatment and who complete treatment

  • changes in treatment (prescribed medicines or intensity of psychological treatment) and service use.

If additional features of the technology are being used, the diagnostic accuracy of the technology should be assessed when technology-derived problem descriptors are used. A diagnostic accuracy study should compare the clinical assessor alone and the clinical assessor aided by the technology-derived problem descriptors against the 'gold standard', reporting sensitivity, specificity, positive predictive value and negative predictive value.
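Chance-corrected agreement statistics such as Cohen's kappa can complement these metrics when comparing the two assessor conditions against the gold standard. The sketch below, assuming scikit-learn is available, compares two invented label sequences; the problem-descriptor codes are illustrative only.

```python
# Sketch of agreement analysis for the three-way comparison, assuming
# scikit-learn is available. Labels are invented problem-descriptor codes.
from sklearn.metrics import cohen_kappa_score

gold =        ["depression", "gad", "depression", "ptsd", "gad", "gad"]
assessor =    ["depression", "gad", "gad",        "ptsd", "gad", "depression"]
assessor_ai = ["depression", "gad", "depression", "ptsd", "gad", "gad"]

print("assessor alone vs gold:", cohen_kappa_score(gold, assessor))
print("assessor + technology vs gold:", cohen_kappa_score(gold, assessor_ai))
```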

Resource and system use

These should include:

  • time taken for the clinical assessment (including the time taken to review the digital front door information, and the banding of the staff involved)

  • time taken for administrative tasks

  • number of people on the waiting list

  • time to treatment

  • proportion of self-referrals and service referrals

  • costs of digital technologies (an illustrative cost aggregation follows this list), including:

    • licence fees

    • digital safety assurance

    • use and implementation of the technologies

    • healthcare professional staff and training costs

    • promotion

    • integration with NHS systems.
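To illustrate how these cost components might be brought together for a cost-consequence comparison, the sketch below totals a set of invented annual figures and derives a per-referral cost. Every number and category key is a placeholder assumption.

```python
# Illustrative aggregation of digital technology costs across the
# categories listed above. All figures are invented placeholders.
annual_costs_gbp = {
    "licence_fees": 25_000,
    "digital_safety_assurance": 5_000,
    "implementation_and_use": 8_000,
    "staff_and_training": 6_500,
    "promotion": 2_000,
    "nhs_systems_integration": 10_000,
}

total = sum(annual_costs_gbp.values())
print(f"Total annual cost: £{total:,}")

# A per-referral figure can support cost-consequence comparisons,
# assuming a hypothetical annual referral volume.
referrals_per_year = 12_000
print(f"Cost per referral: £{total / referrals_per_year:.2f}")
```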

Reported outcomes and experience from people using the service

These should include:

  • acceptability, user preferences and usability

  • access and uptake, including the number and proportion of people who were able to access the technologies (either through self-referral or referral through another service)

  • pre-assessment completion rates or intervention dropout rates

  • clinical assessment attendance rates

  • reasons for not using the technologies (for example, accessibility issues, language barriers or privacy issues)

  • perception of quality by the person carrying out the clinical assessment.

Data collection should follow a predefined protocol, and quality assurance processes should be put in place to ensure the integrity and consistency of data collection. See NICE's real-world evidence framework, which provides guidance on the planning, carrying out and reporting of real-world evidence studies.

3.5 Evidence generation period

This will be 3 years to allow for setting up, implementation, data collection, analysis and reporting.
