Rapid Fire Best of the Best Oral Clinical Oncology Society of Australia Annual Scientific Meeting 2024

Enhancing Information Equity: Sustainable Production of Patient-Comprehensible Trial Information Using Large Language Models with Expert Supervision – Preliminary Findings (#37)

Minh Tran 1 2 , Kate Saw 1 , Jeremy Mo 1 2 3 , Lucy Haggstrom 1 2 3 , Emma-Kate Carson 4 , Kate Macdonald 5 , Rachel Dear 2 3 , Elgene Lim 1 2 3 , Frank Lin 1 2 6 7
  1. The Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  2. School of Clinical Medicine, UNSW, Sydney
  3. The Kinghorn Cancer Centre, St Vincent’s Hospital, Darlinghurst, NSW, Australia
  4. Macarthur Cancer Therapy Centre, Campbelltown Hospital, Campbelltown, NSW, Australia
  5. Lived Experience Researcher, Sydney
  6. NHMRC Clinical Trials Centre, University of Sydney, Sydney
  7. Department of Oncology, Prince of Wales Hospital, Sydney

Background: Oncology clinical trial information is highly complex and may present a barrier to understanding for patients and families. However, producing accurate yet comprehensible patient-facing materials is resource-intensive. Recent advances in generative AI and large language models (LLMs) present a unique opportunity to generate high-quality patient information efficiently through their text summarisation capabilities.

Methods: As part of the Eva - Clinical Trials Navigator Project, we conducted a technical development study to optimise how LLMs translate trial information into lay-accessible language. Using a randomised, blinded design, an expert panel (4 medical oncologists, 1 lived-experience consumer) iteratively assessed the machine-translated text for accuracy, acceptability, and understandability (5-point Likert scales). Three LLMs (Llama-3-8B, Mistral-7B, and GPT-4) and prompt engineering strategies were assessed in a factorial design. We report preliminary results for selected Australian breast cancer trials translated using this open-source prompt optimisation pipeline.
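A zero-shot prompting step of the kind described above can be sketched as follows. The template wording and field names (`title`, `eligibility`, `interventions`) are illustrative assumptions for exposition, not the study's actual prompts or data schema:

```python
# Illustrative sketch of a zero-shot lay-translation prompt for a
# ClinicalTrials.gov trial record. Wording and fields are hypothetical.

def build_zero_shot_prompt(trial: dict) -> str:
    """Assemble a zero-shot prompt asking an LLM to rewrite a
    clinical trial record in plain, patient-friendly language."""
    return (
        "Rewrite the following clinical trial information in plain English "
        "suitable for patients and families. Be accurate and complete, and "
        "avoid medical jargon.\n\n"
        f"Title: {trial['title']}\n"
        f"Eligibility: {trial['eligibility']}\n"
        f"Interventions: {trial['interventions']}\n"
    )

prompt = build_zero_shot_prompt({
    "title": "Phase II study of drug X in HER2-positive breast cancer",
    "eligibility": "Adults with metastatic HER2-positive breast cancer",
    "interventions": "Drug X plus trastuzumab",
})
```

In a factorial design such as the one described, prompt variants like this would be crossed with each of the three models and the outputs rated blind by the panel.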

Results: Raw data from 15 trials retrieved from ClinicalTrials.gov were translated into lay language by the LLMs. Machine translation using zero-shot prompting yielded variable accuracy, with mean scores across four oncologist reviewers of 3.6 (range 1-4), 4.4 (3-5), and 4.6 (3-5) for Llama-3-8B, Mistral-7B, and GPT-4, respectively. Key qualitative insights emerged from the expert panel feedback: a preference for succinct presentation of concepts to aid understanding, identification of erroneous or incomplete information, and flagging of language that was too informal or awkward in context. This suggests that LLM translations require iterative refinement to reach the quality required for this task.
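Aggregating panel ratings into the per-model mean and range reported above is simple to reproduce; a minimal sketch, using illustrative scores rather than the study's raw data:

```python
# Minimal sketch: summarising 5-point Likert accuracy ratings from four
# reviewers as a per-model (mean, min, max). Scores are illustrative only.
from statistics import mean

ratings = {  # model -> accuracy scores from four reviewers (hypothetical)
    "Llama-3-8B": [4, 3, 4, 4],
    "Mistral-7B": [5, 4, 4, 5],
    "GPT-4": [5, 4, 5, 5],
}

summary = {
    model: (round(mean(scores), 1), min(scores), max(scores))
    for model, scores in ratings.items()
}
# summary["Llama-3-8B"] -> (3.8, 3, 4)
```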

Conclusions: Our preliminary results showed substantial variability in the quality of machine-generated translations across LLMs, underscoring the need to combine AI with expert supervision to successfully develop patient-facing trial navigation tools. These results will inform future research investigating equity of information understanding among cancer patients from diverse backgrounds.