
What a Chest X-Ray Study Teaches Us About Harnessing AI Value in Health and Care - A Clinical AI Best Practice

  • Writer: Mehdi Khaled
  • Jan 5
  • 7 min read
Credits: Rayscape.ai

Look, we've all sat through the AI pitches. The vendor demos where everything works perfectly. The conference keynotes promising revolution. And then… nothing changes. Your workflow is still underwater, your waiting lists are still growing, you're borderline burned out, and that shiny AI tool is collecting digital dust on a server rack somewhere.


But here's the thing: sometimes it actually works. And when it does, there are patterns worth learning from.


The NHS Study


A team at an NHS trust in South West London did something rare: they published real numbers from real clinical practice. Not a controlled trial. Not a pilot with cherry-picked cases. Two years of actual deployment across five hospitals serving 500,000 real people.


A crucial parameter in cancer screening is how quickly patients can access critical diagnostics (and, obviously, the reporting), because that timing defines the path that follows. Here, the problem was simple: patients with suspected lung cancer need a CT scan within 72 hours of a suspicious chest X-ray. That's the national guideline in the UK. Reality? The average wait was 6 days. Only 3.7% got same-day scans. The queue was killing people, slowly and systematically.


The South West London team deployed an AI system (Annalise Enterprise, if you care) to triage chest X-rays. Here's what happened 56,257 X-rays later:


- Average time to CT: 6 days → 3.6 days (40% reduction)

- Same-day scans: 3.7% → 22.1% (6x increase)

- Meeting the 3-day target: 19.2% → 46.5% (more than doubled)

- Patients who needed CT and got it: 93% → 95.7% (better triage accuracy)


- Cancer cases missed: <0.3% (safety maintained)


And here's the kicker: the AI flagged 11% of X-rays as potentially concerning, but only 2% actually needed cancer workup. That 11% should have flooded the system, right? It didn't. Because humans filtered it. The AI had high sensitivity, radiologists provided specificity, and the workflow didn't collapse.


Why This Matters Beyond Lung Cancer


Strip away the specific disease, and you see a blueprint that works across health and care. Let's look under the hood.


The Reusable Principles


1. AI as Triage, Not Oracle


The AI application didn't diagnose lung cancer. It said "hey, look at this one first." That's it. No autonomous decision-making. No replacing radiologists. Just logical prioritisation of a queue.


This model works anywhere you have:


- Time-sensitive protocols (stroke, sepsis, cardiovascular events)

- Overloaded specialists (dermatology referrals, retinopathy screening)

- Clear downstream actions (if AI flags it, what happens next?)


Practical use cases:


Apply this to breast cancer screening: AI doesn't read mammograms alone—it flags which ones need priority double-reading, which patients need faster recall appointments, which cases might benefit from 3D imaging. Studies show this can improve cancer detection by 13-20% while reducing radiologist reading time by 30-40%. Same principle: triage, not replacement.


Apply this to emergency medicine: Sepsis risk scores from vitals and labs don't diagnose sepsis—they surface deteriorating patients faster. Time to antibiotic administration drops. Mortality drops. The AI doesn't practice medicine; it reorganises the queue.
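
If you want to see what "reorganising the queue" actually means in software terms, here is a minimal sketch in Python. The worklist fields and the sorting rule are illustrative assumptions, not the actual Annalise Enterprise integration; the point is that the AI changes the order in which humans read studies, never the contents of the queue and never the diagnosis.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Study:
    accession_id: str        # hypothetical identifiers, not a vendor API
    arrived_at: datetime
    ai_flagged: bool         # True if the AI marked the X-ray as potentially concerning

def triage(worklist: list[Study]) -> list[Study]:
    """Reorder the reading queue: AI-flagged studies first, oldest first within each group.
    The AI never removes anything from the list and never makes a diagnosis;
    it only changes the order in which humans see the studies."""
    return sorted(worklist, key=lambda s: (not s.ai_flagged, s.arrived_at))

# Example: two routine studies and one flagged study that arrived last
worklist = [
    Study("A001", datetime(2025, 1, 5, 9, 0), ai_flagged=False),
    Study("A002", datetime(2025, 1, 5, 9, 30), ai_flagged=False),
    Study("A003", datetime(2025, 1, 5, 10, 0), ai_flagged=True),
]
print([s.accession_id for s in triage(worklist)])  # ['A003', 'A001', 'A002']
```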


2. System Redesign Beats Technology Every Time


Here's what the NHS team didn't do: install AI and wait for magic.


Here's what they did do:


- Created urgent review slots for AI-flagged cases

- Set up walk-in CT booking capacity (modality capacity)

- Ensured radiologists were available for rapid reads (human capacity)

- Co-located CT scanners where possible (this mattered more than expected)


The AI was maybe 30% of the solution. The other 70% was unglamorous workflow engineering and carving out capacity to match the technology’s output speed.
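
To make "carving out capacity" concrete, here is the back-of-envelope arithmetic using the published totals. The per-day slot figures are a rough illustrative calculation, not planning numbers from the paper.

```python
# Back-of-envelope capacity planning: how many urgent review slots and walk-in CT
# slots does an AI triage tool actually demand? The totals below come from the
# published study; the slot arithmetic itself is an illustrative sketch, not the
# trust's actual planning model.

XRAYS_TOTAL = 56_257          # chest X-rays over the two-year deployment
DAYS = 2 * 365
AI_FLAG_RATE = 0.11           # share of X-rays the AI flagged as potentially concerning
WORKUP_RATE = 0.02            # share that actually needed cancer workup after human review

xrays_per_day = XRAYS_TOTAL / DAYS
urgent_reads_per_day = xrays_per_day * AI_FLAG_RATE   # rapid radiologist reads to resource
urgent_cts_per_day = xrays_per_day * WORKUP_RATE      # walk-in CT slots to protect

print(f"X-rays/day across 5 hospitals:   {xrays_per_day:.0f}")
print(f"Urgent reads/day to staff for:   {urgent_reads_per_day:.1f}")
print(f"Walk-in CT slots/day to protect: {urgent_cts_per_day:.1f}")
```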


This is the part everyone skips in the vendor demos. You can't bolt AI onto a broken system and expect improvement. The site that saw the biggest gains? The one with CT scanners in the same building as the X-ray units. Geography beat algorithms.


For breast screening, this means: Before deploying AI, you need fast-track ultrasound slots, same-week diagnostic appointments, coordinated booking between screening and diagnostic units, and radiologists with dedicated time for AI-flagged reviews. Without these, the AI just creates a faster route to the same bottleneck.


For any specialty, this means: Map your entire care pathway first. Find the choke points. Remove them. Then add AI to the places where speed matters. Technology accelerates movement through a system—it doesn't fix the system.


3. High Sensitivity + Human Filter = Safety Without Overload


The AI cast a wide net (11% flagged) but didn't flood the system because humans narrowed it (2% actual referrals). This isn't a bug; it's the design working as intended.


If you optimise AI for low false positives, you'll miss cases. If you optimise for high sensitivity, you'll flag everything. The solution isn't better AI—it's better human-AI collaboration.

The practical application: Design your AI to be "semi-paranoid" (high sensitivity), then build clinical review into the workflow as a refining step. Don't ask AI to be perfect. Ask it to be consistently cautious, and let your clinicians be consistently rigorous.



In breast screening, this looks like: AI flags 8-12% of mammograms as suspicious. Radiologist review brings this down to the typical 4-7% recall rate. Result: better cancer detection without a recall-anxiety epidemic. Multiple European trials have shown this practice to be safe.
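
The arithmetic behind this is standard screening maths. Here is a short sketch with an illustrative operating point; the study reports the flag and workup rates, not these exact sensitivity and specificity values.

```python
# Why a "semi-paranoid" AI plus a human filter doesn't flood the system:
# standard screening arithmetic. The operating point below is illustrative --
# the study reports the 11% flag rate and ~2% workup rate, not these exact
# sensitivity/specificity values.

def screening_load(sensitivity: float, specificity: float, prevalence: float):
    """Return (flag_rate, ppv): how much of the queue gets flagged,
    and what fraction of flags are true positives."""
    flag_rate = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / flag_rate
    return flag_rate, ppv

flag_rate, ppv = screening_load(sensitivity=0.95, specificity=0.91, prevalence=0.02)
print(f"Flagged by AI:              {flag_rate:.1%}")  # ~10.7% of all X-rays
print(f"True positives among flags: {ppv:.1%}")        # ~17.7%; humans narrow the rest
```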


4. Safety Through Retrospective Audits


Smart move from the NHS team: they tracked patients whose X-rays were read as normal but who happened to get CTs within 28 days anyway (for other reasons). Found cancer in <0.3% of those cases.


This is how you catch systematic AI failures before they become disasters. You need:

- A way to identify "near misses" (normal AI read + later diagnosis)

- Regular review cycles (monthly or quarterly)

- Clear escalation if miss rates exceed thresholds

- Updates to AI model or workflow based on failures


For breast screening: Track interval cancers (diagnosed between screening rounds). If AI-assisted screening doesn't reduce these by 20-30%, something's wrong. Build the audit loop on day one, not after a scandal.
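
Here is a minimal sketch of what that audit loop can look like, assuming a hypothetical case-record export. The 28-day window comes from the study; the record fields and the 0.3% escalation threshold are illustrative choices, not the trust's actual governance rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Case:
    xray_date: date
    read_as_normal: bool       # X-ray reported as normal (no urgent flag acted on)
    ct_date: Optional[date]    # a CT the patient happened to get later, for any reason
    cancer_on_ct: bool

def audit(cases: list[Case], window_days: int = 28, threshold: float = 0.003) -> bool:
    """Flag for escalation if the miss rate among 'normal' reads exceeds the threshold."""
    normals = [c for c in cases if c.read_as_normal]
    misses = [
        c for c in normals
        if c.ct_date is not None
        and c.cancer_on_ct
        and (c.ct_date - c.xray_date).days <= window_days
    ]
    miss_rate = len(misses) / len(normals) if normals else 0.0
    print(f"Normal reads: {len(normals)}, near misses: {len(misses)}, rate: {miss_rate:.2%}")
    return miss_rate > threshold
```

Run it monthly or quarterly, and wire a True result into whatever escalation path your clinical governance process already has.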


5. Efficiency Gains First, Outcome Gains Later


The NHS study measured time to CT and guideline compliance. It did not measure survival, treatment success, or quality of life. That data is coming, but it takes years.


This is honest. Too many AI projects claim outcome improvements they haven't measured. Start with process metrics (time, throughput, compliance) because those are fast and actionable. Plan outcome studies from the beginning, but don't wait 5 years to deploy because you're chasing the perfect endpoint.


6. The Failure Rate Learning Curve


Technical failures dropped from 6.3% to under 0.5% over the deployment period. Translation: AI systems aren't plug-and-play. They need babysitting.


7. Infrastructure Beats Algorithms


Cannot stress this enough: the hospitals with on-site CT scanners saw the biggest improvements. The AI was identical across sites. The physical logistics determined success.


Before you deploy AI anywhere, ask: "If this tool surfaces a problem, can we act on it quickly?" If the answer is "well, we'd need to refer them to another facility" or "we'd need to schedule them three weeks out," the AI will just highlight your limitations. Fix the infrastructure first, or target AI deployment where infrastructure is already strong.


Expected improvements based on published studies and the NHS model:


- Time to recall: 14-21 days → 5-7 days (60% reduction)

- Same-week diagnostics: 10% → 35-45% (3-4x increase)

- Cancer detection: 7-9 per 1,000 → an additional 0.8-1.5 per 1,000 (13-20% improvement)

- Radiologist reading time: 100% → 60-70% (30-40% of capacity freed)

- Interval cancer rate: baseline → -20-30% (major safety gain)

The catch: these numbers only happen if you also do the system redesign, i.e. workflow reengineering, cancer registry integration, longitudinal tracking systems, patient outcome databases, and so on.

If you just add AI to your current workflow, you'll get a digital extension of your current inefficiencies, plus more frustrated radiologists.

The Principles That Apply Everywhere


Whether you're looking at radiology, pathology, emergency medicine, primary care screening, or any other domain:


Ask these questions before deployment:


1. Is there a time-sensitive protocol? (If not, AI urgency doesn't matter)

2. What's the downstream action? (If AI flags something, what happens next?)

3. Where are the bottlenecks? (AI can't fix what's broken downstream)

4. Can we act quickly? (Infrastructure determines impact, not algorithms)

5. How will we measure safety? (Retrospective audits, not just accuracy metrics)

6. What's the human role? (AI shouldn't eliminate oversight; it should reorganise it)

7. What's the workflow change? (Technology is 30%, process redesign is 70%, or thereabouts...)


Red flags that predict failure:


- "The AI will make everything faster" (without workflow redesign)

- "We'll deploy across all sites simultaneously" (without piloting)

- "Accuracy is 95% so we can trust it" (without safety monitoring)

- "It'll reduce workload" (it redistributes workload, rarely reduces it)

- "Vendor says it's ready to go" (vendor hasn't seen your workflow)


The Study Limitations


The NHS study was admirably honest about what it didn't show:


- No outcome data yet. Faster CT is good, but does it improve survival?

- No downstream analysis. Did this create bottlenecks in pathology or oncology?

- One setting only. Will this work in rural hospitals? Smaller centres? Different countries?

- Staff perspective missing. How did radiologists actually feel about this? (They published that separately—worth reading.)


As we increasingly demand transparency from algorithms, this exemplary level of honesty is more valuable than the positive results. It shows where the evidence gaps are and where your own evaluation needs to focus.


Bottom Line: What Makes AI Work in Clinical Settings


It's not the algorithm. It's not the accuracy numbers in the vendor deck. It's not the willingness to embrace AI for its own sake.


It's this:


1. Clear, time-sensitive protocols (72-hour CT, same-day recall, rapid sepsis treatment)

2. Infrastructure that supports rapid action (co-located equipment, available specialists, booking capacity)

3. Workflow redesign, not workflow addition (reorganise queues, don't just add AI alerts)

4. Human-AI collaboration (high sensitivity AI + clinical specificity)

5. Safety monitoring built in from day one (retrospective audits, miss-rate tracking)

6. Honest measurement (process metrics first, outcome metrics when possible)

7. Patience with technical issues (6-12 month learning curve is normal)


The NHS team reduced lung cancer CT wait times by 40% and increased same-day scans by 6x. Not because they had magical AI. Because they combined decent AI with excellent system thinking.


That's the blueprint. It's less sexy than the demos. It's harder than buying a subscription. But it's what actually moves the needle for patients — and their relatives.


In health and care, that's the golden metric that matters most.


_____________


The NHS study was published on 18 December 2025 in NEJM AI. The principles apply far beyond chest X-rays—radiology, pathology, screening programs, emergency triage, anywhere you have time-sensitive protocols and overloaded specialists. The question isn't whether AI works. It's whether you're willing to do the hard work of making it work. Safely and efficiently.

