Evidence Before Impression
An agentic systems architecture for multi-view knee radiograph interpretation — the design behind how 5C builds radiology AI you can audit, trust, and validate.
5C Applied AI Team
5C Network · Technical systems manuscript
Abstract
Knee radiographs are among the most common musculoskeletal imaging studies, yet automated interpretation remains challenging in precisely the cases that matter clinically: subtle fractures, early osteoarthritis, equivocal effusion, projection-limited patellar findings, and normal variants that mimic pathology.
We present an agentic systems architecture that treats reporting as structured evidence compilation rather than direct captioning — assembling multi-view studies, inferring projection roles, running task-bounded agents, reducing outputs to a canonical evidence ledger, triggering specialists when warranted, applying a skeptic pass, and synthesizing a report only from the ledger.
Now partnering with leading hospitals
We are inviting leading hospitals to partner with us on IRB-approved clinical trials of this architecture.
Bring an anonymised, radiologist-adjudicated knee cohort. We run unbiased external evaluation, share back full traces and ledgers, and co-author the results.
TL;DR: Knee X-ray AI should produce evidence before impression. Instead of captioning a study in one shot, the system assembles the multi-view study, preserves projection identity, runs task-bounded agents, reduces everything to a canonical four-state evidence ledger, triggers specialists only where warranted, applies a finding-specific skeptic pass, and writes the report only from the ledger. The contribution is not a new foundation model — it is a clinical-computational architecture that makes foundation-model reasoning inspectable, view-aware, and measurable, so it can be validated under IRB.
§ 01 · Motivation
Why "Evidence Before Impression"
Knee radiography is a high-volume examination across emergency medicine, orthopedics, rheumatology, sports medicine, and post-operative care. Many cases are straightforward: a normal knee may be easy to report, and a severely degenerated knee or displaced fracture may be visually obvious. The clinically important difficulty lies in the low-signal middle — early osteoarthritis, subtle avulsion fragments, tibial spine injuries, small patellar fractures, equivocal suprapatellar effusion, projection-limited alignment, and post-operative changes that can be confused with pathology.
Radiologists do not solve these cases through one-shot captioning. They first determine which projections are available and whether each is diagnostic, which findings are best evaluated on which view, whether a suspected finding is corroborated or contradicted across views, and whether weak evidence should be reported, downgraded, or left indeterminate. The final report is a reduction over evidence.
Most current AI systems do not model that workflow. CNN-based systems are effective for focused classification, but a full knee report requires more than a label. Vision-language systems can produce fluent prose, but fluency is not evidence-grounded interpretation. A credible reader must make intermediate state visible: views checked, findings considered, evidence for and against each finding, limitations, specialist decisions, and report rationale. This paper therefore frames knee radiograph interpretation as an agentic evidence-compilation problem in which that intermediate state is explicit.
7 staged steps from study assembly to ledger-constrained report
9 task-bounded agents, each returning structured output rather than prose
4 states per finding: present · absent · indeterminate · not_assessable
From the paper. This is a technical systems manuscript prepared as scientific background for clinical validation — not a completed trial or a claim of radiologist equivalence.
§ 02 · Contributions
Four Contributions
The paper makes four contributions to how knee radiograph AI should be built and evaluated.
Multi-view agentic architecture that preserves projection identity
A knee-specific architecture that keeps each image's projection role — AP, lateral, sunrise/Merchant, oblique — intact throughout interpretation. Projection identity is part of the diagnostic task: effusion and patellar height depend on a true lateral; compartmental narrowing on a frontal view.
A structured evidence ledger for reportable clinical state
A canonical ledger that converts heterogeneous model observations into reportable clinical state — one entry per finding, with state, confidence, supporting and opposing evidence, limitations, supporting views, and grade. It is the system's central, inspectable artifact and a natural supervision target.
A skeptic pass that reduces unsupported positives
A finding-specific mechanism for reducing unsupported positives arising from projectional, soft-tissue, or artifact confounds — grounded in radiographic view logic rather than generic conservatism. It downgrades effusion without capsular support, alignment overcall on oblique views, and lesion claims better explained by artifact.
A validation framework beyond diagnostic labels
A framework that evaluates not only diagnostic labels but also view handling, uncertainty behavior, clinical significance of errors, and report signability — the basis for the radiologist-adjudicated, IRB-approved studies we are now running with partner hospitals.
§ 03 · Architecture
The Reasoning Graph
The system is organized as a staged reasoning graph. Each stage converts model outputs into structured intermediate state, and the final report is generated from the ledger — not directly from unconstrained image prose.
Study Assembly & Ingestion
Assembles the multi-view study from DICOM, rendered radiographs, uploads, or composite screenshots. Decodes, validates, and groups images as a single study object.
Projection Inference
Infers projection role from metadata, filenames, and layout. A single AP/LAT composite screenshot is split into virtual per-view panels, and view identity is preserved downstream.
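Projection inference can be illustrated with a simple keyword heuristic. The sketch below assumes that a free-text view label (for example, from image metadata) and the filename are available, and that a composite screenshot places its views as equal-width panels side by side. The keyword table and the names `infer_projection` and `split_composite` are hypothetical, not the production logic.

```python
import re

# Hypothetical keyword table, checked from most to least specific role.
# The lookarounds stop "ap" from matching inside longer words such as "cap".
_ROLE_PATTERNS = [
    ("SUNRISE", re.compile(r"sunrise|merchant|skyline", re.I)),
    ("OBLIQUE", re.compile(r"(?<![a-z])obl", re.I)),
    ("LAT", re.compile(r"(?<![a-z])lat", re.I)),
    ("AP", re.compile(r"(?<![a-z])(ap|pa)(?![a-z])", re.I)),
]


def infer_projection(filename: str, view_label: str = "") -> str:
    """Return a projection role guessed from a view label and filename."""
    text = f"{view_label} {filename}"
    for role, pattern in _ROLE_PATTERNS:
        if pattern.search(text):
            return role
    return "UNKNOWN"  # left for layout-based inference or manual review


def split_composite(width: int, height: int, roles: list[str]) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Split a side-by-side composite into one (role, box) virtual panel per view."""
    n = len(roles)
    edges = [round(i * width / n) for i in range(n + 1)]
    return [(role, (edges[i], 0, edges[i + 1], height)) for i, role in enumerate(roles)]
```

Ordering the patterns from most to least specific is a deliberate choice: when several keywords appear, the more specific role wins, and anything unmatched stays UNKNOWN rather than being forced into a role.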
Task-Bounded Agents
Nine bounded agents — projection/coverage, rotation/obliquity, fracture/cortical integrity, alignment/dislocation, joint-space/degenerative, soft-tissue/effusion, patellar height/tracking, loose body vs fabella, postoperative/arthroplasty — each returning structured fields, not a paragraph.
Evidence Ledger Reduction
All agent outputs reduce into a canonical ledger $\Lambda_k$ per finding: state, confidence, positive evidence, negative evidence, limitations, supporting views, and grade. It is the central artifact for review, contradiction checks, and trace inspection.
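One way to picture the reduction step is as a merge over per-agent records for the same finding. In this sketch the `Observation` record, its field names, and the confidence-weighted vote are assumptions for illustration; the paper does not specify the merge rule.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Hypothetical structured output from one task-bounded agent."""
    agent: str
    finding: str
    state: str  # present | absent | indeterminate | not_assessable
    confidence: float
    evidence_for: list[str] = field(default_factory=list)
    evidence_against: list[str] = field(default_factory=list)
    views: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)


def reduce_observations(observations: list[Observation]) -> dict[str, dict]:
    """Merge per-agent observations into one ledger row per finding."""
    grouped: dict[str, list[Observation]] = defaultdict(list)
    for obs in observations:
        grouped[obs.finding].append(obs)

    ledger = {}
    for finding, items in grouped.items():
        # Confidence-weighted vote over states (an assumed rule, not the paper's).
        votes: dict[str, float] = defaultdict(float)
        for obs in items:
            votes[obs.state] += obs.confidence
        best = max(votes.values())
        winners = [s for s, v in votes.items() if math.isclose(v, best)]
        total = sum(votes.values()) or 1.0
        ledger[finding] = {
            # A tie between states is itself uncertainty, so it becomes indeterminate.
            "state": winners[0] if len(winners) == 1 else "indeterminate",
            "confidence": round(best / total, 3),
            "evidence_for": [e for obs in items for e in obs.evidence_for],
            "evidence_against": [e for obs in items for e in obs.evidence_against],
            # Simplification: union of every view any agent cited for this finding.
            "supporting_views": sorted({v for obs in items for v in obs.views}),
            "limitations": sorted({lim for obs in items for lim in obs.limitations}),
        }
    return ledger
```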
Triggered Specialist Review
Specialists run adaptively: a fracture specialist on suspicious lucency or lipohemarthrosis, an effusion specialist when lateral support is weak, a degenerative specialist on uncertain narrowing — concentrating compute where evidence is uncertain or important.
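Specialist routing reduces to a set of predicates over the ledger. The cue name `suspicious_lucency`, the finding keys, and the 0.35 to 0.75 uncertainty band below are hypothetical placeholders chosen to mirror the three triggers described above, not calibrated values.

```python
from typing import Callable

# Hypothetical ledger schema: finding name -> row with "state", "confidence",
# optional "cues" and "supporting_views". Cue names and thresholds are assumptions.
Ledger = dict[str, dict]


def _uncertain(row: dict, lo: float = 0.35, hi: float = 0.75) -> bool:
    """True when a row is indeterminate or its confidence sits in a middle band."""
    return row.get("state") == "indeterminate" or lo <= row.get("confidence", 0.0) < hi


TRIGGERS: dict[str, Callable[[Ledger], bool]] = {
    # Suspicious lucency or a lipohemarthrosis cue routes to the fracture specialist.
    "fracture_specialist": lambda ledger: (
        "suspicious_lucency" in ledger.get("fracture", {}).get("cues", [])
        or ledger.get("lipohemarthrosis", {}).get("state") in {"present", "indeterminate"}
    ),
    # An effusion row without lateral-view support routes to the effusion specialist.
    "effusion_specialist": lambda ledger: (
        "joint_effusion" in ledger
        and "LAT" not in ledger["joint_effusion"].get("supporting_views", [])
    ),
    # Uncertain narrowing routes to the degenerative specialist.
    "degenerative_specialist": lambda ledger: _uncertain(ledger.get("joint_space_narrowing", {})),
}


def route_specialists(ledger: Ledger) -> list[str]:
    """Names of the specialists whose trigger fires on this ledger."""
    return sorted(name for name, fires in TRIGGERS.items() if fires(ledger))
```

Keeping triggers as named, independent predicates makes the routing itself auditable: each specialist call in a trace can be tied to the rule that fired it.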
Skeptic Pass
Finding-specific checks grounded in view logic. Downgrades effusion without suprapatellar/capsular distention, alignment overcall explained by an oblique AP, and osseous-lesion claims better explained by skin, soft tissue, or projectional artifact.
Ledger-Constrained Synthesis
The report writer receives only the final ledger and writes a concise report. It cannot invent unsupported findings or re-interpret the images — otherwise the final model becomes an uncontrolled second reader.
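The contract for the report writer can be expressed as a function whose only input is the ledger. This minimal sketch uses fixed sentence templates in place of a language model, and adds a hypothetical `unsupported_mentions` check that would flag any finding named in a free-text draft but absent from the ledger. The phrasing of the templates is illustrative only.

```python
def write_report(ledger: dict[str, dict]) -> str:
    """Render one sentence per ledger row; the images are never an input."""
    lines = []
    for finding, row in sorted(ledger.items()):
        name = finding.replace("_", " ")
        lim = f" ({'; '.join(row['limitations'])})" if row.get("limitations") else ""
        grade = f", {row['grade']}" if row.get("grade") else ""
        state = row["state"]
        if state == "present":
            lines.append(f"{name.capitalize()}{grade}.")
        elif state == "absent":
            lines.append(f"No {name}.")
        elif state == "indeterminate":
            lines.append(f"Possible {name}, not corroborated{lim}.")
        elif state == "not_assessable":
            lines.append(f"{name.capitalize()} not assessable{lim}.")
        else:
            raise ValueError(f"unknown state {state!r} for {finding!r}")
    return "\n".join(lines)


def unsupported_mentions(draft: str, ledger: dict[str, dict], vocabulary: list[str]) -> list[str]:
    """Findings named in a free-text draft (simple substring match) with no ledger row."""
    text = draft.lower()
    return [f for f in vocabulary if f.replace("_", " ") in text and f not in ledger]
```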
This architecture reflects how 5C Network's Bionic AI engine is built: agentic, stateful, and inspectable rather than monolithic.
§ 04 · Uncertainty
Four States, Not Two
For each target finding the system predicts one of four states. The distinction between genuine absence and unavailable evidence is clinically essential — and is exactly what binary present/absent labels throw away.
present
The finding is supported by direct, view-appropriate evidence.
absent
Adequately assessed and not present — distinct from simply unseen.
indeterminate
Suspicious but not corroborated — e.g. a lucency without a cortical break.
not_assessable
The required view is missing or inadequate — e.g. patellar height with no true lateral.
A possible effusion without a true lateral view is not the same as absent effusion. A suspicious lucency without a cortical break may be indeterminate rather than present. A small posterior, well-corticated ossicle may represent a fabella variant rather than a loose body. The four-state model is how the system keeps those distinctions instead of forcing a premature call.
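The distinctions above can be sketched as a small decision rule. The three boolean inputs are hypothetical simplifications of what the agents and the skeptic pass would actually compute. What matters is the ordering: view adequacy is checked before any call is made, and a suspicious cue on an inadequate view stays indeterminate, as in the effusion example.

```python
from enum import Enum


class State(str, Enum):
    PRESENT = "present"
    ABSENT = "absent"
    INDETERMINATE = "indeterminate"
    NOT_ASSESSABLE = "not_assessable"


def assign_state(required_view_ok: bool, suspicious: bool, corroborated: bool) -> State:
    """Hypothetical four-state rule; view adequacy is checked before any call."""
    if not required_view_ok:
        # A suspicious cue on the wrong view stays open; no cue at all means unassessed.
        return State.INDETERMINATE if suspicious else State.NOT_ASSESSABLE
    if suspicious and corroborated:
        return State.PRESENT  # direct, view-appropriate support
    if suspicious:
        return State.INDETERMINATE  # e.g. a lucency without a cortical break
    return State.ABSENT  # adequately assessed and nothing suspicious
```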
§ 05 · Representation
The Evidence Ledger
The ledger is the system's central artifact. It enables specialist review, contradiction checks, trace inspection, and report generation — and it provides a natural supervision target: an expert can correct the report, the ledger entry, or the evidence attached to it.
For each target finding $y_k$, the ledger stores seven fields. One row per finding, one ledger per study.
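Collecting the seven fields gives a compact form for one ledger row. This notation is our own shorthand for the fields listed in this section. Treating $q_k$ as a value in $[0, 1]$ is an assumption based on the worked example, not a constraint stated in the paper.

$$
\Lambda_k = \left(s_k,\ q_k,\ E_k^{+},\ E_k^{-},\ L_k,\ V_k,\ G_k\right),
\qquad
s_k \in \{\texttt{present},\ \texttt{absent},\ \texttt{indeterminate},\ \texttt{not\_assessable}\},
\qquad
q_k \in [0, 1]
$$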
One entry, in full

```jsonc
{
  "finding": "joint_effusion",
  "state": "indeterminate", // present | absent | indeterminate | not_assessable
  "confidence": 0.41,
  "evidence_for": [
    { "view": "AP", "cue": "suprapatellar soft-tissue fullness" }
  ],
  "evidence_against": [
    { "view": "LAT", "cue": "no true lateral; capsular distention not assessable" }
  ],
  "limitations": ["no horizontal-beam lateral"],
  "supporting_views": ["AP"],
  "grade": null,
  "skeptic_action": "downgrade present → indeterminate (no capsular distention)"
}
```
A worked ledger entry. The report writer never sees the pixels — only structured rows like this one. Illustrative example, not output from a specific study.
$s_k$ · state
present · absent · indeterminate · not_assessable
$q_k$ · confidence
Confidence for the asserted state.
$E_k^{+}$ · positive evidence
Per-view evidence supporting the finding.
$E_k^{-}$ · negative evidence
Per-view evidence opposing the finding.
$L_k$ · limitations
View-availability and image-quality caveats.
$V_k$ · supporting views
Which projections support the entry.
$G_k$ · grade
Grade or severity when applicable (e.g. OA severity).
The report writer reads only this. Traceability by construction.
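A typed container makes the seven fields and their constraints explicit. This sketch assumes the JSON key names from the worked entry in this section, treats confidence as a value in [0, 1], and ignores auxiliary keys such as `skeptic_action`; `LedgerEntry` and `Evidence` are illustrative names, not a published schema.

```python
from dataclasses import dataclass, field
from typing import Optional

STATES = ("present", "absent", "indeterminate", "not_assessable")


@dataclass(frozen=True)
class Evidence:
    view: str  # projection role, e.g. "AP" or "LAT"
    cue: str   # short description of the radiographic cue


@dataclass
class LedgerEntry:
    finding: str  # the target finding y_k
    state: str
    confidence: float
    evidence_for: list[Evidence] = field(default_factory=list)
    evidence_against: list[Evidence] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)
    supporting_views: list[str] = field(default_factory=list)
    grade: Optional[str] = None  # e.g. an OA severity label, when applicable

    def __post_init__(self) -> None:
        # Reject rows that would silently break the four-state contract.
        if self.state not in STATES:
            raise ValueError(f"invalid state: {self.state!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of [0, 1]: {self.confidence}")

    @classmethod
    def from_dict(cls, d: dict) -> "LedgerEntry":
        """Build an entry from JSON-style keys; unknown keys are ignored."""
        return cls(
            finding=d["finding"],
            state=d["state"],
            confidence=float(d["confidence"]),
            evidence_for=[Evidence(**e) for e in d.get("evidence_for", [])],
            evidence_against=[Evidence(**e) for e in d.get("evidence_against", [])],
            limitations=list(d.get("limitations", [])),
            supporting_views=list(d.get("supporting_views", [])),
            grade=d.get("grade"),
        )


entry = LedgerEntry.from_dict({
    "finding": "joint_effusion",
    "state": "indeterminate",
    "confidence": 0.41,
    "evidence_for": [{"view": "AP", "cue": "suprapatellar soft-tissue fullness"}],
    "limitations": ["no horizontal-beam lateral"],
    "supporting_views": ["AP"],
    "grade": None,
    "skeptic_action": "downgrade present -> indeterminate",
})
```

Validating at construction time means a malformed row fails before it can reach the report writer, which only ever sees entries that passed these checks.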
§ 06 · Failure Analysis
Failure Modes the Architecture Addresses
Each recurrent failure mode in knee radiograph interpretation maps to a specific architectural mechanism. This is the table at the heart of the paper.
| Failure mode | Clinical mechanism | Architectural response |
|---|---|---|
| Effusion overcall | Skin folds, superficial overlap, or nonspecific anterior fullness can mimic joint fluid. | Require direct lateral-view suprapatellar/capsular support; skeptic downgrade when absent. |
| Projectional alignment overcall | An oblique or rotated AP view can mimic varus/valgus deformity. | Track projection limitation; downgrade unsupported alignment claims. |
| Missed early OA | Mild spurs, tibial spine spiking, and subtle narrowing are low signal. | Degenerative specialist and ledger preservation of component evidence. |
| OA overgrading | Osteophytes alone can inflate severity. | Require multiple degenerative components for higher severity. |
| Subtle fracture miss | Small avulsion, patellar, or tibial spine findings can be normalized. | Trigger fracture specialist from direct or secondary trauma cues. |
| Loose body mimic | Fabella and well-corticated variants can mimic intra-articular bodies. | Separate variant/mimic target and require intra-articular support. |
| Hardware confusion | Post-operative changes can be normalized or misread as pathology. | Dedicated postoperative / hardware assessment. |
Table 1 — Representative failure modes and corresponding system mechanisms.
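The OA overgrading response in Table 1 can be illustrated as a severity cap driven by how many degenerative components the ledger supports. The component names, the four severity labels, and the caps below are hypothetical and are not a validated grading scale such as Kellgren-Lawrence.

```python
SEVERITY = ["none", "mild", "moderate", "severe"]
COMPONENTS = {"osteophytes", "joint_space_narrowing", "subchondral_sclerosis", "subchondral_cysts"}


def cap_oa_severity(proposed: str, components_present: set[str]) -> str:
    """Cap a proposed OA severity by the number of supported degenerative components."""
    n = len(components_present & COMPONENTS)
    # Hypothetical caps: 0 components -> none, 1 -> mild, 2 -> moderate, 3 or more -> severe.
    cap = SEVERITY[min(n, 3)]
    return min(proposed, cap, key=SEVERITY.index)
```

Under these caps, osteophytes alone can never yield more than a mild grade, while a lower proposed grade is never raised.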
§ 07 · Regularization
The Skeptic Pass
The skeptic pass regularizes weak positives. It is not a generic conservatism knob — it is a set of finding-specific checks grounded in radiographic view logic.
Effusion
Downgrade when the evidence lacks direct suprapatellar pouch or capsular distention — most often a soft-tissue or superficial-overlap confound on a view that cannot support the call.
Alignment
Downgrade alignment abnormality when an oblique AP projection plausibly explains the apparent asymmetry, rather than a true varus/valgus deformity.
Osseous lesion
Downgrade osseous-lesion claims when the evidence is more consistent with skin, soft-tissue, or projectional artifact than with a true bone lesion.
Because these checks live in the reduction layer rather than as new full model stages, they are intended to control overcall without materially increasing read time, while the same ledger discipline preserves component evidence so that genuinely subtle findings are not lost.
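The three checks can be sketched as a single function over one ledger row. The cue strings (`suprapatellar_distention`, `skin_fold`, and so on), the finding keys, and the `ap_oblique` study flag are hypothetical. A downgrade here means moving a present row to indeterminate and recording why, as in the worked ledger entry.

```python
ARTIFACT_CUES = {"skin_fold", "soft_tissue_overlap", "projectional_artifact"}
CAPSULAR_CUES = {"suprapatellar_distention", "capsular_distention"}


def skeptic_pass(finding: str, entry: dict, study: dict) -> dict:
    """Return a copy of a ledger row, downgraded if a view-logic check fails."""
    out = dict(entry)  # shallow copy; the input row is left unchanged
    if out.get("state") != "present":
        return out  # only positive calls are challenged
    cues = {e["cue"] for e in out.get("evidence_for", [])}
    reason = None
    if finding == "joint_effusion" and not cues & CAPSULAR_CUES:
        reason = "no suprapatellar/capsular distention"
    elif finding == "malalignment" and study.get("ap_oblique", False):
        reason = "apparent asymmetry explained by oblique AP"
    elif finding == "osseous_lesion" and cues and cues <= ARTIFACT_CUES:
        reason = "evidence better explained by artifact"
    if reason:
        out["state"] = "indeterminate"
        out["skeptic_action"] = f"downgrade present -> indeterminate ({reason})"
    return out
```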
§ 08 · Evaluation
How It Should Be Validated
Knee radiograph AI should be evaluated not only by final-report accuracy, but by view handling, evidence support, uncertainty behavior, false-positive control, clinically significant miss rate, and report signability. This is the framework we bring to IRB-approved studies.
Reference standard
Radiologist-adjudicated labels rather than report text alone — independent review by two radiologists with third-reader adjudication for disagreements. Labels capture finding presence, view adequacy, severity, clinical significance of errors, and report signability.
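The adjudication protocol can be written as a small rule per case and finding. This sketch assumes the third reader's label is final when the first two disagree; some protocols instead restrict the third reader to one of the two original labels. The function names are illustrative.

```python
from typing import Optional


def adjudicate(reader1: str, reader2: str, reader3: Optional[str] = None) -> str:
    """Reference label for one case and finding from two independent reads."""
    if reader1 == reader2:
        return reader1
    if reader3 is None:
        raise ValueError("readers disagree: third-reader adjudication required")
    return reader3  # assumption: the third reader's call is final


def cases_needing_third_read(labels1: dict[str, str], labels2: dict[str, str]) -> list[str]:
    """Case IDs that both readers labeled but labeled differently."""
    return sorted(k for k in labels1.keys() & labels2.keys() if labels1[k] != labels2[k])
```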
Cohort design
Consecutive knee radiographs to estimate real-world operating characteristics, plus enriched strata for low-prevalence findings: subtle fracture, tibial plateau injury, patellar fracture, lipohemarthrosis, arthroplasty, loose body/fabella mimics, and poor or rotated views.
Endpoints
Primary: clinically significant miss rate and false-positive rate, fracture and effusion sensitivity/specificity, OA detection and severity calibration, and report signability. Secondary: projection-role accuracy, true-lateral identification, not-assessable appropriateness, latency, and trace completeness.
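Scoring four-state output against binary adjudicated labels requires explicit conventions. The sketch below assumes that not_assessable predictions are excluded from sensitivity and specificity and reported as a separate rate, that indeterminate counts as a positive flag by default, and that a clinically significant miss is a significant finding the system called absent. These conventions are illustrative, not the study's prespecified analysis plan.

```python
from dataclasses import dataclass


@dataclass
class Case:
    truth: str                 # adjudicated reference: "present" or "absent"
    pred: str                  # system output: one of the four ledger states
    significant: bool = False  # adjudicated clinical significance of the finding


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def endpoint_summary(cases: list[Case], indeterminate_as_positive: bool = True) -> dict[str, float]:
    positive = {"present", "indeterminate"} if indeterminate_as_positive else {"present"}
    scored = [c for c in cases if c.pred != "not_assessable"]
    tp = sum(c.truth == "present" and c.pred in positive for c in scored)
    fn = sum(c.truth == "present" and c.pred not in positive for c in scored)
    tn = sum(c.truth == "absent" and c.pred not in positive for c in scored)
    fp = sum(c.truth == "absent" and c.pred in positive for c in scored)
    significant = [c for c in cases if c.truth == "present" and c.significant]
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "false_positive_rate": _ratio(fp, fp + tn),
        # A significant miss here means the system called a significant finding absent.
        "significant_miss_rate": _ratio(sum(c.pred == "absent" for c in significant), len(significant)),
        "not_assessable_rate": _ratio(len(cases) - len(scored), len(cases)),
    }
```

Exposing `indeterminate_as_positive` as a parameter makes the effect of that convention visible: the same outputs can be reported under both readings.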
Reader-facing evaluation
A silent-mode prospective study measures operational feasibility without influencing care. A subsequent reader-assist study compares radiologist-alone and radiologist-with-system workflows — interpretation time, accepted/rejected suggestions, error correction, confidence, and automation bias.
Ablation ladder
The architecture is evaluated through five ablations that test whether each layer improves the tradeoff between sensitivity, specificity, uncertainty handling, and report signability.
1. One-shot image-to-report generation
2. Core agents without specialists
3. Core agents plus triggered specialists
4. Core agents plus specialists plus skeptic pass
5. Full ledger-constrained report synthesis
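The ladder can be encoded as configuration flags so that each rung differs from the previous one by exactly one layer, which is what lets a change in the tradeoff be attributed to that layer. The `PipelineConfig` fields are hypothetical, and we assume that rungs without ledger-constrained synthesis use an unconstrained report writer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    name: str
    agents: bool            # task-bounded agents instead of one-shot generation
    specialists: bool       # triggered specialist review
    skeptic: bool           # finding-specific skeptic pass
    ledger_synthesis: bool  # report written only from the ledger


ABLATION_LADDER = [
    PipelineConfig("one_shot", False, False, False, False),
    PipelineConfig("core_agents", True, False, False, False),
    PipelineConfig("agents_specialists", True, True, False, False),
    PipelineConfig("agents_specialists_skeptic", True, True, True, False),
    PipelineConfig("full_ledger_constrained", True, True, True, True),
]


def is_cumulative(ladder: list[PipelineConfig]) -> bool:
    """True if each rung keeps every earlier layer and enables exactly one more."""
    flags = [(c.agents, c.specialists, c.skeptic, c.ledger_synthesis) for c in ladder]
    return all(
        sum(b) == sum(a) + 1 and all(y >= x for x, y in zip(a, b))
        for a, b in zip(flags, flags[1:])
    )
```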
§ 09 · Governance
Clinical Safety & Governance
The system is intended for validation under appropriate institutional approval before any clinical use.
Does not replace the signed report
During evaluation the system does not replace the signed radiology report, alter patient care, or expose patient-facing outputs. All outputs are treated as research artifacts.
PHI handling follows site procedures
Protected health information handling, storage, de-identification, and access control follow site-approved data-use procedures throughout.
Clinical and research artifacts stay separate
Structured audit artifacts can be retained in a controlled research environment so radiologists and investigators inspect evidence support and failure modes — without changing the clinical report workflow.
§ 10 · Limitations
Limitations
The paper is explicit about what it does not claim.
A systems architecture, not a completed trial
This describes a systems architecture and validation framework, not a completed multicenter clinical trial. No claim is made that the system is ready for autonomous diagnosis.
Performance must be established on hospital data
Performance must be established on de-identified hospital data with radiologist adjudication and prospective silent-mode evaluation — exactly the studies we are now running with partners.
Decomposition has its own risks
Decomposition can propagate early errors, deterministic skeptic rules can become too conservative if poorly calibrated, and specialist routing can miss cases if triggers are too narrow. The ledger itself must be accurate.
Known regimes where it may fail
It may fail on rare pathology, unusual hardware, severe image-quality limitations, pediatric variants, or institution-specific acquisition patterns. No architecture removes the information limits of projection radiography.
Trial this architecture with us under IRB
5C Network is partnering with leading hospitals and radiology AI researchers to validate the agentic knee X-ray architecture on independently curated, radiologist-adjudicated cohorts. Bring your anonymised knee X-ray dataset and we will run it as a fully unbiased external evaluation — with IRB-approved protocols, co-authored publications, silent-mode prospective studies, and reader-assist studies as the collaboration formats.
Total unbiasedness
You bring an anonymised dataset we have never seen. We evaluate on it. Results, traces, and ledgers are shared back in full.
IRB-grade rigor
Radiologist-adjudicated reference standards, IRB-approved protocols, silent-mode and reader-assist designs, and independent benchmarks — not marketing decks.
Data stays yours
Fully anonymised under DTA. No PHI. No re-use beyond the agreed study. The system never replaces your signed report during evaluation.