(radiology[tiab] OR "radiology report*"[tiab]) AND ("large language model*"[tiab] OR LLM[tiab] OR GPT[tiab]) AND (report*[tiab] OR impression[tiab] OR conclusion[tiab]) AND (style[tiab] OR preference*[tiab] OR personalized[tiab] OR personalization[tiab] OR imitation[tiab] OR adaptation[tiab]) AND (radiologist*[tiab] OR reader*[tiab] OR physician*[tiab] OR "individual radiologist"[tiab] OR "expert feedback"[tiab] OR "human feedback"[tiab] OR fine-tun*[tiab] OR finetun*[tiab] OR LoRA[tiab] OR PEFT[tiab])
Purpose To evaluate the performance of large language model (LLM)-based strategies for extracting and classifying incidental findings from whole-body (WB) imaging reports, particularly strategies incorporating the Oncologically Relevant Findings Reporting and Data System (ONCO-RADS). Materials and Methods In this retrospective bicenter study, authors included all WB MRI reports from January 2016 to December 2023 at a referral center (internal dataset). Two observers extracted all incidental findings, and patient records were used to confirm final diagnoses. First, authors evaluated ONCO-RADS performance and the reproducibility of its incidental finding classifications by six radiologists. Then, authors evaluated the accuracy of three LLM-based strategies: (a) a fine-tuned DeBERTa/medical named entity recognition (NER) model; (b) zero-shot LLMs (ChatGPT-o1 [OpenAI], Gemini-2.5-Pro [Google]); and (c) reference-guided prompting of these LLMs using ONCO-RADS. Authors then extended these strategies to an external dataset of 605 reports with multiple imaging techniques (405 WB MRI; 100 fluorodeoxyglucose PET/CT; and 100 chest-abdomen-pelvis CT acquisitions) from January 2022 to January 2025. Results The internal dataset included 823 patients (mean age, 63.7 years ± 11.7 [SD]; 457 male patients) with 1488 WB MRI reports. The average interobserver reproducibility of ONCO-RADS incidental finding classifications was excellent (Cohen κ, 0.87). The per-report accuracies of ONCO-RADS-guided LLMs (95.6% [151 of 158] and 86.7% [137 of 158] for ChatGPT-o1 and Gemini-2.5-Pro, respectively) were higher than those of the medical NER (69.0% [109 of 158]) and zero-shot LLMs (57.0% [90 of 158] and 70.9% [112 of 158] for ChatGPT-o1 and Gemini-2.5-Pro, respectively) (P < .001). 
In the external test set (mean age, 60.6 years ± 12.9; 330 male patients), the per-report accuracies of ONCO-RADS-guided ChatGPT-o1 (83.5% [505 of 605]) and Gemini-2.5-Pro (82.0% [496 of 605]) were higher than those of the models without ONCO-RADS prompting (63.1% [382 of 605] and 61.2% [370 of 605], respectively) and the medical NER (55.7% [337 of 605]) (P < .001). Conclusion Reference-guided prompting of the LLMs ChatGPT-o1 and Gemini-2.5-Pro with ONCO-RADS improved their performance in extracting and classifying incidental findings on WB imaging reports compared with zero-shot prompting and medical NER. Keywords: Large Language Models, Incidental Findings, Whole-Body MRI Supplemental material is available for this article. © RSNA, 2026.
Ambiguous or incomplete documentation is a recurrent bottleneck in radiation oncology workflows, leading to inefficiencies in communication and potential treatment delays. Large language models (LLMs) offer a potential way to resolve these ambiguities without adding burden to clinical staff. We aim to assess the effectiveness of Meta's open-source Llama 3.3 model in using physician consultation notes to isolate and classify anatomical treatment sites and create helpful extractive summaries for each patient.
Semi-structured interviews with five radiation therapists revealed that CT simulation orders lack the necessary details to acquire the appropriate image. A retrospective cohort of 100 patient notes was used for iterative prompt engineering. The final model was evaluated on an independent test cohort of 52 patient notes. The LLM's accuracy in identifying the treatment site was benchmarked against two human observers (a medical physicist and a physician) as well as the final delivered treatment plan (ground truth). The helpfulness and accuracy of the AI-generated summaries were also rated by both observers on a 5-point Likert scale.
Llama 3.3 achieved a weighted accuracy of 94.2% [95% CI: 89.4%-98.1%] when compared to sites isolated by either observer. When compared to the sites isolated from the retrospectively delivered plans, the model reached a weighted accuracy of 92.3% [95% CI: 87.5%-97.1%]. The model classified the anatomical sites with a weighted accuracy of 96.2% [95% CI: 87.0%-98.9%]. The AI-generated summaries were highly rated by both observers (Observer 1: 4.96 [95% CI: 4.87-5.00]; Observer 2: 4.58 [95% CI: 4.38-4.73]).
This pilot study provides foundational evidence that LLMs can classify data with high accuracy, achieve benchmarks comparable to human experts when isolating anatomical treatment sites, and produce clinically helpful summaries. Our results suggest that LLMs can be effectively integrated to streamline complex radiotherapy workflows in the clinic.
A retrospective, blinded evaluation of 200 oncologic computed tomography reports compared original radiologist-authored impressions, impressions generated by a custom domain-specific AI model fine-tuned on institutional data, and impressions generated by a general-purpose large language model. Ten clinicians, including original radiologists (n = 4), independent radiologists (n = 3), and oncologists (n = 3), rated impressions for completeness, correctness, conciseness, clarity, clinical utility, and patient harm. Original and independent radiologists assigned lower preference to generic model impressions (Cohen's h 1.04-1.22 and 0.66-0.69, p < 0.001). Original radiologists slightly preferred their own impressions to the custom model (h = 0.18, p = 0.0716), while independent radiologists showed no preference (h = -0.03, p = 0.78). Oncologists demonstrated no significant preference among impression types (h = 0.04-0.12, all p > 0.20). Custom model impressions achieved near parity with human impressions; original radiologists rated their own impressions slightly more complete (r = 0.22, p = 0.0016). Generic model impressions were longer (75.1 ± 20.4 words), slightly more complete (r = 0.18-0.39, p < 0.001-0.01), but significantly less concise (r = 0.85-0.87, p < 0.001). Patient harm ratings were uniformly low (likelihood 1.01-1.14; extent 1.05-1.21). Inter-rater reliability ranged from -0.09 to 0.67 (α = 0.67 for conciseness; α = -0.09 to 0.03 for clinical utility/correctness).
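Cohen's h expresses the gap between two proportions on an arcsine-square-root scale, h = 2·arcsin√p₁ − 2·arcsin√p₂. The abstract does not say how preference ratings were converted into proportions, so the inputs in this minimal sketch are hypothetical.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for the difference between two proportions."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    # Arcsine-square-root transform stabilizes variance across the 0-1 range.
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Hypothetical example: one impression type preferred in 60% vs 40% of pairings.
print(round(cohens_h(0.6, 0.4), 3))
```

With Cohen's conventional benchmarks of 0.2 (small), 0.5 (medium), and 0.8 (large), the reported h of 1.04-1.22 falls in the large range and h = 0.18 in the small range.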
Automated structuring of radiology reports is essential for data utilization and the development of medical artificial intelligence models. However, manual annotation by experts is labor-intensive, and processing real clinical data through commercial large language models (LLMs) presents significant privacy risks. These challenges are particularly pronounced for non-English languages such as Japanese, where specialized medical corpora are scarce. Although synthetic data generation offers a potential privacy-preserving alternative, whether it captures complex clinical nuances, such as negation and contextual dependencies, well enough to train robust classification models without any real-world training data has not been fully established.
This study aimed to develop a context-aware sentence classification model for Japanese radiology reports using an entirely synthetic training pipeline, thereby eliminating reliance on real-world clinical data during the development phase. Furthermore, we sought to evaluate the generalizability of this approach by validating the model's performance on diverse, multi-institutional, real-world reports.
Japanese radiology reports (n=3104) were generated using GPT-4.1 and automatically annotated at the sentence level into 4 categories (background, positive finding, negative finding, and continuation) using GPT-4.1-mini. The synthetic data were partitioned into training (n=2670), validation (n=334), and test (n=100) sets. We fine-tuned several models, including lightweight local LLMs (Qwen3 and Llama 3.2 series) using low-rank adaptation and Japanese text classification models (Bidirectional Encoder Representations from Transformers [BERT]-base Japanese v3, Japanese Medical Robustly Optimized BERT Pretraining Approach [JMedRoBERTa]-base, and ModernBERT-Ja-130M). External validation was performed using 280 real-world reports (3477 sentences) from 7 institutions in the Japan Medical Image Database, with ground-truth labels established by board-certified radiologists. Evaluation metrics included accuracy, macro-averaged F1 (macro F1) score, and positive predictive value for positive findings (PPV_1).
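The three evaluation metrics, plus the parsing-error count reported for the LLM-based models, can be sketched as follows. Everything here is an illustrative assumption: class indices follow the category order listed above (so index 1 is "positive finding", matching the PPV_1 name), replies that cannot be mapped to a category are counted as parsing errors and scored as wrong, and `parse_label` is a hypothetical stand-in for whatever output format the study's prompts required.

```python
from typing import Optional, Sequence

# Hypothetical index order, following the category order listed in the abstract.
LABELS = {"background": 0, "positive": 1, "negative": 2, "continuation": 3}


def parse_label(raw: str) -> Optional[int]:
    """Map a free-text model reply to a class index; None marks a parsing error."""
    text = raw.strip().lower()
    if text.isdigit() and int(text) in LABELS.values():
        return int(text)
    for name, idx in LABELS.items():
        if text.startswith(name):
            return idx
    return None


def evaluate(y_true: Sequence[int], raw_outputs: Sequence[str]) -> dict:
    preds = [parse_label(r) for r in raw_outputs]
    f1_scores = []
    ppv_1 = 0.0
    for cls in LABELS.values():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, preds))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, preds))
        # Unparsed replies (None) never match, so they count as false negatives here.
        fn = sum(t == cls and p != cls for t, p in zip(y_true, preds))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
        if cls == LABELS["positive"]:
            ppv_1 = precision  # PPV for the positive-finding class
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, preds)) / len(y_true),
        "macro_f1": sum(f1_scores) / len(f1_scores),  # unweighted mean over the 4 classes
        "ppv_1": ppv_1,
        "parse_errors": preds.count(None),
    }
```

Macro F1 weights each of the four classes equally, so a rare class such as "continuation" moves it as much as the common ones, while PPV_1 looks only at sentences the model called positive.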
All models achieved high performance on the synthetic test set (accuracy: 0.938-0.951; macro F1-score: 0.924-0.940). Overall performance declined on the external validation dataset (accuracy: 0.783-0.813; macro F1-score: 0.761-0.790), reflecting distributional differences between synthetic and real-world reports; however, PPV_1 remained stable and high across datasets (eg, 0.957 on the synthetic test set vs 0.952 on the external validation dataset for Qwen3 [4B]). Parsing errors occurred in LLM-based approaches (19-260 sentences, 0.55%-7.48% in the external dataset).
This study demonstrates the feasibility of developing context-aware sentence classification models for Japanese radiology reports using a training pipeline based entirely on synthetic data. The stability of PPV_1 indicates that the models successfully captured the essential clinical terminology and linguistic patterns required to identify positive findings in real-world reports, despite the observed performance degradation during external validation. This approach substantially reduces manual annotation requirements and privacy risks, providing a scalable foundation for constructing structured radiology datasets to support the development of clinically relevant medical artificial intelligence models.
Incomplete clinical details on magnetic resonance imaging (MRI) examination requests (MERs) can lead to suboptimal protocol selection. An institutional secure large language model (sLLM) with access to manually retrieved salient data from the electronic medical record (EMR) may improve request completeness and protocol accuracy across multiple MRI subspecialties.
The objective of this study was to compare clinician MERs with sLLM-augmented MERs for information quality and to evaluate the protocoling accuracy of the sLLM versus board-certified radiologists across body, musculoskeletal, and neuroradiology MRI.
This retrospective study included 608 randomly selected outpatient MRI examinations performed between September 2023 and July 2024 (body 206, musculoskeletal 203, neuroradiology 199). The cohort comprised 528 patients (mean age 51.2 years, SD 19.2; range 4-93; n=279, 52.8% women; n=249, 47.2% men). MERs without EMR access were excluded. A privately hosted Anthropic Claude 3.5 model (temperature 0) augmented each MER with manually retrieved salient EMR data and, via rule-based parsing, mapped the extracted elements onto predefined institutional criteria to recommend region or coverage and contrast use. Two experienced radiologists established a consensus reference standard. Two board-certified general radiologists (Rad 3 and Rad 4) and the sLLM were compared with this standard. Clinical information quality was graded using the Reason-for-Exam Imaging Reporting and Data System (RI-RADS). Interrater reliability was quantified with Gwet AC1. Paired accuracies were compared with the McNemar test.
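Two statistics named here are easy to misread. Gwet's AC1 uses the chance-agreement term pₑ = (1/(q − 1)) Σₖ πₖ(1 − πₖ), where πₖ is the average share of ratings in category k across both readers; it is designed to stay stable when one category dominates. McNemar's test for paired accuracies uses only the discordant cases. The sketch below is a minimal illustration: the abstract does not say whether the exact or chi-square form of McNemar's test was used (the exact binomial form is shown), and the example ratings are invented.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def gwet_ac1(r1: Sequence[Hashable], r2: Sequence[Hashable], categories: Sequence[Hashable]) -> float:
    """Gwet's AC1 for two raters over a fixed category set (e.g. RI-RADS A-D)."""
    n, q = len(r1), len(categories)
    if n == 0 or n != len(r2) or q < 2:
        raise ValueError("need two equal-length, non-empty rating lists and >= 2 categories")
    pa = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # pi_k = mean share of ratings in category k across both raters
    pe = sum(((c1[k] + c2[k]) / (2 * n)) * (1 - (c1[k] + c2[k]) / (2 * n)) for k in categories) / (q - 1)
    return (pa - pe) / (1 - pe)  # pe <= 1/q < 1, so the denominator is positive


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P from discordant counts.

    b = cases reader A got right and reader B got wrong; c = the reverse.
    """
    m = b + c
    if m == 0:
        return 1.0
    tail = sum(comb(m, i) for i in range(min(b, c) + 1)) / 2 ** m
    return min(1.0, 2 * tail)
```

Concordant cases (both correct or both wrong) drop out of McNemar's test entirely, which is why two readers with similar overall accuracy can still differ significantly when their errors fall on different cases.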
Interreader agreement for RI-RADS was almost perfect for sLLM-augmented MERs (AC1 0.97, 95% CI 0.94-0.99) and moderate for clinician MERs (AC1 0.43, 95% CI 0.34-0.52). Limited or deficient clinical information (RI-RADS C/D) fell to 0% to 0.7% (0/608 to 4/608) with sLLM augmentation vs 4.1% to 20.4% (25/608 to 124/608) for clinician MERs. Overall protocol accuracy was 93.1% (566/608; 95% CI 89.6-96.6) for the sLLM, 91.4% (556/608; 95% CI 87.6-95.3) for Rad 3, and 92.1% (560/608; 95% CI 88.4-95.8) for Rad 4 (sLLM vs Rad 3, P=.23; sLLM vs Rad 4, P=.40). Region or coverage accuracy was similar (sLLM: 579/608, 95.2%; Rad 3: 585/608, 96.2%; Rad 4: 573/608, 94.2%; P=.46 and P=.36). Contrast decisions were more accurate with the sLLM at 94.4% (574/608; 95% CI 91.3-97.5) than with Rad 3 at 92.1% (560/608; 95% CI 88.4-95.8; P=.027) and were not significantly different from Rad 4 at 92.9% (565/608; 95% CI 89.4-96.4; P=.16). Subspecialty analyses showed similar patterns, with the sLLM outperforming Rad 4 for musculoskeletal MRI contrast decisions (96.6% vs 91.1%; P=.006) and matching readers elsewhere. Manual review indicated that sLLM improvements arose from EMR details not listed on the MER (infection/inflammation, tumor history, prior surgery). No clinically significant hallucinations were identified in a manual review of discordant cases.
Across body, musculoskeletal, and neuroradiology MRI, sLLM-augmented examination requests improved clinical context and enhanced contrast selection while demonstrating accuracy comparable to general radiologists for region or coverage. Integrating sLLMs into routine vetting workflows may reduce manual workload in protocol selection for more efficient, standardized protocoling.
This study systematically investigated the influence of demographic characteristics on the readability of patient-centric radiology reports and compared the performance of different large language models (LLMs) in generating patient-centered reports. Using a sequential two-stage design, the research first conducted a retrospective evaluation of 320 radiology reports, followed by clinical-setting validation with 800 patients. All three LLMs significantly improved the readability of radiology reports (P < 0.05), with DeepSeek-R1 showing potentially superior performance within this specific cohort. Demographic analysis revealed significant interaction effects: higher education and, within the same educational level, older age were associated with better comprehension. Clinical-setting validation further suggested that reading simplified reports can significantly improve patients' subjective and objective comprehension while significantly alleviating medical anxiety (P < 0.05). However, limitations persist, including inconsistent model outputs, missing anatomical details, and demographically driven variance in comprehension. Consequently, LLMs should be integrated as auxiliary communication tools for radiologists rather than as standalone solutions, with personalized interventions tailored to specific demographic profiles.
Recent medical vision-language models (VLMs) have shown promise in 2D medical image interpretation. However, extending them to 3D medical imaging has been challenging due to computational complexity and data scarcity. Although a few recent VLMs designed for 3D medical imaging have emerged, all of them are limited to learning the volumetric representation of a 3D medical image as a set of sub-volumetric features. This process introduces overly correlated representations along the z-axis that neglect slice-specific clinical details, particularly for 3D medical images in which adjacent slices have low redundancy. To address this limitation, we introduce MS-VLM, which mimics radiologists' workflow in 3D medical image interpretation. Specifically, radiologists analyze 3D medical images by examining individual slices sequentially and synthesizing information across slices and views. Likewise, MS-VLM leverages self-supervised 2D transformer encoders to learn a volumetric representation that captures inter-slice dependencies from a sequence of slice-specific features. Unbound by sub-volumetric patchification, MS-VLM can obtain useful volumetric representations from 3D medical images with any slice length and from multiple images acquired in different planes and phases. We evaluate MS-VLM on the publicly available chest CT dataset CT-RATE and an in-house rectal MRI dataset. In both settings, MS-VLM surpasses existing methods in radiology report generation, producing more coherent and clinically relevant reports. These findings highlight the potential of MS-VLM to advance 3D medical image interpretation and improve the robustness of medical VLMs.
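The core idea above is to encode each slice with a 2D model and then let a sequence model mix information across slices, instead of cutting the volume into 3D patches. The NumPy sketch below illustrates only that idea and is not MS-VLM's implementation. A fixed random projection stands in for the self-supervised 2D encoder, a single untrained self-attention layer with sinusoidal slice positions stands in for the inter-slice model, and all sizes are arbitrary assumptions. Because attention runs over a sequence of slice features, a volume with any slice count, or several series concatenated along the sequence axis, maps to a fixed-size embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32            # embedding width (arbitrary for the sketch)
PIXELS = 16 * 16  # toy slice size: 16 x 16
W_ENC = rng.standard_normal((PIXELS, D)) / np.sqrt(PIXELS)  # stand-in for a pretrained 2D encoder
W_Q, W_K, W_V = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))


def encode_slices(volume: np.ndarray) -> np.ndarray:
    """(num_slices, 16, 16) -> (num_slices, D): one feature vector per 2D slice."""
    return volume.reshape(volume.shape[0], -1) @ W_ENC


def positional_encoding(n: int, d: int) -> np.ndarray:
    """Sinusoidal encoding of slice order, so attention knows where each slice sits."""
    pos = np.arange(n)[:, None]
    i = np.arange(d)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


def slice_attention(feats: np.ndarray) -> np.ndarray:
    """Single-head self-attention across slices (inter-slice dependencies)."""
    x = feats + positional_encoding(*feats.shape)
    q, k, v = x @ W_Q, x @ W_K, x @ W_V
    scores = q @ k.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v  # each slice now mixes information from every other slice


def volume_embedding(*series: np.ndarray) -> np.ndarray:
    """Fixed-size embedding from one or more series with any number of slices."""
    feats = np.concatenate([encode_slices(s) for s in series], axis=0)
    return slice_attention(feats).mean(axis=0)
```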
To evaluate the feasibility and limitations of real-world, text-only inference of PI-RADS v2.1 categories from prostate MRI reports using large language models, with lesion-level and zone-aware analysis.
This single-center retrospective study included 1,205 lesion-level entries from 1,118 patients derived from semi-structured prostate MRI reports after removal of all explicit PI-RADS elements. ChatGPT-4o was prompted to assign numeric PI-RADS categories based solely on report text. Agreement with radiologist-assigned reference categories was assessed using exact agreement, Cohen's κ, and class-wise metrics. Analyses were performed overall, by zone (peripheral vs transition), and using collapsed risk strata (1-2/3/4-5). Discordant cases were reviewed to identify error mechanisms and severity. Human interobserver agreement, intra-model reproducibility, temporal stability, and a paired model-version sensitivity analysis comparing ChatGPT-4o with GPT-5.2 were also evaluated.
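Cohen's κ corrects observed agreement pₒ for the agreement pₑ expected by chance, κ = (pₒ − pₑ)/(1 − pₑ). This minimal sketch computes unweighted κ on exact PI-RADS categories and again after collapsing them into the 1-2 / 3 / 4-5 strata described above. The ratings are invented, and the abstract does not state whether weighted or unweighted κ was used (unweighted is assumed here).

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa between two raters (e.g. radiologist vs LLM)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("ratings must be non-empty and of equal length")
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / n ** 2  # chance agreement
    if pe == 1.0:
        return float("nan")  # both raters used one identical category: kappa undefined
    return (po - pe) / (1 - pe)


# Collapsed risk strata: 1-2 / 3 / 4-5
STRATA = {1: "1-2", 2: "1-2", 3: "3", 4: "4-5", 5: "4-5"}


def collapse(scores: Sequence[int]) -> list:
    return [STRATA[s] for s in scores]


# Invented ratings: reference radiologist vs model.
ref = [1, 2, 3, 4, 5, 4, 3, 2]
llm = [2, 2, 4, 4, 5, 5, 3, 1]
print(cohens_kappa(ref, llm), cohens_kappa(collapse(ref), collapse(llm)))
```

Collapsing turns near-misses such as 4 vs 5 or 1 vs 2 into agreements, which is why κ rises when strata replace exact categories, while disagreements that cross the PI-RADS 3 boundary still count against the model.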
Overall exact agreement was 72.9% (κ = 0.538; macro-F1 = 61.2%), with a systematic tendency toward overcalling. Agreement was higher in the peripheral zone than in the transition zone (κ = 0.476 vs 0.077, reference PI-RADS 3-5). PI-RADS 3 showed the lowest precision and recall, with frequent bidirectional misclassification. Collapsing categories improved agreement (κ = 0.610). Incorrect diffusion-weighted imaging subscores were the most common error mechanism, with zone-specific differences. Clinically high-impact downgrades of PI-RADS 4-5 to 1-2 were rare (1.6%). Human interobserver agreement was excellent (κ = 0.916-0.967). GPT-5.2 outperformed ChatGPT-4o in paired analyses but produced invalid outputs in a minority of cases.
Text-only large language models can infer radiologist-assigned PI-RADS v2.1 categories from real-world prostate MRI reports with moderate agreement, but performance is zone dependent and limited around PI-RADS 3, particularly in the transition zone. These models are best suited as supervised tools for quality control rather than autonomous decision-making.
Staging gynecological malignancies is a complex process, and radiologists should be familiar with the evolution of FIGO staging criteria. Large Language Models (LLMs) offer potential to support radiologists by automating classification tasks from free-text MRI reports.
We conducted a retrospective study using two curated datasets of pelvic MRI reports from patients with cervical (n = 261, FIGO 2018) and endometrial cancer (n = 555, FIGO 2023). A general-purpose LLM (Cohere Command-A) was evaluated under three prompting strategies (zero-shot, guided, and chain-of-thought [CoT]), using exact stage accuracy, an ordinal FIGO distance metric, and the rate of severe errors. The Cohere Command-A model was chosen for its long-context reasoning, instruction-following capabilities, reproducible fixed version, and secure handling of sensitive clinical data. While alternative LLMs (eg, GPT-4o, Gemini, Llama-3, DeepSeek) could offer complementary insights, access, resources, and compliance constraints limited broader comparisons.
For cervical cancer, CoT prompting achieved the highest accuracy (80.5%) and the lowest FIGO distance, with 23 severe misclassifications (≥2-stage deviation), outperforming guided and zero-shot prompting. For endometrial cancer, all strategies performed well, with CoT again yielding the best results (accuracy, 90.6%) and the fewest severe misclassifications (37 cases) compared with guided and zero-shot prompting. In a small subset of cases in which no prompting strategy agreed with the reference label, manual review showed that only a minority had potentially suboptimal annotations, suggesting that CoT-based predictions may also help flag doubtful reports.
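The abstract does not define the ordinal FIGO distance precisely. This minimal sketch assumes the distance is the number of steps between the predicted and reference stages on the ordered FIGO 2018 cervical substage list, with an error counted as "severe" at 2 or more steps. If the study counted main stages (I-IV) instead, only the ranking table would change.

```python
from typing import Sequence

# FIGO 2018 cervical cancer substages in ascending order (ranking is an assumption
# about how the study measured distance).
CERVIX_FIGO_2018 = ["IA1", "IA2", "IB1", "IB2", "IB3", "IIA1", "IIA2", "IIB",
                    "IIIA", "IIIB", "IIIC1", "IIIC2", "IVA", "IVB"]
RANK = {stage: i for i, stage in enumerate(CERVIX_FIGO_2018)}


def figo_distance(pred: str, ref: str) -> int:
    """Number of ordered substage steps between prediction and reference."""
    return abs(RANK[pred] - RANK[ref])


def summarize(preds: Sequence[str], refs: Sequence[str], severe_threshold: int = 2) -> dict:
    dists = [figo_distance(p, r) for p, r in zip(preds, refs)]
    return {
        "accuracy": sum(d == 0 for d in dists) / len(dists),      # exact stage match
        "mean_distance": sum(dists) / len(dists),                 # ordinal error size
        "severe_errors": sum(d >= severe_threshold for d in dists),
    }


# Invented example: one exact match, one 2-step error, one 7-step error, one 1-step error.
print(summarize(["IB1", "IIB", "IIIC1", "IVA"], ["IB1", "IIA1", "IB2", "IVB"]))
```

An ordinal distance separates a near-miss (IVA vs IVB) from a gross error (IIIC1 vs IB2), which exact-stage accuracy alone treats identically.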
The LLM evaluated demonstrated strong performance in automatically assigning FIGO stages for cervical and endometrial cancers from MRI reports. Its integration could reduce workload and improve consistency in staging. Further validation is needed before clinical implementation.
| Run at | Source | Hits | New | Status |
|---|---|---|---|---|
| 2026-04-26 00:00 | LitReview | 1 |  | completed |
| 2026-04-21 16:11 | litreview:seed | 77 | 77 | seed-completed |
"prostatic neoplasms"[mesh] AND "magnetic resonance imaging"[mesh] AND "clinical trial"[pt]
Combined targeted and systematic biopsy (CTSBx) has been the standard scheme for patients with suspicious lesions visible on MRI in recent years. The 2024 European Association of Urology guidelines recommend targeted and perilesional biopsy (TPLBx) for the diagnosis of patients with MRI-visible suspicious lesions. This randomized controlled trial aims to comprehensively evaluate the efficacy and safety profiles of the TPLBx and CTSBx schemes.
A single-center noninferiority randomized controlled trial consecutively enrolled 380 biopsy-naïve patients (CTSBx: n = 190, TPLBx: n = 190) with a single unilateral suspicious lesion on prostate MRI from June 2024 to November 2024. The noninferiority margin was -15%. All biopsies were performed transrectally using the cognitive fusion technique. The primary outcome was the Grade Group (GG) ≥ 2 cancer (GG ≥ 2-PCa) detection rate.
The GG ≥ 2-PCa (58% vs 58%, risk difference [RD]: 0.53% [95% CI: -9.4% to 11%]) and GG ≥ 3-PCa (30% vs 30%, RD: 0.53% [95% CI: -8.7% to 9.7%]) detection rates of TPLBx were noninferior to those of CTSBx (P < .001). There was no significant difference in PCa and GG1-PCa detection rates between the 2 groups (P > .050). The complication rate was significantly lower with TPLBx than with CTSBx (Clavien-Dindo grade ≥ 1: 62% vs 74%, P = .023), especially for bleeding-related complications (rectal bleeding: 34% vs 48%, P = .003; hematuria: 39% vs 56%, P < .001) and rectal pain (25% vs 34%, P = .018). TPLBx also significantly shortened procedure time and reduced pathology costs (P < .001).
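Noninferiority here means that the lower limit of the confidence interval for the risk difference (TPLBx minus CTSBx) stays above the -15% margin; comparing the lower limit of a two-sided 95% interval with the margin corresponds to a one-sided test at α = 0.025. This is a minimal sketch, assuming a simple Wald interval (the trial's actual CI method is not stated) and hypothetical counts chosen to give roughly 58% detection in each arm of 190 patients, not the trial's data.

```python
import math


def risk_difference_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Risk difference p1 - p2 with a Wald confidence interval."""
    p1, p2 = x1 / n1, x2 / n2
    rd = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, rd - z * se, rd + z * se


def is_noninferior(x_new: int, n_new: int, x_std: int, n_std: int, margin: float = -0.15) -> bool:
    """New scheme is noninferior if the CI lower limit lies above the (negative) margin."""
    _, lower, _ = risk_difference_ci(x_new, n_new, x_std, n_std)
    return lower > margin


# Hypothetical counts (not the trial's data): ~58% detection in both arms of 190.
rd, lower, upper = risk_difference_ci(111, 190, 110, 190)
print(f"RD {rd:.2%} (95% CI {lower:.1%} to {upper:.1%}); "
      f"noninferior: {is_noninferior(111, 190, 110, 190)}")
```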
For patients with a single unilateral suspicious lesion on prostate MRI, TPLBx achieved noninferior detection of clinically significant PCa and better safety than the CTSBx scheme. TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT06482658.
Metachronous oligometastases may represent a favorable disease state for local therapy after prior curative treatment. Stereotactic Magnetic Resonance-Guided Adaptive Radiation Therapy (SMART) provides precise targeting of nodal and soft tissue metastases. The primary objective was to assess the feasibility and safety of SMART for abdominopelvic metachronous oligometastases. Secondary objectives included assessing rates of toxicities and evaluating local control (LC). METHODS AND MATERIALS: Ten patients with solid tumor metachronous abdominopelvic nodal or soft tissue metastases, ≤7 cm in maximal diameter, and ≤3 sites of active disease were enrolled. All patients received 40 Gy in 5 fractions. Acute toxicities were graded per Common Terminology Criteria for Adverse Events v5 during per-protocol follow-up over 1 year. Late toxicities and clinical outcomes were determined by chart review. LC, distant progression-free survival, and overall survival were analyzed using the Kaplan-Meier method.
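The Kaplan-Meier estimator multiplies, at each event time tᵢ, the fraction of at-risk patients who remain event-free: S(t) = ∏ (1 − dᵢ/nᵢ) over all tᵢ ≤ t. The abstract reports only summary estimates, so the patient-level times in this minimal sketch are hypothetical. They represent one of several configurations that yield the reported 4-year LC of 90%: a single local failure among 10 patients at risk, with nobody censored before it, gives S(4 years) = 0.9.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct event time; events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group ties; patients censored at t are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: S(t) is the value at the last event time <= t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical follow-up in years (not trial data): one local failure at 1.5 y,
# everyone else censored after 4 y.
times = [1.5, 4.2, 4.3, 4.4, 4.5, 4.6, 4.8, 4.9, 5.0, 5.1]
events = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(survival_at(kaplan_meier(times, events), 4.0))
```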
Eight patients with prostate cancer and 2 with renal cell carcinoma were enrolled in the study. All patients were successfully treated with SMART per-protocol without complications. The median follow-up after SMART was 4.22 years. Three patients experienced acute grade 1 toxicities; there were no higher grade or late toxicities. Among these 10 patients, 4-year LC and overall survival were both 90%, and 4-year distant progression-free survival was 20%. Two patients (1 prostate cancer, 1 renal cell carcinoma) remain with no evidence of disease, each at over 4 years following SMART and without receiving further systemic or local therapies.
With 4 years median follow-up, this small prospective trial reports low toxicity, supporting the feasibility of SMART metastasis-directed therapy for metachronous oligometastases with minimal risk of acute or late toxicity.
Three-dimensional (3D) augmented reality (AR) and artificial intelligence (AI) technologies have recently been introduced to enhance guidance during robot-assisted radical prostatectomy (RARP). By overlaying virtual and real-time images, this approach helps accurately localize hidden lesions during surgery, enabling tailored procedures. This study aimed to evaluate whether 3D-AI-AR guidance reduces positive surgical margins (PSMs) compared with standard two-dimensional (2D) magnetic resonance imaging (MRI)-based interventions.
In this prospective, multicenter randomized controlled trial (NCT06318559), 133 patients with extracapsular extension or bulging at preoperative MRI were enrolled and randomized (2:1) to either 2D MRI-guided (n = 84) or 3D-AI-AR-guided RARP (n = 49). All patients underwent nerve-sparing RARP. Intraoperative selective biopsies were then performed at the level of the preserved neurovascular bundle (NVB): cognitively guided in the MRI group and AR guided in the 3D group. The primary outcomes included the PSM rate. Prostate-specific antigen (PSA) levels, continence, and potency recovery were assessed during 12 mo of follow-up. The use of postoperative radiotherapy was recorded. Biochemical recurrence (BCR) was defined as PSA >0.4 ng/ml. All analyses were conducted with SAS Statistics Software v.9.4. KEY FINDINGS AND LIMITATIONS: Baseline and intraoperative characteristics were similar between the groups. While PSMs on the prostate surface were comparable (p = 0.8), 3D-guided excisional biopsies had a significantly higher positivity rate (52% vs 13%; p = 0.001), allowing improved margin control. The 3D group had a lower overall PSM rate (22% vs 39%; p = 0.047), required less postoperative radiotherapy (18% vs 35%; p = 0.046), and showed higher continence at 12 mo (91% vs 71%; p = 0.03). Potency and BCR rates were similar.
The execution of a 3D-AI-AR-guided biopsy at the level of preserved NVBs during nerve-sparing RARP allows correct identification of the tumor with subsequent improvement of margin control. Longer follow-up is required to assess the functional and long-term oncological outcomes of this approach.
Prostate-specific antigen (PSA)-based screening for prostate cancer (PCa) has limited accuracy, and it is linked to overdiagnosis. The PROSA trial aimed to evaluate whether a contrast-free biparametric magnetic resonance imaging (bpMRI)-first screening strategy improves the detection of clinically significant PCa (csPCa) as the primary outcome. The secondary outcomes included overall PCa detection, benefit-harm metrics, and cost effectiveness from a health care payer perspective.
This single-center, randomized controlled trial enrolled 816 asymptomatic men aged 49-69 yr (≥40 yr with a PCa family history). Participants were randomized into two arms: arm A underwent bpMRI regardless of PSA level; arm B received bpMRI only if PSA was ≥3 ng/ml (or ≥2.5 ng/ml with a family history). Men with a Prostate Imaging Reporting and Data System score ≥3 were directed to targeted biopsy. Imaging and pathology assessors were blinded; csPCa was defined as International Society of Urological Pathology grade group ≥2. The primary outcome was csPCa detection; secondary outcomes included overall PCa detection, benefit-harm metrics, and cost effectiveness from a health care payer perspective. KEY FINDINGS AND LIMITATIONS: Among 759 randomized men, biopsy and csPCa detection rates were higher in arm A (10.8% and 4.6%, respectively) than in arm B (5.2% and 1.8%, respectively), with a relative risk of 2.6 (95% confidence interval 1.1-6.1; p = 0.05) for the csPCa detection rate. Benefit-harm metrics favored the MRI-first strategy, showing higher grade selectivity (1.89 vs 1.75), biopsy efficiency (0.74 vs 0.54), and biopsy avoidance (23.1 vs 11.9). No serious adverse event was recorded. The MRI-first strategy yielded an incremental cost-effectiveness ratio of €2201.75 per csPCa case detected. Limitations include the single-round design and short follow-up.
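The relative risk compares csPCa detection proportions between arms, and the incremental cost-effectiveness ratio (ICER) divides the extra cost of the MRI-first pathway by the extra csPCa cases it detects. The per-arm counts and costs behind the reported RR (2.6; 95% CI 1.1-6.1) and ICER (€2201.75 per case) are not given in the abstract, so the numbers below are hypothetical. The log-scale Wald (Katz) interval is one standard choice and not necessarily the trial's method.

```python
import math


def relative_risk(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Risk ratio (x1/n1) / (x2/n2) with a log-scale (Katz) Wald confidence interval."""
    if min(x1, x2) == 0:
        raise ValueError("zero events: use a continuity correction or an exact method")
    rr = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)


def icer(cost_new: float, cost_old: float, effect_new: float, effect_old: float) -> float:
    """Incremental cost per additional unit of effect (e.g. per extra csPCa case detected)."""
    d_effect = effect_new - effect_old
    if d_effect == 0:
        raise ZeroDivisionError("strategies have equal effect; ICER undefined")
    return (cost_new - cost_old) / d_effect


# Hypothetical inputs (not trial data).
print(relative_risk(20, 100, 10, 100))
print(icer(cost_new=1500.0, cost_old=1000.0, effect_new=12, effect_old=10))
```

The interval is symmetric on the log scale, so with few events in one arm it is wide and skewed upward on the ratio scale, as in the reported 1.1-6.1 around 2.6.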
In this randomized screening trial, a contrast-free MRI-first pathway improved csPCa detection, enhanced benefit-harm metrics, and showed favorable cost effectiveness.
DOminant-TArgeted Boost in Localized Prostate Cancer (DOTA-2) is a phase II randomised controlled trial comparing two ultra-hypofractionated radiotherapy regimens with a dominant intraprostatic lesion (DIL) boost, 26 Gy/2F with 32 Gy to the DIL vs 36.25 Gy/5F with 40 Gy to the DIL, without androgen deprivation therapy (ADT), for prostate cancer.
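The abstract compares the two regimens by physical dose only. One common way to place different fractionations on a shared scale is the linear-quadratic (LQ) model's biologically effective dose, BED = n·d·(1 + d/(α/β)), and its 2-Gy-equivalent form, EQD2 = BED/(1 + 2/(α/β)). This sketch is not part of the trial: the α/β of 1.5 Gy is an assumed value often quoted for prostate cancer, and the LQ model's accuracy at 13-16 Gy per fraction is debated, so the results are illustrative only.

```python
def bed(total_dose: float, fractions: int, alpha_beta: float) -> float:
    """Biologically effective dose (Gy) under the linear-quadratic model."""
    d = total_dose / fractions  # dose per fraction
    return total_dose * (1 + d / alpha_beta)


def eqd2(total_dose: float, fractions: int, alpha_beta: float) -> float:
    """Equivalent dose in 2-Gy fractions (Gy)."""
    return bed(total_dose, fractions, alpha_beta) / (1 + 2 / alpha_beta)


ALPHA_BETA = 1.5  # assumed prostate alpha/beta in Gy, not taken from the trial

# (label, total dose in Gy, number of fractions) for the two arms in the abstract
regimens = [
    ("2F prostate", 26.0, 2), ("2F DIL boost", 32.0, 2),
    ("5F prostate", 36.25, 5), ("5F DIL boost", 40.0, 5),
]
for label, dose, n in regimens:
    print(f"{label}: EQD2 = {eqd2(dose, n, ALPHA_BETA):.1f} Gy")
```

Under these assumptions the 2-fraction arm delivers a higher EQD2 than the 5-fraction arm despite its lower physical dose, because dose per fraction enters the LQ formula quadratically.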
Patients with low- to favourable-intermediate-risk prostate cancer were randomly assigned to receive either 2 fractions or 5 fractions. Magnetic resonance-guided adaptive radiotherapy (MRgART) was delivered using the Unity® MR-Linac with the adapt-to-shape workflow for every fraction. The primary endpoint was cumulative grade ≥2 acute genitourinary (GU) and gastrointestinal (GI) toxicity. Secondary endpoints included quality of life in the urinary and sexual domains. An interim analysis of acute GU and GI toxicities was conducted on the first 22 patients from the total planned cohort of 44.
Patients were randomly assigned to either the 2-fraction (N = 10) or 5-fraction stereotactic body radiotherapy (SBRT) (N = 12), stratified by risk group, prostate volume, and DIL location. The median follow-up time was 16 weeks. The cumulative worst acute grade ≥2 GU toxicity was reported in 2/10 (20%) patients in the 2-fraction group vs 4/12 (33.3%) in the 5-fraction group (P = 0.48), with no cases of grade ≥3 acute GU toxicity. No grade ≥2 acute GI toxicity was observed in either arm. There was no significant difference between the two groups in International Prostate Symptom Score (IPSS) or International Index of Erectile Function (IIEF-5) scores.
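The abstract does not name the test behind P = 0.48. As a hedged check, an uncorrected Pearson chi-square test on the 2 × 2 table implied by the reported counts (2 of 10 vs 4 of 12 patients with grade ≥2 acute GU toxicity) gives P ≈ 0.48. Fisher's exact test, often preferred when expected cell counts fall below 5 as they do here, would give a larger two-sided P (about 0.65). This is a reconstruction for illustration, not the authors' analysis.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]] (df = 1).

    Returns (chi2, two-sided p). For df = 1, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = (a * d - b * c) ** 2 * n / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Rows: 2-fraction arm, 5-fraction arm; columns: grade >=2 GU toxicity yes / no.
chi2, p = pearson_chi2_2x2(2, 8, 4, 8)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```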
Two-fraction SBRT with a DIL boost, delivered using MRgART without ADT, demonstrated acceptable acute GU and GI toxicity in this interim analysis, suggesting the feasibility of continuing the investigation.
Multi-parametric MRI (mpMRI) datasets often vary between sites due to differences in acquisition protocols.
To evaluate the adherence of a multi-site mpMRI dataset to the minimum technical standards (MTS) of PI-RADS v2.1. STUDY TYPE: Prospective. SUBJECTS: Six hundred patients (age [years]: ≤ 49 = 0.8%, 50-59 = 10.7%, 60-69 = 47.0%, ≥ 70 = 41.5%) with intermediate-risk prostate cancer (PCa) imaged across 124 institutions prior to radiotherapy. FIELD STRENGTH/SEQUENCE: 3T, 1.5T, and 1.16T; T2-weighted (T2w): fast spin-echo; diffusion-weighted imaging (DWI): single-shot echo-planar imaging; and dynamic contrast-enhanced (DCE): T1-weighted 3D fast spoiled gradient echo. ASSESSMENT: Scanner vendors included Siemens, GE, Philips, Toshiba, and Hitachi. The degree of adherence to PI-RADS v2.1 was determined as the proportion of datasets that met the MTS. Mean and standard deviation of parameter values were calculated where applicable. Prostate Imaging Quality (PI-QUAL) v2 scores were assigned by one of three observers in 491 datasets. DICOM metadata consistency was evaluated. STATISTICAL TESTS: Fisher's exact test to assess changes in MTS adherence over time and by field strength; Harrell's C-index to compare MTS adherence with PI-QUAL score. A p value of < 0.001 was considered statistically significant after Bonferroni correction.
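For a binary outcome such as whether a dataset met a given MTS criterion, Harrell's C reduces to the proportion of (met, not met) pairs in which the dataset that met the criterion has the higher PI-QUAL score, with tied scores counted as one half. How the authors paired the two variables is an assumption here. The Bonferroni threshold is α divided by the number of tests; the abstract does not state how many tests produced its p < 0.001 cutoff, so the count of 50 used below (0.05/50 = 0.001) is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Harrell-style C for a binary outcome (1 = met MTS, 0 = did not).

    Among pairs with different outcomes, count pairs where the outcome-1 item has the
    higher score; tied scores count 0.5.
    """
    concordant = 0.0
    usable = 0
    for (s_i, y_i), (s_j, y_j) in combinations(zip(scores, outcomes), 2):
        if y_i == y_j:
            continue  # pairs with the same outcome carry no ranking information
        usable += 1
        s_pos, s_neg = (s_i, s_j) if y_i > y_j else (s_j, s_i)
        if s_pos > s_neg:
            concordant += 1
        elif s_pos == s_neg:
            concordant += 0.5
    if usable == 0:
        raise ValueError("need at least one item with each outcome")
    return concordant / usable


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


# Invented PI-QUAL scores and adherence flags; hypothetical number of tests.
print(concordance_index([5, 4, 4, 3, 2], [1, 1, 0, 1, 0]), bonferroni_threshold(0.05, 50))
```

A C of 0.5 means the score ranks adherent and non-adherent datasets no better than chance, and 1.0 means every adherent dataset outscores every non-adherent one.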
Eighty-two percent of MTS showed greater than 75% adherence. Low adherence was found for the in-plane dimension (frequency-encoding direction) of T2w images (57%, mean = 0.45 ± 0.16 mm) and the field of view (FOV) of DW images (62%, mean = 22.67 ± 4.70 cm). Only 50% of datasets used the recommended high b value image to compute the apparent diffusion coefficient map. Adherence improved significantly over time for one T2w and two DWI parameters; the adherence of FOV improved significantly at 3T for T2w and DWI sequences. C-index values for two T2w and two DWI parameters demonstrated a relationship between PI-RADS MTS and PI-QUAL score. Ten percent of anonymized datasets were stripped of some sequence information.
Results show promise for mpMRI standardization in the characterization of PCa and identify key parameters that remain variable across datasets and institutions. EVIDENCE LEVEL: 1. TECHNICAL EFFICACY: Stage 2. TRIAL REGISTRATION: ClinicalTrials.gov: NCT03367702. Variability in the way MRI scans are performed at different institutions and with different types of MRI scanners can make it difficult to obtain consistent results. We examined the MRI scan parameters of a large, multi-institutional dataset to determine how well they follow the guidelines outlined in the Prostate Imaging-Reporting and Data System (PI-RADS) v2.1. We found that most parameters showed high adherence to PI-RADSv2.1. Further examination of the parameters with lower adherence may provide insight that could benefit future efforts to standardize the way MRI scans are performed.
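The two statistical tests named in this abstract can be sketched in plain Python. This is a minimal illustration, not the authors' analysis code: `fisher_exact_two_sided` and `harrell_c` are hypothetical helper names, the 2 × 2 layout (acquisition period by met/not met) is an assumed example, and treating MTS adherence as the binary variable and PI-QUAL as the score is an assumption about how the comparison was framed. For a binary variable, Harrell's C reduces to the ROC AUC with ties counted as one half.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = early vs. late acquisition period,
    columns = met vs. did not meet one MTS parameter. Sums the
    hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability of x in the top-left cell, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one count as extreme.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def harrell_c(scores: list[float], outcomes: list[int]) -> float:
    """Concordance between a score (e.g. PI-QUAL) and a binary outcome (e.g. MTS met).

    Fraction of (outcome=1, outcome=0) pairs in which the outcome=1 case
    has the higher score; score ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome group")
    total = 0.0
    for sp in pos:
        for sn in neg:
            total += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return total / (len(pos) * len(neg))
```

In practice `scipy.stats.fisher_exact` provides a maintained implementation of the same test; the hand-rolled version only makes the hypergeometric summation explicit.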
The primary aim of this study was to compare the diagnostic accuracy of [68Ga]Ga-PSMA-11 PET, [68Ga]Ga-RM2 PET, and multiparametric MRI (mpMRI) for the detection of primary prostate cancer (PCa) using histopathology as the reference. The secondary aims of the study were to assess the agreement among imaging modalities and identify noninvasive biomarkers for the diagnosis and risk stratification of patients.
Forty-two patients with biopsy-confirmed, high-risk PCa were enrolled in this single-center, prospective, phase 2 clinical trial between September 2020 and May 2023 at San Raffaele hospital. All patients underwent [68Ga]Ga-PSMA-11 PET/MRI with mpMRI, and 36 had additional imaging with [68Ga]Ga-RM2 PET/MRI. All patients were included in the patient-level T staging analysis. Twenty-five patients were treated with radical prostatectomy with extended lymphadenectomy and considered for N staging analysis. Sixteen patients underwent all imaging and surgical procedures needed for coregistration between imaging and histology and were included in the lesion-based analysis for T staging. Two expert nuclear medicine physicians reviewed [68Ga]Ga-PSMA-11 and [68Ga]Ga-RM2 PET images with knowledge of the patients' available clinical and imaging information. mpMRI was interpreted per standard of care by 2 expert radiologists using Prostate Imaging Reporting and Data System, version 2, criteria. Peripheral whole-blood samples were collected at enrollment to assess their association with lymph node involvement on histology.
In the patient-based analysis, [68Ga]Ga-PSMA-11 PET and mpMRI identified at least 1 intraprostatic lesion in all patients, whereas [68Ga]Ga-RM2 PET results were negative in 3 of 36 patients. The lesion-level analysis performed in 16 patients showed that, in this cohort, the dominant intraprostatic lesion was always detected by [68Ga]Ga-RM2 PET, whereas both [68Ga]Ga-PSMA-11 PET and mpMRI missed it, reporting a false-positive finding elsewhere. For N staging analysis, [68Ga]Ga-PSMA-11 PET had the highest sensitivity among the investigated imaging modalities (sensitivity, 0.375). Blood analysis showed that a higher fraction of polymorphonuclear myeloid-derived suppressor cells (MDSCs) over monocytic MDSCs was significantly associated with lymph node involvement on histology (P = 0.0285).
All imaging modalities showed high sensitivity for the preoperative detection of primary PCa, but only [68Ga]Ga-RM2 PET correctly identified the dominant lesion in all patients who underwent lesion-based subanalysis. The identification of lymph node involvement remains challenging, with [68Ga]Ga-PSMA-11 PET reaching a sensitivity of only 0.375. In this regard, the polymorphonuclear MDSC-to-monocytic MDSC ratio may represent a valuable biologic marker of lymph node involvement in patients with high-risk PCa and warrants further investigation.
Systematic transrectal ultrasound-guided biopsy lacks accuracy in the primary diagnosis of prostate cancer (PCa) and causes side effects. We investigated prostate-specific membrane antigen (PSMA)-targeted PET/MRI as a less-invasive alternative for biopsy guidance and risk assessment.
The RAPID study was a randomized, controlled, single-center, open-label phase 3 trial comparing the diagnostic efficacy of 68Ga-PSMA-11 PET/MRI with systematic transrectal ultrasound-guided prostate biopsy. In total, 220 men with suspected PCa were randomized to either a standard (random 12-core biopsy; RB) group or an image-guided biopsy (IGB) group. Biopsy, prostatectomy histology, and follow-up visits served as references.
PET/MRI prospectively predicted 91 of 113 histologically verified tumors, corresponding to a sensitivity of 80.5% and a positive predictive value of 84.3%. Among tumors characterized as ISUP GG of 3 or greater (n = 60), PSMA PET/MRI prospectively detected 95% (n = 57). The IGB group demonstrated slightly higher sensitivity, specificity, positive predictive value, and negative predictive value compared with the RB group (79.3%, 94.7%, 85.2%, 92.2% vs. 74.2%, 88.0%, 71.9%, 89.2%). Seventy-nine patients were eligible for a direct IGB and RB subanalysis, with IGB detecting 15 additional cases. PET/MRI showed high specificity (94%) and negative prediction (86%) for tumor aggressiveness. In a median follow-up period of 3 y, an aggressive course of disease was detected in 25 of 199 patients. RB correlation identified 24 patients with an ISUP GG of 3 or greater with aggressive disease development during follow-up, compared with 23 patients identified by PET/MRI. Negative prediction of both methods was comparably high at 99%; however, PET/MRI overestimated fewer patients (21) as aggressive compared with RB (34).
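The headline RAPID figures follow from the standard definitions. The sketch below, with hypothetical helper names, recomputes the sensitivity for all tumors (91 of 113) and for ISUP GG ≥ 3 tumors (57 of 60). Back-calculating the number of positive PET/MRI calls from the reported 84.3% PPV assumes the PPV was computed over the same 91 true positives, which the abstract does not state explicitly.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp)


def implied_positive_calls(tp: int, reported_ppv: float) -> int:
    """Back-calculate TP + FP from a reported PPV, rounded to a whole count."""
    return round(tp / reported_ppv)


# 91 of 113 histologically verified tumors were predicted, so FN = 113 - 91.
sens_all = sensitivity(91, 113 - 91)
# 57 of 60 ISUP GG >= 3 tumors were detected, so FN = 3.
sens_gg3 = sensitivity(57, 60 - 57)
# Assumption: the 84.3% PPV refers to the same 91 true positives.
calls = implied_positive_calls(91, 0.843)
print(f"{sens_all:.1%} {sens_gg3:.0%} {calls} {ppv(91, calls - 91):.1%}")
# prints: 80.5% 95% 108 84.3%
```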
PSMA-targeted PET/MRI-guided biopsy is a reliable, less invasive method for detecting and characterizing PCa in a cohort with moderately increased PSA values, potentially reducing unnecessary biopsies and providing a reliable prognosis of the course of disease. These results support the integration of modern imaging techniques into clinical practice to improve the treatment of PCa.
Accurate staging of unfavorable intermediate- or high-risk prostate cancer (PCa) is essential for treatment decisions. Conventional imaging often fails to detect lymph node, bone, and visceral metastases, and for this purpose 68Ga-prostate-specific membrane antigen (PSMA)-11 PET/CT is clinically used. This prospective, multicenter, International Atomic Energy Agency-supported trial evaluated the accuracy of 68Ga-PSMA-11 PET/CT for initial staging compared with MRI and histopathology and the impact of 68Ga-PSMA-11 PET/CT on determining surgical eligibility.
In a prospective, international study supported by the International Atomic Energy Agency, 775 patients with high-risk or unfavorable intermediate-risk PCa from 12 centers across 11 countries (including low-, middle-, and high-income settings) who were scheduled for radical prostatectomy based on conventional imaging (including bone scanning and pelvic MRI) underwent 68Ga-PSMA-11 PET/CT before treatment. PET and MRI findings were compared with radical prostatectomy histopathology, and the impact of PET on radical prostatectomy was assessed.
68Ga-PSMA-11 PET/CT detected metastatic disease (M1) in 20.4% of cases, altering management and preventing prostatectomy in 24.0%. The accuracy for seminal vesicle invasion was 90.1% for 68Ga-PSMA-11 PET/CT versus 57.3% for MRI, and for lymph node metastases it was 91.1% for 68Ga-PSMA-11 PET/CT versus 69.7% for MRI. In 13.1% of patients (78/593), there were discordant results between 68Ga-PSMA-11 PET/CT and histopathology. 68Ga-PSMA-11 PET/CT had false-negative lymph node findings in 8.6% of cases, with the most clinically significant being 4.5% of patients incorrectly staged as N0. False-positive lymph node findings at 68Ga-PSMA-11 PET/CT occurred in 4.5% of patients.
68Ga-PSMA-11 PET/CT significantly improves staging accuracy, reducing the indication for prostatectomy and impacting treatment decisions. These findings, from a broad international cohort including low-, middle-, and high-income countries, support the global adoption of 68Ga-PSMA-11 PET/CT into standard staging protocols for high-risk PCa.
Previous studies have indicated that regional saturation prostate biopsy (RSB) is more effective than targeted biopsy (TB) or systematic biopsy (SB) for patients with prostate-specific antigen (PSA) levels between 4 and 20 ng/mL. However, its efficacy in patients with PSA levels ≥ 20 ng/mL remains unclear.
In this prospective, single-center, randomized controlled trial, we enrolled patients with PSA levels greater than 20 ng/mL and suspicious magnetic resonance imaging (MRI) findings from January 2021 to August 2023. The participants were randomized to undergo RSB or TB, and SB was also performed. The primary endpoint was the detection rate of clinically significant prostate cancer (csPCa), defined as an International Society of Urological Pathology (ISUP) grade ≥ 2.
RSB detected csPCa more frequently than did TB (90.2% versus 82.9%, p = 0.01) and SB (90.2% versus 82.5%, p = 0.01). Supplementary SB did not increase csPCa detection in the RSB group but did increase it in the TB group. Subgroup analysis revealed that RSB was particularly effective for patients with PSA levels between 20 and 50 ng/mL, prostate imaging-reporting and data system (PI-RADS) score 3 lesions, prostate volume (PV) > 45 mL, and PSA density (PSAD) < 1.0 ng/mL/cc. However, the single-center design limits the generalizability of our findings.
Our trial suggests that RSB is superior to TB in detecting significant prostate cancers among patients with high PSA levels (≥ 20 ng/mL). Notably, perilesional biopsy is crucial for those with PSA levels between 20 and 50 ng/mL, larger PV, low PSAD, and low PI-RADS scores, enhancing csPCa detection.
| Run at | Source | Hits | New | Status |
|---|---|---|---|---|
| 2026-04-26 00:00 | LitReview | | | completed |
| 2026-04-19 00:00 | LitReview | | | completed |
| 2026-04-14 14:30 | LitReview | | | completed |
| 2026-04-14 14:29 | LitReview | | | completed |
| 2026-04-14 14:28 | LitReview | | | error |
| 2026-04-14 11:57 | LitReview | | | error |
| 2026-04-12 00:00 | LitReview | | | completed |
| 2026-04-05 16:08 | litreview:seed | 101 | 101 | seed-completed |
(radiology[tiab] OR imaging[tiab]) AND ("large language model*"[tiab] OR LLM[tiab]) AND (personalized[tiab] OR personalization[tiab] OR style[tiab] OR adaptation[tiab]) AND (reporting[tiab] OR impression[tiab] OR conclusion[tiab])
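Each search string in these notes follows the same pattern: synonyms for one concept are OR-ed together, each tagged with the PubMed `[tiab]` (title/abstract) field, and the concept groups are AND-ed. A minimal sketch of that pattern, with hypothetical helper names `or_group` and `build_query`, regenerates the query above; quoting any term that contains a space is an assumption that matches how phrases are written in these notes.

```python
def or_group(terms: list[str], field: str = "tiab") -> str:
    """OR together one concept's synonyms, each tagged with a PubMed field."""
    tagged = []
    for term in terms:
        # Multi-word phrases are quoted so PubMed searches them as a phrase.
        text = f'"{term}"' if " " in term else term
        tagged.append(f"{text}[{field}]")
    return "(" + " OR ".join(tagged) + ")"


def build_query(groups: list[list[str]]) -> str:
    """AND together the OR-groups, one group per concept."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query([
    ["radiology", "imaging"],
    ["large language model*", "LLM"],
    ["personalized", "personalization", "style", "adaptation"],
    ["reporting", "impression", "conclusion"],
])
print(query)
```

Keeping the concept groups as data makes it easy to widen one concept (for example, adding `GPT` to the model group) without hand-editing parentheses.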
To evaluate the application of DeepSeek-assisted case-based learning (CBL) in a respiratory radiology course across the full instructional cycle, including preparation, implementation, and evaluation.
This prospective, single-center study was conducted in 2025 and involved third-year medical undergraduates. CBL Preparation: Six cases were retrieved from the Hospital Information System (HIS), and six generated via DeepSeek-R1. Preparation times were recorded and compared. CBL Implementation: Students were assigned to either a DeepSeek-assisted group or a traditional CBL group, with discussion time recorded for each subgroup. CBL Evaluation: Teaching effectiveness was evaluated through test scores and questionnaires. Subsequently, DeepSeek-R1 provided personalized feedback to students based on their individual scores.
A total of 200 students (mean age 21.02 ± 0.89 years, 94 males) participated. DeepSeek-generated cases required significantly less preparation time than HIS-retrieved cases (p = 0.016). During implementation, the DeepSeek group spent less discussion time than the traditional group (p = 0.026). The DeepSeek-assisted group achieved greater test score improvements than the traditional group (p < 0.05). Questionnaire responses indicated higher self-directed learning, greater interest in radiology, improved learning efficiency, and lower perceived learning burden in the DeepSeek-assisted group (p < 0.05). Additionally, personalized feedback generated by DeepSeek was qualitatively reviewed by the radiology teaching department and considered educationally useful.
This study demonstrates that DeepSeek-assisted CBL effectively supports respiratory radiology education throughout the entire course (preparation, implementation, and evaluation) by enhancing efficiency, boosting student interest and engagement, improving performance, and providing valuable post-class feedback.
Accurate tumor node metastasis (TNM) staging is fundamental for treatment planning and prognosis in non-small cell lung cancer (NSCLC). However, its complexity poses significant challenges. Traditional rule-based natural language processing methods are constrained by their reliance on manually crafted rules and are susceptible to inconsistencies in clinical reporting.
This study aimed to develop and validate a robust, accurate, and operationally efficient artificial intelligence framework for the TNM staging of NSCLC by strategically enhancing a large language model, GLM-4-Air (general language model), through advanced prompt engineering and supervised fine-tuning (SFT).
We constructed a curated dataset of 492 deidentified real-world medical imaging reports, with TNM staging annotations rigorously validated by senior physicians according to the AJCC (American Joint Committee on Cancer) 8th edition guidelines. The GLM-4-Air model was systematically optimized via a multi-phase process: iterative prompt engineering incorporating chain-of-thought reasoning and domain knowledge injection for all staging tasks, followed by parameter-efficient SFT using low-rank adaptation for the reasoning-intensive primary tumor characteristics (T) and regional lymph node involvement (N) staging tasks. The final hybrid model was evaluated on a completely held-out test set (black-box) and benchmarked against GPT-4o using standard metrics, statistical tests, and a clinical impact analysis of staging errors.
The optimized hybrid GLM-4-Air model demonstrated reliable performance. It achieved higher staging accuracies on the black-box test set: 92% (95% CI 0.850-0.959) for T, 86% (95% CI 0.779-0.915) for N, 92% (95% CI 0.850-0.959) for distant metastasis status (M), and 90% for overall clinical staging; by comparison, GPT-4o attained 87% (95% CI 0.790-0.922), 70% (95% CI 0.604-0.781), 78% (95% CI 0.689-0.850), and 80%, respectively. The model's robustness was further evidenced by its macro-average F1-scores of 0.914 (T), 0.815 (N), and 0.831 (M), consistently surpassing those of GPT-4o (0.836, 0.620, and 0.698). Analysis of confusion matrices confirmed the model's proficiency in identifying critical staging features while effectively minimizing false negatives. Crucially, the clinical impact assessment showed a substantial reduction in severe category I errors, defined as misclassifications that could significantly influence subsequent clinical decisions. Our model committed 0 category I errors in M staging and fewer category I errors than GPT-4o in T and N staging. Furthermore, the framework demonstrated practical deployability, achieving efficient inference on consumer-grade hardware (eg, 4 RTX 4090 GPUs) with latencies acceptable for clinical workflows.
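The abstract reports 95% CIs without stating the interval method or the test-set size. As an inference, not a statement from the authors, the intervals are reproduced to three decimals by a Wilson score interval with n = 100 test reports. The sketch below shows this; `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half_width, center + half_width


# Assumption: n = 100 test reports (inferred, not stated in the abstract).
for label, correct in [("Hybrid GLM-4-Air, T", 92), ("Hybrid GLM-4-Air, N", 86), ("GPT-4o, T", 87)]:
    lo, hi = wilson_interval(correct, 100)
    print(f"{label}: {correct}% (95% CI {lo:.3f}-{hi:.3f})")
```

Under that assumption the loop prints 0.850-0.959, 0.779-0.915, and 0.790-0.922, matching the reported intervals. With only about 100 reports, the intervals for the two models overlap for T staging, so the per-task accuracy differences carry substantial uncertainty.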
The proposed hybrid framework, integrating structured prompt engineering and applying SFT to reasoning-heavy tasks (T/N), enables the GLM-4-Air model to serve as a highly accurate, clinically reliable, and cost-efficient solution for automated NSCLC TNM staging. This work demonstrates the efficacy and potential of a domain-optimized smaller model compared with an off-the-shelf generalist model, holding promise for enhancing diagnostic standardization in resource-aware health care environments.
Patients referred for specialized care often arrive with outside medical records (OMRs) compiled into multi-report PDFs that include imaging, pathology, and clinical notes in unstructured formats. Reviewing these records is time consuming and mentally taxing, increasing the risk of delayed care, clinician frustration, and missed information affecting quality of care. This study aimed to automate the segmentation, classification, and date extraction of scanned OMRs, with a focus on records relevant to breast cancer care.
We used optical character recognition (OCR) to extract machine-readable text from 1303 scanned PDF documents from 116 distinct external institutions. Gemini 1.5, a large language model (LLM), was then used to segment multi-report files into individual documents, classify them into clinically meaningful categories such as mammograms and pathology reports, and extract study dates to build diagnostic timelines. Document categories were informed by clinical workflows in a breast cancer center.
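The final step of this pipeline, turning per-document categories and extracted study dates into a diagnostic timeline, can be sketched as follows. The `SegmentedDocument` record and `build_timeline` function are hypothetical; the sketch assumes the LLM returns dates as ISO-8601 strings and routes missing or malformed dates to manual review rather than guessing.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SegmentedDocument:
    """Hypothetical record for one document split out of a multi-report PDF."""
    category: str              # e.g. "mammogram" or "pathology report"
    study_date: Optional[str]  # ISO-8601 date string extracted by the LLM, or None
    source_file: str


def build_timeline(
    docs: list[SegmentedDocument],
) -> tuple[list[tuple[date, str, str]], list[SegmentedDocument]]:
    """Sort dated documents chronologically; return (timeline, undated)."""
    timeline, undated = [], []
    for doc in docs:
        try:
            when = date.fromisoformat(doc.study_date) if doc.study_date else None
        except ValueError:
            when = None  # malformed date string
        if when is None:
            undated.append(doc)  # route to manual review instead of guessing
        else:
            timeline.append((when, doc.category, doc.source_file))
    timeline.sort()
    return timeline, undated


docs = [
    SegmentedDocument("pathology report", "2024-03-02", "outside_records.pdf"),
    SegmentedDocument("mammogram", "2024-02-15", "outside_records.pdf"),
    SegmentedDocument("clinical note", None, "outside_records.pdf"),
]
timeline, undated = build_timeline(docs)
```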
The system achieved an F1 score of 0.95 for segmentation, 0.96 for classification, and 0.90 for date extraction. In a pilot of 45 records reviewed by clinicians, only 2 classification errors and 1 date error were reported. Clinicians estimated that the tool reduced OMR review time by 40%, improved workflow efficiency, and increased satisfaction.
Our findings demonstrate that combining OCR with LLMs can significantly enhance the processing of unstructured medical records, reducing manual burden and supporting timely clinical decision-making.
This study demonstrates the successful application of OCR and LLMs for organizing scanned OMRs within a specialty clinic. By automating a previously manual process, the approach supports scalable review of incoming outside records and has potential for adaptation to other clinical workflows. Future work will focus on evaluating the system across additional specialties and institutions.
Pancreatic cancer requires nuanced, multidisciplinary treatment planning typically conducted within tumor boards. While Large Language Models (LLMs) have shown capabilities in medical reasoning, their ability to approximate complex, integrative decision-making in oncology remains underexplored.
This study evaluated the performance of LLaMA 3.3 (70b) in predicting tumor board decisions for newly diagnosed pancreatic cancer patients. Clinical documentation (including free-text imaging reports, pathology findings, and patient history) from 42 first-diagnosis cases discussed in a real-world tumor board was collected. The model was tasked with predicting one of three treatment options: surgical resection (SURG), neoadjuvant chemotherapy (NEO), or palliative therapy (PALL). Four prompting strategies were evaluated: zero-shot, advanced (adv.) zero-shot, Chain-of-Thought (CoT), and few-shot prompting. Performance was assessed using accuracy, micro- and macro-averaged F1 scores, and category-specific recall.
The advanced zero-shot and CoT strategies achieved the highest overall accuracy of 78.6% and a micro-averaged F1 score of 0.786. However, this performance was driven primarily by the correct classification of majority classes (SURG and PALL). Crucially, both high-accuracy strategies failed to identify any of the neoadjuvant therapy candidates (Recall NEO = 0.00; 0/7 cases), systematically misclassifying them as palliative or surgical. While few-shot prompting improved the detection of neoadjuvant cases (Recall NEO = 1.00), it introduced substantial noise, reducing overall accuracy to 56.7%. LLaMA 3.3 (70b) demonstrates high concordance with tumor board decisions for clear-cut surgical or palliative cases but exhibits a critical systematic failure in identifying candidates for neoadjuvant therapy. The high global accuracy masks a significant safety limitation regarding the recognition of complex, intermediate-stage patients.
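The gap between high accuracy and zero NEO recall is partly a property of the metrics: for single-label multiclass prediction, micro-averaged F1 always equals accuracy (every error is one false positive and one false negative), so it cannot expose a missed minority class, whereas per-class recall and macro-averaged F1 can. The toy labels below are hypothetical and are NOT the study's confusion matrix; only the totals (42 cases, 7 NEO candidates, 33 correct, i.e. 78.6% accuracy) mirror the abstract. `classification_report` here is a hypothetical helper, not the scikit-learn function of the same name.

```python
from collections import Counter


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def classification_report(y_true: list[str], y_pred: list[str], labels: list[str]) -> dict:
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    recall = {c: tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in labels}
    macro_f1 = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro_f1 = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    accuracy = sum(tp.values()) / len(y_true)
    return {"accuracy": accuracy, "micro_f1": micro_f1, "macro_f1": macro_f1, "recall": recall}


# Hypothetical toy labels: 20 SURG, 15 PALL, 7 NEO; the classifier never predicts NEO.
y_true = ["SURG"] * 20 + ["PALL"] * 15 + ["NEO"] * 7
y_pred = (["SURG"] * 19 + ["PALL"]        # 1 SURG case called PALL
          + ["PALL"] * 14 + ["SURG"]      # 1 PALL case called SURG
          + ["PALL"] * 4 + ["SURG"] * 3)  # every NEO case missed
report = classification_report(y_true, y_pred, ["SURG", "PALL", "NEO"])
print(report)
```

In this toy case accuracy and micro-F1 are both about 0.786, while NEO recall is 0.0 and macro-F1 drops to about 0.569, which is why per-class recall is the more informative figure for the neoadjuvant group.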
These findings suggest that current LLMs may approximate majority-class decisions but risk overlooking curative treatment pathways in nuanced scenarios, necessitating rigorous oversight and specific adaptation before clinical consideration.
Developing effective Convolutional Neural Networks (CNN) for soft tissue sarcoma detection often requires numerous iterations and adjustments, demanding specialized IT (Information Technology) skills. This study aims to use ChatGPT 4 to simplify CNN adaptation, reducing the need for specialized IT skills while enabling efficient exploration of training configurations to enhance diagnostic accuracy.
This study leveraged a preexisting Artificial Intelligence (AI) model adapted using a preexisting Convolutional Neural Network (CNN). The study involved 54 participants diagnosed with primary soft tissue sarcomas of the extremities who had complete Magnetic Resonance Imaging (MRI) datasets. AI adaptations and programming were conducted using TensorFlow and verified with ChatGPT. Model training used a patient-level split of 70% training, 15% validation, and 15% test data and ran for eight epochs.
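A patient-level split assigns whole patients, not individual MR images, to the training, validation, and test sets, so images from one patient cannot appear on both sides of the evaluation. Below is a minimal sketch of a 70/15/15 split of that kind, assuming a hypothetical mapping from image IDs to patient IDs. `patient_level_split` is a hypothetical name, and because the fractions apply to patients, image counts per set will vary.

```python
import random


def patient_level_split(
    image_to_patient: dict[str, str],
    fractions: tuple[float, float, float] = (0.70, 0.15, 0.15),
    seed: int = 0,
) -> dict[str, list[str]]:
    """Split image IDs into train/val/test so that no patient spans two sets."""
    patients = sorted(set(image_to_patient.values()))
    random.Random(seed).shuffle(patients)  # reproducible shuffle of patient IDs
    n = len(patients)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),  # remainder goes to test
    }
    split: dict[str, list[str]] = {name: [] for name in groups}
    for image, patient in image_to_patient.items():
        for name, members in groups.items():
            if patient in members:
                split[name].append(image)
                break
    return split


# Hypothetical example: 54 patients with 3 images each.
mapping = {f"p{i:02d}_img{j}": f"p{i:02d}" for i in range(54) for j in range(3)}
split = patient_level_split(mapping)
```

With 54 patients this yields 38, 8, and 8 patients in the three sets. A test set of 8 patients is small, so the reported test accuracy of 93.9% rests on few independent cases.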
The adapted CNN model demonstrated significant improvement across various MRI sequences, achieving high accuracy levels (up to 98.5%) and excellent sensitivity and specificity rates. The model performed robustly in differentiating tumor presence in MR images, with test accuracies as high as 93.9%. The inclusion of a Gradient-weighted Class Activation Mapping (Grad-CAM) heat map and probability scores in the diagnostic outputs further enhanced interpretative capabilities.
This study highlights the potential of AI, particularly CNNs, in the early and accurate detection of soft tissue sarcomas, underscoring the technology's adaptability across different imaging modalities. The integration of large language models like ChatGPT into the model adaptation process emphasizes the reduced need for specialized IT skills, making advanced diagnostic tools more accessible and potentially improving diagnostic accuracy and patient outcomes in radiology and oncology.
Accurate documentation of distant recurrence sites in breast cancer is essential for evaluating treatment effectiveness and outcomes research. However, such information is embedded in unstructured clinical notes, making manual abstraction labor-intensive. Large language models (LLMs) offer a scalable solution for extracting complex information from heterogeneous clinical narratives; however, generic LLMs often lack the specialized clinical reasoning needed for accurate interpretation of oncologic documentation. This study aims to develop an efficient LLM-based framework to automatically extract distant recurrence sites from free-text documentation.
We used clinical notes, pathology and radiology reports from recurrent breast cancer patients at Mayo Clinic (n = 766) for model development and evaluated generalizability on internal hold-out samples (n = 112) and an external Stanford Medicine cohort (n = 110). For cross-disease domain adaptation, we further validated on prostate cancer patients (n = 49). Our proposed framework employs BioLinkBERT, a pretrained language model (PLM) backbone, with weak supervision and an epoch-wise entropy optimization to address limited labeled data and class imbalance across recurrence sites. The fine-tuned model was compared against state-of-the-art models, including Llama2-7B, Llama-3-8B and MedAlpaca, using precision, recall, and F1-score.
The fine-tuned model outperformed generic and domain-specific LLM baselines, with notable gains in identifying multi-site distant recurrence. In-domain validation showed consistent F1-score improvement (average 0.78), particularly for rare recurrence sites. The model also demonstrated strong performance on the external Stanford cohort and on prostate cancer, achieving F1-score of 0.83 and 0.93, respectively.
This study presents an efficient, weakly supervised LLM framework that accurately extracts metastatic recurrence sites, reducing reliance on manual chart review. The results demonstrate that relatively small LLMs, optimized with domain-aware weak supervision, can outperform larger models for complex oncologic information extraction. The model is released as a platform-independent Docker image to support seamless cancer registry integration.
Artificial intelligence (AI) has emerged as a transformative force in ophthalmology, enabling automated, accurate, and efficient clinical reporting. This review summarizes recent advances in AI-driven report generation, emphasizing the integration of multimodal imaging and clinical data. Deep learning and natural language processing (NLP) models can synthesize information from diverse sources (including fundus photography, optical coherence tomography, fluorescein angiography, and patient records) to generate structured, interpretable, and personalized diagnostic reports. Such systems enhance diagnostic precision, streamline workflow, and reduce interobserver variability. We outline the technological foundations underlying these systems, including convolutional and transformer-based architectures, self-supervised and multimodal learning, and large language models. Representative applications in diabetic retinopathy, glaucoma, cataract, and age-related macular degeneration are discussed, highlighting their clinical value and emerging real-world deployment. Persistent challenges, including data heterogeneity, model interpretability, ethical governance, and clinical integration, are critically reviewed. Finally, we explore future directions such as real-time AI-assisted reporting, predictive and personalized analytics, and global scalability across healthcare ecosystems. Multimodal, explainable, and clinically integrated AI systems hold promise to redefine ophthalmic diagnostics and improve both clinician efficiency and patient outcomes.
This study aims to explore the application of artificial intelligence in medical education by comparing research hotspots and evolutionary trends between China and the international community, ultimately proposing informed educational practices and policy recommendations.
Literature was retrieved from the core collections of CNKI and Web of Science for the period 2014-2024, limited to article and review publications. After applying a unified Boolean search strategy and deduplication, the data were analyzed using CiteSpace 6.4.R1 to examine publication trends, collaboration networks, keyword co-occurrence/clustering/burst detection, and co-citation patterns.
A total of 379 Chinese and 552 English records were included. Publications surged after 2018 and peaked during 2023-2024. International hotspots centered on machine learning, deep learning, and large language models for simulation-based training and clinical reasoning; Chinese studies focused on "New Medical Sciences", VR/AR, and medical imaging. The emergence of generative artificial intelligence and multimodal large models has become a new frontier in artificial intelligence research within global medical education from 2023 to 2024.
This study is based on a comparison of two databases to reveal the hotspots and differences in artificial intelligence and medical education research between China and the international research community. It not only compensates for the time lag of existing research, but also proposes three major trends driven by artificial intelligence in the development of medical education (generative AI, personalized learning, immersive experience). A complementary pattern exists between technology-driven and scenario-driven orientations. We recommend integrating AI literacy and ethics into curricula, establishing Generative-AI teaching/assessment guidelines, and building cross-institutional, yearly knowledge-map monitoring for sustainable innovation in medical education.
Large language models (LLMs) are increasingly being evaluated for their ability to answer official radiology board-style examination questions. Understanding their accuracy, limitations, and potential applications in education is essential for assessing their utility in the field.
A scoping review was conducted in October 2025 across PubMed, Scopus, and Web of Science, following Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Studies were included if they evaluated LLMs on official radiology board-style examination questions. After screening 205 unique records, 29 studies met the inclusion criteria. Data were extracted on study characteristics, including LLM type and version, input modality, language, examination type, answer format, comparison with humans, and reported outcomes.
The reviewed studies evaluated multiple LLMs, predominantly Chat Generative Pre-trained Transformer (GPT)-based models (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o), as well as Claude, Gemini, Llama 3, and Mixtral. Text-only evaluations generally yielded higher accuracy (≈65%-90%) than multimodal tasks (45%-89%). GPT-4 and its variants consistently outperformed earlier versions, occasionally exceeding average human performance. Open-source models such as Llama 3 70B and Mixtral achieved results comparable to proprietary models, offering advantages in local deployment and privacy. Few studies directly compared LLM performance with human radiologists.
LLMs demonstrate promising performance in answering text-based radiology board-style examination questions, particularly GPT-4-based models. Nevertheless, significant limitations persist in multimodal tasks and complex reasoning scenarios.
Severe community-acquired pneumonia (SCAP) is a significant global health challenge due to its high mortality. Despite advances, early diagnosis and effective management remain critical. Tools like radiomics analyze imaging data for risk assessment, while machine learning and nomograms aid in personalized treatment. Large language models (LLMs) enhance clinical decision-making by analyzing data and supporting care strategies. This study integrates these methods to predict 28-day mortality in SCAP patients.
A cohort of 599 patients diagnosed with severe community-acquired pneumonia (SCAP), including 316 males and 283 females, from Shanghai East Hospital and Xiamen Humanity Hospital was enrolled in this study. High-resolution lung CT scans were used to segment three-dimensional regions of interest, from which 1,050 radiomic features were extracted. The dataset was divided into a training set (80%) and an independent test set (20%), and k-fold cross-validation was applied to optimize model performance. To address class imbalance, the SMOTE oversampling technique was employed. The study integrated radiomics, nomograms, seven machine learning models, and five LLMs to predict the 28-day mortality risk in SCAP patients. SHAP values were used to enhance the interpretability of feature contributions. Additionally, the study integrated the prior knowledge provided by LLMs, processed through an embedding layer, with data-driven feature learning in the main network, and dynamically fused their outputs using a bias network with a gating mechanism, thereby improving the accuracy and interpretability of LLMs in predicting 28-day mortality risk for SCAP patients.
Key predictors of 28-day mortality included inflammatory markers, cytokines, age, CRP, and oxygenation index. Clinical-Radiomics models achieved strong accuracy (AUC 0.92). Machine learning models, particularly XGBoost (AUC 0.90), were highly effective, with SHAP analysis emphasizing the importance of the radscore. LLMs such as ChatGPT also performed well (AUC 0.78), showcasing the potential of integrating clinical, radiomic, and AI-driven approaches.
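SMOTE balances classes by synthesizing new minority-class samples (here, non-survivors, assumed to be the minority) on line segments between existing minority samples and their nearest minority neighbours. The sketch below is a minimal, dependency-free illustration with a hypothetical `smote` function; `imblearn.over_sampling.SMOTE` from imbalanced-learn is the usual maintained implementation. The abstract does not say whether oversampling was restricted to training data. Applying it only within training folds, after the train/test split, is the common precaution against synthetic copies of evaluation patients leaking into training.

```python
import math
import random


def smote(minority: list[list[float]], n_new: int, k: int = 5, seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


# Hypothetical 2-feature minority samples (e.g. two standardized radiomic features).
new_points = smote([[0.1, 0.4], [0.3, 0.2], [0.5, 0.9], [0.2, 0.6]], n_new=6, k=2)
```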
This study demonstrates the effectiveness of radiomics, machine learning, and LLMs to predict SCAP outcomes. Models like XGBoost achieved superior accuracy, while SHAP analysis improved interpretability. These advancements highlight the potential for enhanced SCAP prognosis and personalized care strategies.
| Run at | Source | Hits | New | Status |
|---|---|---|---|---|
| 2026-04-19 00:00 | LitReview | 1 | | completed |
| 2026-04-14 14:29 | LitReview | | | completed |
| 2026-04-14 14:28 | LitReview | | | completed |
| 2026-04-14 14:28 | LitReview | | | error |
| 2026-04-14 11:57 | LitReview | | | error |
| 2026-04-12 00:00 | LitReview | 1 | 1 | completed |
| 2026-04-05 00:00 | LitReview | 2 | | completed |
| 2026-04-04 07:56 | pubmed:seed | 58 | | seed-completed |
("ureteral stone*"[tiab] OR "ureteral calculus"[tiab] OR "ureteral calculi"[tiab] OR urolithiasis[tiab]) AND (KUB[tiab] OR "kidney, ureter, and bladder"[tiab] OR "kidney ureter bladder"[tiab] OR "abdominal radiograph*"[tiab] OR "plain radiograph*"[tiab] OR radiograph*[tiab] OR x-ray[tiab]) AND ("artificial intelligence"[tiab] OR "deep learning"[tiab] OR "machine learning"[tiab] OR "neural network*"[tiab] OR "computer-aided diagnosis"[tiab] OR AI[tiab] OR CNN[tiab])
Urolithiasis is a prevalent urological condition, and non-contrast computed tomography (NCCT) is the gold standard for diagnosis. In recent years, there has been growing interest in machine learning (ML)-based detection of urolithiasis and in the wider potential of AI in urology.
To synthesise the diagnostic accuracy of ML-based urinary tract stone (UTS) detection on NCCT and in externally validated cohorts.
We performed a systematic review and bivariate meta-analysis of studies evaluating ML for detecting urinary stones. We used QUADAS-2 to assess the risk of bias. Subgroup analyses examined performance by model type, classification task, stone site, dataset source, and CT orientation. Bivariate meta-regression was performed to further explore heterogeneity. Publication bias was assessed using Deeks' test. The study was prospectively registered in PROSPERO (CRD42024542409).
Forty-five studies were included in the qualitative synthesis. Twenty-four studies (49,277 test images) provided extractable 2 × 2 data for meta-analysis. For NCCT (10 studies), pooled sensitivity was 96% (95% CI 92-98%) and pooled specificity was 98% (95% CI 97-99%). In externally validated NCCT cohorts (4 studies; 1,056 images), pooled sensitivity was 95% (95% CI 92-97%) and pooled specificity was 96% (95% CI 70-100%). Subgroup performance remained high, but heterogeneity persisted; meta-regression found that stone site contributed to variability (p = 0.014), while other moderators were not significant. Deeks' test showed no evidence of small-study effects (p = 0.571).
ML models show high image-level diagnostic performance for stone detection on NCCT and may support radiologists as decision support tools. Clinical translation is limited by heterogeneity and scarce external validation. Future studies should move beyond detection-only tasks towards clinically meaningful outputs that are actionable for radiologists and downstream clinicians, including urologists and nephrologists.
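Each study entering the meta-analysis contributes a 2 × 2 table (true/false positives and negatives), from which per-study sensitivity TP/(TP+FN) and specificity TN/(TN+FP) follow. The sketch below computes these with Wilson 95% score intervals for hypothetical counts. The pooled estimates in the abstract come from a bivariate random-effects model, which this sketch does not implement; function names are illustrative.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Point estimate and Wilson score interval (default ~95%) for a proportion."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half, centre + half


def sensitivity_specificity(tp, fp, fn, tn):
    """Per-study sensitivity TP/(TP+FN) and specificity TN/(TN+FP), each with a Wilson CI."""
    return wilson_interval(tp, tp + fn), wilson_interval(tn, tn + fp)


# Hypothetical 2 x 2 table for one study (counts of test images)
sens, spec = sensitivity_specificity(tp=96, fp=2, fn=4, tn=98)
print("sensitivity %.2f (%.2f-%.2f)" % sens)
print("specificity %.2f (%.2f-%.2f)" % spec)
```

The Wilson interval stays within [0, 1] even when sensitivity or specificity is close to 100%, which is common in detection studies.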
In minimally invasive surgical treatment of urolithiasis, there is a need for an analytical method that determines the chemical composition of urinary stones in real time, i.e., intraoperatively. While a thorough phase analysis can be done after surgery, preliminary information about the target stone would help specialists choose an optimal treatment strategy and give the patient immediate dietary or drug prescriptions. Near-infrared spectroscopy (NIRS) is a good candidate for such a method because it can provide immediate results without mandatory sample preparation. Fiber-optic probes, often used for acquiring near-infrared spectra, are compatible with surgical instrumentation, and chemometric algorithms can resolve the complexity of NIR spectra, which consist of overlapping signals. For the first time, we applied NIRS in diffuse reflectance mode to classify the three major types of urinary stones: oxalates, urates, and phosphates. To imitate the real conditions of surgery, NIR spectra were acquired not only under ambient conditions but also in saline medium. A trained and optimized multiclass classifier (Error-Correcting Output Codes) showed acceptable precision and recall on an independent validation dataset. Despite the strong absorbance of saline, the calculated geometric mean was 94%, 87%, and 71% for oxalates, urates, and phosphates, respectively. A first real-time trial during surgery (percutaneous nephrolithotomy) demonstrated that the approach is compatible with surgical protocols, with good agreement between the acquired NIR spectra and the results of reference X-ray phase analysis.
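Error-correcting output codes (ECOC) reduce a multiclass problem to several binary classifiers: each class is assigned a code word, and a sample goes to the class whose code word best matches the binary classifiers' outputs. The sketch below assumes a one-vs-one coding design for the three stone types and hinge-loss decoding; the abstract does not state which coding design or binary learners were used, and the classifier scores are hypothetical.

```python
# One-vs-one coding matrix: rows = classes, columns = binary learners.
# +1 / -1 = class treated as positive / negative by that learner; 0 = learner ignores it.
CODING = {
    "oxalate":   (+1, +1, 0),    # learner 1: oxalate vs urate; learner 2: oxalate vs phosphate
    "urate":     (-1, 0, +1),    # learner 3: urate vs phosphate
    "phosphate": (0, -1, -1),
}


def hinge(margin):
    """Binary hinge loss scaled to [0, 1] at margin 0 and 1 respectively."""
    return max(0.0, 1.0 - margin) / 2.0


def ecoc_decode(scores, coding=CODING):
    """Return the class whose code word has the lowest average loss.

    scores: real-valued outputs of the binary learners (positive favours +1).
    """
    def avg_loss(code):
        used = [(c, s) for c, s in zip(code, scores) if c != 0]
        return sum(hinge(c * s) for c, s in used) / len(used)

    return min(coding, key=lambda cls: avg_loss(coding[cls]))


# Hypothetical scores from the three binary learners for one spectrum
print(ecoc_decode((0.8, 1.2, -0.3)))
```

Loss-based decoding uses the magnitude of each binary score, so a confident learner outweighs a borderline one instead of every learner casting an equal vote.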
In small animal practice, patients often present with urinary lithiasis, and predicting urolith composition is essential to determine the appropriate treatment. On abdominal radiographs, the composition of radiopaque mineral uroliths can be determined by considering many different factors; this is complex and therefore well suited to artificial intelligence (AI). The Minnesota Urolith Center partnered with Hill's Pet Nutrition to develop a deep learning AI algorithm (CALCurad) within a smartphone application called the MN Urolith Application, which allows preliminary assessment of urolith composition. The algorithm provides the probability that a urolith is composed of struvite from an image of an abdominal radiograph. This pilot study evaluates the accuracy of CALCurad in clinical practice. A sample of 139 dogs was included, and the results obtained with CALCurad were compared with those of infrared spectroscopy analysis. Agreement between the application and quantitative analysis was 81.3%. These results suggest that CALCurad can be used effectively to predict urolith composition in dogs, helping clinicians decide between medical and surgical management. CALCurad is an example of how AI can help veterinarians make clinical decisions in patient care.
Diagnosing ureteral stones with low-dose CT in patients with metal hardware can be challenging because of image noise. The purpose of this study was to compare ureteral stone detection and image quality of low-dose and conventional CT scans with and without deep learning reconstruction (DLR) and metal artifact reduction (MAR) in the presence of metal hip prostheses.
Ten urinary system combinations with 4 to 6 mm ureteral stones were implanted into a cadaver with bilateral hip prostheses. Each set was scanned under two different radiation doses (conventional dose [CD] = 115 mAs and ultra-low dose [ULD] = 6.0 mAs). Two scans were obtained for each dose as follows: one with and another without DLR and MAR. Two blinded radiologists ranked each image in terms of artifact, image noise, image sharpness, overall quality, and diagnostic confidence. Stone detection accuracy at each setting was calculated.
ULD with DLR and MAR improved subjective image quality in all five domains (p < 0.05) compared with ULD. In addition, the subjective image quality for ULD with DLR and MAR was greater than the subjective image quality for CD in all five domains (p < 0.05). Stone detection accuracy of ULD improved with the application of DLR and MAR (p < 0.05). Stone detection accuracy of ULD with DLR and MAR was similar to CD (p > 0.25).
DLR with MAR may allow the application of low-dose CT protocols in patients with hip prostheses. Application of DLR and MAR to ULD provided a stone detection accuracy comparable with CD, reduced radiation exposure by 94.8%, and improved subjective image quality.
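The reported 94.8% reduction matches the ratio of the two tube current-time products if dose is assumed to scale linearly with mAs, with tube voltage and all other acquisition parameters held fixed. This is an assumption for illustration; the abstract does not state how dose was measured. The helper name is hypothetical.

```python
def dose_reduction(reference_mas, low_mas):
    """Fractional dose reduction, ASSUMING dose is proportional to mAs
    with tube voltage and all other acquisition parameters held fixed."""
    return 1.0 - low_mas / reference_mas


# Conventional dose (CD) = 115 mAs, ultra-low dose (ULD) = 6.0 mAs
print(f"{dose_reduction(115, 6.0):.1%}")  # 94.8%
```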
Background This study aims to assess the reliability of kidney-ureter-bladder (KUB) radiography as a triage tool in acute ureteral colic (AUC). Moreover, this article compares KUB with non-contrast computed tomography (NCCT) with respect to stone characteristics and clinical outcomes. Methodology A retrospective cohort study recruited patients who had ureteric stones proven on NCCT. A blinded review of KUB and NCCT was performed to identify the following variables in both tests: site, maximum ureteric stone diameter, and stone density. The correlation between KUB radiography and NCCT was then assessed, using intermethod reliability to measure the degree to which results are consistent when the methods or instruments employed vary. Results One hundred fifty-one patients were included, of whom 75 (50%) had negative KUB and positive NCCT results for ureteric stones based on the blinded review. The lower ureter was the most common stone location on both KUB (n = 49, 65%) and NCCT images (n = 81, 54%). The median stone diameters on KUB and NCCT were 5 (3-8) mm and 6 (4-9) mm, respectively. Hounsfield unit densities greater than 630 were found in 86 (57%) patients, and radiopaque stones were found in 76 (50%) patients. There was moderate, significant concordance (Cohen's kappa = 0.520) between NCCT and KUB regarding stone location (P < 0.01). There was strong concordance (Cohen's kappa = 0.804) between NCCT and KUB in measuring maximum ureteric stone diameter (P < 0.01). Stone density was weakly correlated between KUB and NCCT (Cohen's kappa = 0.254) (P = 0.001). Thirty-four (45%) of the negative-KUB cases required surgical intervention (SI). Sepsis (n = 5, 15%) and acute kidney injury (n = 23, 68%) were the main indications for SI in patients with negative KUB and positive NCCT. Conclusions KUB radiography should not be used as a triage tool in AUC because of potentially harmful outcomes.
However, KUB radiography can be reliably used during follow-up, as there is a strong correlation between KUB radiography and NCCT for KUB-detectable ureteric stones.
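Cohen's kappa measures agreement between two categorical ratings corrected for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected from each rater's marginal frequencies. The sketch below applies it to hypothetical stone-location labels read on KUB and on NCCT; the data and function name are illustrative only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(rater_a)
    # p_o: observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected by chance from each rater's marginal frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1


# Hypothetical stone locations read on KUB and on NCCT for six patients
kub = ["upper", "lower", "lower", "mid", "lower", "upper"]
ncct = ["upper", "lower", "mid", "mid", "lower", "lower"]
print(round(cohens_kappa(kub, ncct), 3))  # 0.478
```

Here the raters agree on 4 of 6 patients (p_o = 0.667), but because chance agreement is already p_e = 13/36, kappa is only about 0.48, which is "moderate" on common interpretation scales.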
Objective: We sought to use artificial intelligence (AI) to develop and test calculators that predict spontaneous stone passage (SSP) using radiographic and clinical data. Methods: Consecutive patients with solitary ureteral stones ≤10 mm on CT were prospectively enrolled and managed according to American Urological Association guidelines. The first 70% of patients formed the "training group" and were used to develop the calculators. The remaining 30% formed the "testing group" and were used to externally validate the calculators. Exclusion criteria included contraindication to a trial of SSP, ureteral stent, and anatomical anomaly. Demographic, clinical, and radiographic data were obtained and fed into machine learning (ML) platforms. SSP was defined as passage of the stone without intervention. Calculators were derived from the data using multivariate logistic regression. Discrimination, calibration, and clinical utility/net benefit of the developed models were assessed in the validation cohort, and receiver operating characteristic curves were constructed to measure discriminative ability. Results: Fifty-one percent of 131 "training" patients spontaneously passed their stones. Passed stones were significantly closer to the bladder (8.6 vs 11.8 cm, p = 0.01) and smaller in length, width, and height. Two ML calculators were developed, one using supervised machine learning (SML) and the other using unsupervised machine learning (USML), and were compared with an existing tool, the calculator from the Multi-centre Cohort Study Evaluating the role of Inflammatory Markers In Patients Presenting with Acute Ureteric Colic (MIMIC). The SML calculator included maximum stone width (MSW), ureteral diameter above the stone (UDA), and distance from the ureterovesical junction to the bottom of the stone, and had an area under the curve (AUC) of 0.737 on external validation in 58 "test" patients.
Parameters selected by USML included MSW, UDA, and use of an anticholinergic; this calculator had an AUC of 0.706. The MIMIC calculator's AUC was 0.588 (0.489-0.686). Conclusion: We used AI to develop calculators that outperformed an existing tool and can help providers and patients make better-informed decisions about the treatment of ureteral stones.
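The calculators were compared by the area under the ROC curve. The AUC equals the probability that a randomly chosen patient who passed the stone receives a higher predicted probability than a randomly chosen patient who did not (the Mann-Whitney formulation), with ties counted as one half. A minimal sketch with hypothetical predicted probabilities; the function name and label coding are assumptions.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative); ties count 0.5.

    labels: 1 = stone passed spontaneously, 0 = not passed (hypothetical coding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted passage probabilities for five validation patients
probs = [0.9, 0.4, 0.7, 0.3, 0.5]
passed = [1, 1, 0, 0, 0]
print(round(roc_auc(probs, passed), 3))  # 0.667
```

An AUC of 0.5 corresponds to chance ranking, which puts the MIMIC value of 0.588 in perspective against the 0.706 and 0.737 of the new calculators.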
To explore the efficacy of an artificial intelligence (AI) model derived from computed tomography (CT) images in predicting the therapeutic outcome of single-session extracorporeal shock wave lithotripsy (ESWL) for ureteral stones.
A total of 317 patients clinically diagnosed with ureteral stones were included in this investigation. Participants underwent unenhanced CT within 2 weeks before the first ESWL session. The internal cohort consisted of 250 individuals from a local healthcare facility, whereas the external cohort comprised 67 participants from another local medical institution. The proposed framework comprises three main components: an automated semantic segmentation model developed using 3D U-Net, a feature extractor that integrates radiomics and autoencoder techniques, and an ESWL efficacy prediction model trained with various machine learning algorithms. All participants underwent thorough postoperative follow-up examinations 4 weeks later. ESWL efficacy was defined as either no remaining stones or residual fragments ≤2 mm on KUB X-ray assessment. Model stability and generalizability were validated through fivefold cross-validation and a multicenter external test strategy. Moreover, Shapley Additive Explanations (SHAP) values were computed for individual features to elucidate each feature's contribution to the model's decisions.
The semantic segmentation model achieved an average Dice coefficient of 0.88 ± 0.08 on the external testing set. ESWL classifiers built using Support Vector Machine (SVM), Random Forest (RF), XGBoost (XB), and CatBoost (CB) achieved AUROC values of 0.78, 0.84, 0.85, and 0.90, respectively, on the internal validation set. On the external testing set, SVM, RF, XB, and CB predicted ESWL outcome with AUROC values of 0.68, 0.79, 0.80, and 0.83, respectively, with CatBoost being the optimal algorithm. The radiomics features and autoencoder features contributed significantly to the classification model's decisions.
This investigation shows that a carefully developed AI model using CT images can accurately predict the therapeutic outcome of a single ESWL session for ureteral stones.
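The segmentation result is reported as a Dice coefficient, 2|A ∩ B| / (|A| + |B|), where A is the predicted stone mask and B the reference mask; the "0.88 ± 0.08" summarizes per-case scores as mean ± standard deviation. The sketch below represents masks as sets of voxel coordinates and uses hypothetical cases. Treating two empty masks as Dice 1.0 and using the sample standard deviation are assumptions, since the abstract does not state these conventions.

```python
import statistics


def dice(pred, truth):
    """Dice coefficient 2|A & B| / (|A| + |B|) for masks given as sets of voxel indices."""
    if not pred and not truth:
        return 1.0  # convention for two empty masks (an assumption)
    return 2 * len(pred & truth) / (len(pred) + len(truth))


# Hypothetical predicted vs. reference stone masks for two cases, as (z, y, x) voxels
cases = [
    ({(0, 0, 0), (0, 0, 1), (0, 1, 0)}, {(0, 0, 1), (0, 1, 0), (1, 1, 1)}),
    ({(2, 2, 2), (2, 2, 3)}, {(2, 2, 2), (2, 2, 3)}),
]
scores = [dice(p, t) for p, t in cases]
print(f"Dice {statistics.mean(scores):.2f} ± {statistics.stdev(scores):.2f}")
```

For small structures such as ureteral stones, a one-voxel disagreement changes Dice substantially, which is one reason per-case spread is worth reporting alongside the mean.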
To present the state of the art in the management of urinary stones from a panel of globally recognized urolithiasis experts who met during the Experts in Stone Disease Congress in Valencia in January 2024. Options of treatment: The surgical treatment modalities for renal and ureteral stones are well defined by the guidelines of international societies, although for some index cases several alternative options are possible. For 1.5 cm renal stones, both m-PCNL and RIRS have proven to be valid treatment alternatives with comparable stone-free rates. m-PCNL has proven to be more cost-effective and requires a shorter operative time, while RIRS has demonstrated lower morbidity in terms of blood loss and shorter recovery times. SWL has proven to be less effective, at least for lower calyceal stones, but has the highest safety profile. For a 6 mm obstructing pelviureteric junction (PUJ) stone, SWL should be the first choice, as for any stone smaller than 1 cm, because it is less invasive and carries a lower risk of complications, although its stone-free rate is lower. RIRS has advantages in certain conditions such as anticoagulant treatment, obesity, or body deformity. Technical issues of the surgical procedures for stone removal: In patients receiving antithrombotic therapy, SWL, PCN, and open surgery carry an elevated risk of hemorrhage or perinephric hematoma, whereas URS is associated with less morbidity in these cases. An individualized combined evaluation of bleeding and thromboembolic risks should determine the perioperative thromboprophylactic strategy. Pre-interventional urine culture and antibiotic therapy are mandatory, although UTI treatment is becoming more challenging due to increasing resistance to routinely used antibiotics. Intrarenal urine culture and stone culture are recommended to adapt antibiotic therapy in case of postoperative infectious complications.
Measurements of temperature and pressure during RIRS are vital for ensuring patient safety and optimizing surgical outcomes, although measurement techniques and methods of data analysis still need to be refined. Ureteral stents have been improved by the development of new biomaterials, coatings, and stent designs. Current research topics include drug-eluting and bioresorbable stents. Complications of endoscopic treatment: PCNL is considered the most invasive surgical option. Fever and sepsis were observed in 11% and 0.5% of cases, respectively, and transfusion and embolization for bleeding were needed in 7% and 0.4%, respectively. Major complications, such as colonic, splenic, liver, gallbladder, and bowel injuries, are quite rare but are associated with significant morbidity. Ureteroscopy causes fewer complications, although some of them can be severe. These result from high pressure in the urinary tract (sepsis or renal bleeding) or from excessive force applied to the urinary tract (ureteral avulsion or stricture). Diagnostic work-up: Genetic testing allows the diagnosis of monogenic conditions causing stones. It should be carried out in children and in selected adults. In adults, once cystinuria, APRT deficiency, and xanthinuria are excluded, systematic genetic testing identifies monogenic diseases in no more than 4% of cases. A reliable stone analysis by infrared spectroscopy or X-ray diffraction is mandatory and should be combined with examination of the stone under a stereomicroscope. The analysis of digital images of stones by deep convolutional neural networks, in the dry laboratory or during endoscopic examination, could allow stones to be classified by color and texture. Scanning electron microscopy (SEM) combined with energy-dispersive spectrometry (EDS) is another fundamental research tool for studying kidney stones.
The combination of metagenomic analysis using next-generation sequencing (NGS) and the enhanced quantitative urine culture (EQUC) protocol can be used to evaluate the urobiome of renal stone formers. Twenty-four-hour urine analysis has a place in patient evaluation, together with repeated measurements of urinary pH with a digital pH meter. Urinary supersaturation is the most comprehensive physicochemical risk factor used in urolithiasis research. Urinary macromolecules can act as either promoters or inhibitors of stone formation, depending on the chemical composition of the urine in which they operate. At present, there are no clinical applications of macromolecules in stone management or prophylaxis. Patients should be evaluated for associated systemic pathologies. PROPHYLAXIS: Personalized medicine and public health interventions are complementary in preventing stone recurrence. Personalized medicine addresses the small subset of stone patients with a high risk of recurrence and systemic complications, who require specific dietary and pharmacological treatment to prevent stone recurrence and complications of associated systemic diseases. The more numerous subjects who form one or a few stones during their entire lifespan should be treated with diet and lifestyle modifications. Primary prevention through public health interventions is advisable to reduce the prevalence of stones in the general population. Renal stone formers at "high risk" of recurrence need early diagnosis to start specific treatment. Stone analysis identifies most "high-risk" patients forming non-calcium stones: infection stones (struvite), uric acid and urates, cystine, and other rare stones (dihydroxyadenine, xanthine). "High-risk" patients forming calcium stones are more difficult to diagnose and require clinical and laboratory evaluation. In particular, patients with cystinuria and primary hyperoxaluria should be actively sought.
FUTURE RESEARCH: Applications of artificial intelligence are promising for the automated identification of ureteral stones on CT imaging, prediction of stone composition and 24-hour urinary risk factors from demographics and clinical parameters, assessment of stone composition from endoscopic images, and prediction of the outcomes of stone treatments. Synergy between urologists, nephrologists, and scientists in basic kidney stone research will enhance the depth and breadth of investigations, leading to a more comprehensive understanding of kidney stone formation.
A kidney stone is a solid formation that can cause kidney failure, severe pain, and reduced quality of life through urinary system blockage. While medical experts can interpret kidney-ureter-bladder (KUB) X-ray images, some images are challenging for human readers and require considerable analysis time. Developing a detection system that accurately classifies KUB X-ray images is therefore crucial. This article applies a transfer learning (TL) model based on a pre-trained VGG16, combined with explainable artificial intelligence (XAI), to build a system that takes KUB X-ray images and categorizes them as kidney stone or normal cases. The model achieved a testing accuracy of 97.41% in distinguishing kidney stone from normal KUB X-rays in the dataset used. The VGG16 model delivers highly accurate predictions but lacks fairness and explainability in its decision-making process. To address this concern, the study incorporates Layer-Wise Relevance Propagation (LRP), an XAI technique, to improve the model's transparency. LRP increases the model's fairness and transparency, facilitating human comprehension of its predictions. XAI can therefore play an important role in helping doctors identify kidney stones accurately, supporting effective treatment strategies.
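Layer-wise Relevance Propagation redistributes a prediction backwards through the network so that each input receives a relevance score, with relevances approximately summing to the output when biases are zero. The sketch below applies the commonly used epsilon rule to a tiny fully connected ReLU network in pure Python. It is illustrative only: the weights are hypothetical, the function names are not from any library, and the abstract does not state which LRP rules were used for the VGG16 convolutional layers.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def dense(a, W, b):
    """z_j = sum_i W[j][i] * a_i + b_j (W has one row per output neuron)."""
    return [sum(w * x for w, x in zip(row, a)) + bj for row, bj in zip(W, b)]


def lrp_epsilon(x, layers, target, eps=1e-6):
    """Epsilon-rule LRP for a dense net: ReLU after every layer except the last.

    layers: list of (W, b). Returns one relevance score per input feature.
    """
    # Forward pass, storing the input to every layer
    activations = [list(x)]
    a = list(x)
    for k, (W, b) in enumerate(layers):
        z = dense(a, W, b)
        a = z if k == len(layers) - 1 else relu(z)
        activations.append(a)
    # Start with relevance only on the explained output neuron
    relevance = [0.0] * len(a)
    relevance[target] = a[target]
    # Backward pass: R_i = a_i * sum_j W[j][i] * R_j / (z_j + eps * sign(z_j))
    for k in range(len(layers) - 1, -1, -1):
        W, b = layers[k]
        a_in = activations[k]
        z = dense(a_in, W, b)
        s = [r / (zj + (eps if zj >= 0 else -eps)) for r, zj in zip(relevance, z)]
        relevance = [a_in[i] * sum(W[j][i] * s[j] for j in range(len(W)))
                     for i in range(len(a_in))]
    return relevance


# Hypothetical 2-input, 2-hidden, 1-output network
layers = [
    ([[1.0, -0.5], [0.3, 0.8]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
print(lrp_epsilon([2.0, 1.0], layers, target=0))
```

For an image model the same rule runs over every pixel, and the resulting relevance map is overlaid on the radiograph to show which regions drove the "kidney stone" decision.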
In recent years, machine learning (ML) and deep learning (DL) have been the leading approaches to various challenges in intelligent healthcare applications, such as disease prediction, drug discovery, and medical image analysis. Given current progress in ML and DL, both have promising potential to support healthcare. This study offers an exhaustive survey of ML and DL for healthcare systems, concentrating on key state-of-the-art features, integration benefits, applications, prospects, and future directions. To conduct the survey, we searched the most prominent journal and conference databases using distinct keywords to identify relevant scholarly publications. First, we concisely summarized the most recent, cutting-edge progress in ML- and DL-based analysis in smart healthcare. Next, we integrated the advances in various services for ML and DL, including ML-healthcare, DL-healthcare, and ML-DL-healthcare. We then presented ML- and DL-based applications in the healthcare industry. Finally, we highlighted open research challenges and recommendations for further studies based on our observations.
| Run at | Source | Hits | New | Status |
|---|---|---|---|---|
| 2026-04-26 00:00 | LitReview | | | completed |
| 2026-04-19 00:00 | LitReview | | | completed |
| 2026-04-14 14:29 | LitReview | | | completed |
| 2026-04-14 14:29 | LitReview | | | completed |
| 2026-04-14 14:28 | LitReview | | | error |
| 2026-04-14 11:57 | LitReview | | | error |
| 2026-04-12 00:00 | LitReview | 1 | 1 | completed |
| 2026-04-05 07:09 | LitReview | 35 | 31 | completed |