LitWatch

Literature Surveillance Dashboard

🔬 Topics:
📄 Total papers: 257
🆕 New / 7d:
📋 Abstracts: 92.6%
⏱️ Last run: 2026-04-26 00:00

📊 Overview

🔬 Radiologist-Specific Report Style Adaptation

Status: completed
Papers:
New:
Abstracts: 75.3%
Search source: All Sources
Fetcher: LitReview

🔍 Search query

(radiology[tiab] OR "radiology report*"[tiab]) AND ("large language model*"[tiab] OR LLM[tiab] OR GPT[tiab]) AND (report*[tiab] OR impression[tiab] OR conclusion[tiab]) AND (style[tiab] OR preference*[tiab] OR personalized[tiab] OR personalization[tiab] OR imitation[tiab] OR adaptation[tiab]) AND (radiologist*[tiab] OR reader*[tiab] OR physician*[tiab] OR "individual radiologist"[tiab] OR "expert feedback"[tiab] OR "human feedback"[tiab] OR fine-tun*[tiab] OR finetun*[tiab] OR LoRA[tiab] OR PEFT[tiab])

📑 Recent papers

Showing latest 10 papers

1. ONCO-RADS-guided Large Language Models for Extraction and Classification of Incidental Findings on Whole-Body Imaging Reports.

2026-05 | Radiology. Imaging cancer | ⭐ Q1 | DOI 10.1148/rycan.250484

PURPOSE

To evaluate large language model (LLM)-based strategy performance for extraction and classification of incidental findings from whole-body (WB) imaging reports, particularly strategies incorporating Oncologically Relevant Findings Reporting and Data System (ONCO-RADS).

MATERIALS AND METHODS

In this retrospective bicenter study, authors included all WB MRI reports from January 2016 to December 2023 at a referral center (internal dataset). Two observers extracted all incidental findings, and patient records were used to confirm final diagnoses. First, authors evaluated ONCO-RADS performance and the reproducibility of its incidental finding classifications by six radiologists. Then, authors evaluated the accuracy of three LLM-based strategies: (a) a fine-tuned DeBERTa/medical named entity recognition (NER) model; (b) zero-shot LLMs (ChatGPT-o1 [OpenAI], Gemini-2.5-Pro [Google]); and (c) reference-guided prompting of these LLMs using ONCO-RADS. Authors then expanded these strategies to an external dataset of 605 reports with multiple imaging techniques (405 WB MRI; 100 fluorodeoxyglucose PET/CT; and 100 chest-abdomen-pelvis CT acquisitions) from January 2022 to January 2025.

RESULTS

The internal dataset included 823 patients (mean age, 63.7 years ± 11.7 [SD]; 457 male patients) with 1488 WB MRI reports. The average interobserver reproducibility of ONCO-RADS incidental finding classifications was excellent (Cohen κ, 0.87). The per-report accuracies of ONCO-RADS-guided LLMs (95.6% [151 of 158] and 86.7% [137 of 158] for ChatGPT-o1 and Gemini-2.5-Pro, respectively) were higher than those of the medical NER (69.0% [109 of 158]) and zero-shot LLMs (57.0% [90 of 158] and 70.9% [112 of 158] for ChatGPT-o1 and Gemini-2.5-Pro, respectively) (P < .001). In the external test set (mean age, 60.6 years ± 12.9; 330 male patients), the per-report accuracies of ONCO-RADS-guided ChatGPT-o1 (83.5% [505 of 605]) and Gemini-2.5-Pro (82.0% [496 of 605]) were higher than those of the models without ONCO-RADS prompting (63.1% [382 of 605] and 61.2% [370 of 605], respectively) and the medical NER (55.7% [337 of 605]) (P < .001).

CONCLUSION

Reference-guided prompting of the LLMs ChatGPT-o1 and Gemini-2.5-Pro with ONCO-RADS improved their performance in extracting and classifying incidental findings on WB imaging reports compared with zero-shot prompting and medical NER. Keywords: Large Language Models, Incidental Findings, Whole-Body MRI Supplemental material is available for this article. © RSNA, 2026.
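As a quick sanity check, the per-report accuracies quoted in the abstract above can be recomputed from the counts it reports. The sketch below is illustrative only: the `reported` dictionary layout and the `pct` helper are assumptions made for this example and are not part of the study or of LitWatch.

```python
# (correct, total, percentage quoted in the abstract); counts copied from the abstract.
# The dictionary keys and layout are illustrative, not the authors' data format.
reported = {
    "internal / ONCO-RADS-guided ChatGPT-o1": (151, 158, 95.6),
    "internal / ONCO-RADS-guided Gemini-2.5-Pro": (137, 158, 86.7),
    "internal / medical NER": (109, 158, 69.0),
    "internal / zero-shot ChatGPT-o1": (90, 158, 57.0),
    "internal / zero-shot Gemini-2.5-Pro": (112, 158, 70.9),
    "external / ONCO-RADS-guided ChatGPT-o1": (505, 605, 83.5),
    "external / ONCO-RADS-guided Gemini-2.5-Pro": (496, 605, 82.0),
    "external / zero-shot ChatGPT-o1": (382, 605, 63.1),
    "external / zero-shot Gemini-2.5-Pro": (370, 605, 61.2),
    "external / medical NER": (337, 605, 55.7),
}


def pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


for label, (correct, total, quoted) in reported.items():
    # Print the recomputed value next to the value quoted in the abstract.
    print(f"{label}: {pct(correct, total)}% (abstract quotes {quoted}%)")
```

Note that the internal-set accuracies use a denominator of 158 reports, not the full 1488 internal reports.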

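Key summary

This study evaluated large language models (LLMs) that use ONCO-RADS-based reference-guided prompting to extract and classify incidental findings from whole-body imaging reports. Using an internal dataset (1,488 WB MRI reports) and an external dataset (605 reports spanning WB MRI, FDG PET/CT, and chest-abdomen-pelvis CT), it compared a fine-tuned NER model, zero-shot LLMs (ChatGPT-o1, Gemini-2.5-Pro), and ONCO-RADS reference-guided LLMs. ONCO-RADS-guided ChatGPT-o1 reached per-report accuracies of 95.6% on the internal dataset and 83.5% on the external dataset, significantly outperforming zero-shot prompting and the medical NER model (P < .001). This suggests that integrating a standardized reporting system into LLM prompting is effective for automated analysis of clinical imaging reports.

Added: 2026-04-21 16:11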

2. Automated identification of radiotherapy treatment sites from unstructured physician notes.

2026-04 | Journal of applied clinical medical physics | ⭐ Q1 | DOI 10.1002/acm2.70558

OBJECTIVE

Ambiguous or incomplete documentation is a recurrent bottleneck in radiation oncology workflows, leading to inefficiencies in communication and potential treatment delays. Large language models (LLMs) pose a solution to addressing these ambiguities without added burden to clinical staff. We aim to assess the effectiveness of Meta's open-source Llama 3.3 model in using physician consultation notes to isolate and classify anatomical treatment sites and create helpful extractive summaries for each patient.

METHODS

Semi-structured interviews with five radiation therapists revealed that CT simulation orders lack the necessary details to acquire the appropriate image. A retrospective cohort of 100 patient notes was used for iterative prompt engineering. The final model was evaluated on an independent test cohort of 52 patient notes. The LLM's accuracy in identifying the treatment site was benchmarked against two human observers (a medical physicist and a physician) as well as the final delivered treatment plan (ground truth). The helpfulness and accuracy of the AI-generated summaries were also rated by both observers on a 5-point Likert scale.

RESULTS

Llama 3.3 achieved a weighted accuracy of 94.2% [95% CI: 89.4%-98.1%] when compared to sites isolated by either observer. When compared to the sites isolated from the retrospectively delivered plans, the model reached a weighted accuracy of 92.3% [95% CI: 87.5%-97.1%]. The model classified the anatomical sites with a weighted accuracy of 96.2% [95% CI: 87.0%-98.9%]. The AI-generated summaries were highly rated by both observers (Observer 1: 4.96 [95% CI: 4.87-5.00] and Observer 2: 4.58 [95% CI: 4.38-4.73]).

CONCLUSION

This pilot study provides foundational evidence that LLMs can classify data with high accuracy, achieve benchmarks comparable to human experts when isolating anatomical treatment sites, and produce clinically helpful summaries. Our results suggest that LLMs can be effectively integrated to streamline complex radiotherapy workflows in the clinic.

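Key summary

To address the recurring problem of incomplete documentation in radiation oncology workflows, this study evaluated a system that uses Llama 3.3, Meta's open-source large language model (LLM), to automatically identify and classify anatomical treatment sites from physician consultation notes and to generate a summary for each patient. After prompt engineering on 100 retrospective patient notes, the model was applied to an independent test cohort of 52 patient notes, where it identified treatment sites with a weighted accuracy of 94.2% against the human observers and 92.3% against the delivered treatment plans. Both observers also rated the AI-generated summaries highly (4.58 to 4.96 on a 5-point scale), suggesting that LLMs can support radiotherapy workflows with accuracy comparable to human experts.

Added: 2026-04-21 16:11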

3. Comparison of AI-generated radiology impressions: a multi-stakeholder evaluation.

2026-04 | NPJ digital medicine | ⭐ Q1 | DOI 10.1038/s41746-026-02586-6

A retrospective, blinded evaluation of 200 oncologic computed tomography reports compared original radiologist-authored impressions, impressions generated by a custom domain-specific AI model fine-tuned on institutional data, and impressions generated by a general-purpose large language model. Ten clinicians, including original radiologists (n = 4), independent radiologists (n = 3), and oncologists (n = 3), rated impressions for completeness, correctness, conciseness, clarity, clinical utility, and patient harm. Original and independent radiologists assigned lower preference to generic model impressions (Cohen's h 1.04-1.22 and 0.66-0.69, p < 0.001). Original radiologists slightly preferred their own impressions to the custom model (h = 0.18, p = 0.0716), while independent radiologists showed no preference (h = -0.03, p = 0.78). Oncologists demonstrated no significant preference among impression types (h = 0.04-0.12, all p > 0.20). Custom model impressions achieved near parity with human impressions; original radiologists rated their own impressions slightly more complete (r = 0.22, p = 0.0016). Generic model impressions were longer (75.1 ± 20.4 words), slightly more complete (r = 0.18-0.39, p < 0.001-0.01), but significantly less concise (r = 0.85-0.87, p < 0.001). Patient harm ratings were uniformly low (likelihood 1.01-1.14; extent 1.05-1.21). Inter-rater reliability ranged from -0.09 to 0.67 (α = 0.67 conciseness; α = -0.09-0.03 clinical utility/correctness).
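The preference results above are reported as Cohen's h, an effect size for the difference between two proportions based on the arcsine transform. A minimal sketch of the standard formula follows. The example proportions are hypothetical and are not taken from the study, which does not list the underlying proportions in this abstract.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions p1 and p2."""
    phi1 = 2 * math.asin(math.sqrt(p1))
    phi2 = 2 * math.asin(math.sqrt(p2))
    return phi1 - phi2


# Hypothetical preference proportions for illustration only (NOT study data).
example = cohens_h(0.75, 0.25)
print(round(example, 2))
```

By Cohen's conventional benchmarks, |h| near 0.2 is read as small, 0.5 as medium, and 0.8 as large, which is one way to interpret the reported range of 0.66 to 1.22 for the generic model.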

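Key summary

In a blinded evaluation of 200 oncologic CT reports, a multi-stakeholder panel (7 radiologists and 3 oncologists) compared radiologist-authored impressions with impressions generated by a domain-specific AI model fine-tuned on institutional data and by a general-purpose large language model. The domain-specific model's impressions reached near parity with the radiologists' impressions, whereas the general-purpose model's impressions were slightly more complete but markedly less concise and received significantly lower preference from radiologists (Cohen's h 0.66-1.22, p < 0.001). Oncologists showed no significant preference among impression types, and patient harm ratings were uniformly low across all groups.

Added: 2026-04-21 16:11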

4. Context-Aware Sentence Classification of Radiology Reports Using Synthetic Data: Development and Validation Study.

2026-04 | Journal of medical Internet research | ⭐ Q1 | DOI 10.2196/86365

BACKGROUND

Automated structuring of radiology reports is essential for data utilization and the development of medical artificial intelligence models. However, manual annotation by experts is labor-intensive, and processing real clinical data through commercial large language models (LLMs) presents significant privacy risks. These challenges are particularly pronounced for non-English languages like Japanese, where specialized medical corpora are scarce. While synthetic data generation offers a potential privacy-preserving alternative, its effectiveness in capturing complex clinical nuances (such as negation and contextual dependencies) to train robust classification models without any real-world training data has not been fully established.

OBJECTIVE

This study aimed to develop a context-aware sentence classification model for Japanese radiology reports using an entirely synthetic training pipeline, thereby eliminating reliance on real-world clinical data during the development phase. Furthermore, we sought to evaluate the generalizability of this approach by validating the model's performance on diverse, multi-institutional, real-world reports.

METHODS

Japanese radiology reports (n=3104) were generated using GPT-4.1 and automatically annotated at the sentence level into 4 categories (background, positive finding, negative finding, and continuation) using GPT-4.1-mini. The synthetic data were partitioned into training (n=2670), validation (n=334), and test (n=100) sets. We fine-tuned several models, including lightweight local LLMs (Qwen3 and Llama 3.2 series) using low-rank adaptation and Japanese text classification models (Bidirectional Encoder Representations from Transformers [BERT]-base Japanese v3, Japanese Medical Robustly Optimized BERT Pretraining Approach [JMedRoBERTa]-base, and ModernBERT-Ja-130M). External validation was performed using 280 real-world reports (3477 sentences) from 7 institutions in the Japan Medical Image Database, with ground-truth labels established by board-certified radiologists. Evaluation metrics included accuracy, macro-averaged F1 (macro F1) score, and positive predictive value for positive findings (PPV_1).

RESULTS

All models achieved high performance on the synthetic test set (accuracy: 0.938-0.951; macro F1-score: 0.924-0.940). Overall performance declined on the external validation dataset (accuracy: 0.783-0.813; macro F1-score: 0.761-0.790), reflecting distributional differences between synthetic and real-world reports; however, PPV_1 remained stable and high across datasets (eg, 0.957 on the synthetic test set vs 0.952 on the external validation dataset for Qwen3 [4B]). Parsing errors occurred in LLM-based approaches (19-260 sentences, 0.55%-7.48% in the external dataset).

CONCLUSION

This study demonstrates the feasibility of developing context-aware sentence classification models for Japanese radiology reports using a training pipeline based entirely on synthetic data. The stability of PPV_1 indicates that the models successfully captured the essential clinical terminology and linguistic patterns required to identify positive findings in real-world reports, despite the observed performance degradation during external validation. This approach substantially reduces manual annotation requirements and privacy risks, providing a scalable foundation for constructing structured radiology datasets to support the development of clinically relevant medical artificial intelligence models.

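Key summary

To reduce privacy risks and the burden of manual annotation, this study developed a context-aware, sentence-level classification model using only synthetic Japanese radiology reports (n=3,104) generated with GPT-4.1, without any real clinical data. Lightweight LLMs (Qwen3 and Llama 3.2 series) and BERT-based Japanese models were fine-tuned and then externally validated on 280 real-world radiology reports (3,477 sentences) from 7 institutions. Performance was high on the synthetic test set (accuracy 0.938-0.951; macro F1 0.924-0.940). Overall performance declined on external validation (accuracy 0.783-0.813), but the positive predictive value for positive findings remained stable (0.952 for Qwen3 [4B]), supporting a synthetic-data pipeline as a practical alternative for building radiology report structuring models.

Added: 2026-04-21 16:11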

5. Improving patient understanding of oncology imaging: radiologist and patient evaluation of summarised versus full-length AI-simplified reports from a tertiary cancer centre.

2026-04 | Cancer imaging : the official publication of the International Cancer Imaging Society | ⭐ Q1 | DOI 10.1186/s40644-026-01031-x

Added: 2026-04-21 16:11

6. Initial Insights Into an Institutional Secure Large Language Model for Magnetic Resonance Imaging Examination Requests: Retrospective Study.

2026-04 | Journal of medical Internet research | ⭐ Q1 | DOI 10.2196/82579

BACKGROUND

Incomplete clinical details on magnetic resonance imaging (MRI) examination requests (MERs) can lead to suboptimal protocol selection. An institutional secure large language model (sLLM) with access to manually retrieved salient data from the electronic medical record (EMR) may improve request completeness and protocol accuracy across multiple MRI subspecialties.

OBJECTIVE

The objective of this study was to compare clinician MERs with sLLM-augmented MERs for information quality and to evaluate the protocoling accuracy of the sLLM versus board-certified radiologists across body, musculoskeletal, and neuroradiology MRI.

METHODS

This retrospective study included 608 random outpatient MRI examinations performed between September 2023 and July 2024 (body 206, musculoskeletal 203, neuroradiology 199). The cohort comprised 528 patients (mean 51.2 years, SD 19.2; range 4-93; n=279, 52.8% women, n=249, 47.2% men). MERs without EMR access were excluded. A privately hosted Anthropic Claude 3.5 model (temperature 0) augmented each MER with manually retrieved salient EMR data and, via rule-based parsing, mapped the extracted elements onto predefined institutional criteria to recommend region or coverage and contrast use. Two experienced radiologists established a consensus reference standard. Two board-certified general radiologists (Rad 3 and Rad 4) and the sLLM were compared with this standard. Clinical information quality was graded using the Reason-for-Exam Imaging Reporting and Data System (RI-RADS). Interrater reliability was quantified with Gwet AC1. Paired accuracies were compared with the McNemar test to determine whether there was a statistically significant difference.

RESULTS

Interreader agreement for RI-RADS was almost perfect for sLLM-augmented MERs (AC1 0.97, 95% CI 0.94-0.99) and moderate for clinician MERs (AC1 0.43, 95% CI 0.34-0.52). Limited or deficient clinical information (RI-RADS C/D) fell to 0% to 0.7% (0/608 to 4/608) with sLLM augmentation vs 4.1% to 20.4% (25/608 to 124/608) for clinician MERs. Overall protocol accuracy was 93.1% (566/608; 95% CI 89.6-96.6) for the sLLM, 91.4% (556/608; 95% CI 87.6-95.3) for Rad 3, and 92.1% (560/608; 95% CI 88.4-95.8) for Rad 4 (sLLM vs Rad 3 P=.23 vs Rad 4 P=.40). Region or coverage accuracy was similar (sLLM: 579/608, 95.2%; Rad 3: 585/608, 96.2%; Rad 4: 573/608, 94.2%; P=.46 and P=.36). Contrast decisions were more accurate using the sLLM at 94.4% (574/608; 95% CI 91.3-97.5) vs Rad 3 at 92.1% (560/608; 95% CI 88.4-95.8; P=.027) and were not significantly different to Rad 4 at 92.9% (565/608; 95% CI 89.4-96.4; P=.16). Subspecialty analyses showed similar patterns, with the sLLM outperforming Rad 4 for musculoskeletal MRI contrast decisions (96.6% vs 91.1%; P=.006) and matching readers elsewhere. Manual review indicated that sLLM improvements arose from EMR details not listed on the MER (infection/inflammation, tumor history, prior surgery). No clinically significant hallucinations were identified in a manual review of discordant cases.

CONCLUSION

Across body, musculoskeletal, and neuroradiology MRI, sLLM-augmented examination requests improved clinical context and enhanced contrast selection while demonstrating accuracy comparable to general radiologists for region or coverage. Integrating sLLMs into routine vetting workflows may reduce manual workload in protocol selection for more efficient, standardized protocoling.

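Key summary

This retrospective study assessed whether an institutional secure large language model (sLLM, Claude 3.5) augmented with electronic medical record (EMR) data could improve the clinical information quality of MRI examination requests and the accuracy of protocol selection. It compared the sLLM with two board-certified radiologists on 608 outpatient MRI examinations across body, musculoskeletal, and neuroradiology. With sLLM augmentation, the share of requests with limited or deficient clinical information (RI-RADS C/D) fell from 4.1%-20.4% for clinician requests to 0%-0.7%. Overall protocol accuracy was 93.1% for the sLLM versus 91.4%-92.1% for the radiologists, with no statistically significant difference. For contrast decisions, the sLLM (94.4%) was significantly more accurate than one radiologist (92.1%, P=.027), suggesting that integrating sLLMs into MRI protocol vetting workflows could reduce manual workload and support standardized protocol selection.

Added: 2026-04-21 16:11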

7. Multidimensional evaluation of large language models in radiology report readability.

2026-04 | NPJ digital medicine | ⭐ Q1 | DOI 10.1038/s41746-026-02589-3

This study systematically investigated the influence of demographic characteristics on the readability of patient-centric radiology reports and compared the performance of different large language models (LLMs) in generating patient-centered reports. Adopting a sequential two-stage design, the research first conducted a retrospective evaluation involving 320 radiology reports followed by a clinical setting validation with 800 patients. Results suggested that all three LLMs significantly improved the readability of radiology reports (P < 0.05), with DeepSeek-R1 showing potentially superior performance within this specific cohort. Demographic analysis revealed significant interactive effects: higher education and older age (within consistent educational levels) were associated with better comprehension. Clinical setting validation further indicated that reading simplified reports has the potential to significantly improve patients' subjective and objective comprehension while significantly alleviating medical anxiety (P < 0.05). However, limitations persist, including inconsistent model outputs, missing anatomical details, and comprehension variances driven by demographic factors. Consequently, LLMs should be integrated as auxiliary communication tools for radiologists rather than standalone solutions, necessitating personalized interventions tailored to specific demographic profiles.

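Key summary

To systematically evaluate how large language models (LLMs) affect the readability of patient-centered radiology reports, this study used a sequential two-stage design: a retrospective evaluation of 320 reports followed by clinical validation with 800 patients. All three LLMs significantly improved readability (P < 0.05), with DeepSeek-R1 showing potentially superior performance in this cohort. Demographic interaction effects were observed: higher education, and older age within the same education level, were associated with better comprehension. Simplified reports significantly improved patients' subjective and objective comprehension and reduced medical anxiety. Limitations include inconsistent model outputs, missing anatomical details, and demographic-driven differences in comprehension, so LLMs should serve as auxiliary communication tools for radiologists rather than standalone replacements, with approaches tailored to individual patients.

Added: 2026-04-21 16:11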

8. Read like a radiologist: Efficient vision-language model for 3D medical imaging interpretation.

2026-04 | Medical image analysis | ⭐ Q1 | DOI 10.1016/j.media.2026.104077

Recent medical vision-language models (VLMs) have shown promise in 2D medical image interpretation. However, extending them to 3D medical imaging has been challenging due to computational complexities and data scarcity. Although a few recent VLMs specified for 3D medical imaging have emerged, all are limited to learning volumetric representation of a 3D medical image as a set of sub-volumetric features. Such a process introduces overly correlated representations along the z-axis that neglect slice-specific clinical details, particularly for 3D medical images where adjacent slices have low redundancy. To address this limitation, we introduce MS-VLM, which mimics radiologists' workflow in 3D medical image interpretation. Specifically, radiologists analyze 3D medical images by examining individual slices sequentially and synthesizing information across slices and views. Likewise, MS-VLM leverages self-supervised 2D transformer encoders to learn a volumetric representation that captures inter-slice dependencies from a sequence of slice-specific features. Unbound by sub-volumetric patchification, MS-VLM is capable of obtaining useful volumetric representations from 3D medical images with any slice length and from multiple images acquired from different planes and phases. We evaluate MS-VLM on the publicly available chest CT dataset CT-RATE and an in-house rectal MRI dataset. In both scenarios, MS-VLM surpasses existing methods in radiology report generation, producing more coherent and clinically relevant reports. These findings highlight the potential of MS-VLM to advance 3D medical image interpretation and improve the robustness of medical VLMs.

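Key summary

Existing vision-language models (VLMs) for 3D medical imaging learn overly correlated representations along the z-axis and overlook slice-specific clinical details. To overcome this limitation, the study proposes MS-VLM, which mimics how radiologists read images. MS-VLM uses self-supervised 2D transformer encoders to process slice-specific features sequentially and capture inter-slice dependencies, learning volumetric representations that apply flexibly to any slice length and to images from multiple planes and phases. In evaluations on the public chest CT dataset CT-RATE and an in-house rectal MRI dataset, MS-VLM outperformed existing methods, generating more coherent and clinically relevant radiology reports.

Added: 2026-04-21 16:11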

9. Real-world text-only inference of PI-RADS v2.1 from prostate MRI reports using large language models: a lesion-level, zone-aware study.

2026-04 | European journal of radiology | ⭐ Q1 | DOI 10.1016/j.ejrad.2026.112838

OBJECTIVE

To evaluate the feasibility and limitations of real-world, text-only inference of PI-RADS v2.1 categories from prostate MRI reports using large language models, with lesion-level and zone-aware analysis.

METHODS

This single-center retrospective study included 1,205 lesion-level entries from 1,118 patients derived from semi-structured prostate MRI reports after removal of all explicit PI-RADS elements. ChatGPT-4o was prompted to assign numeric PI-RADS categories based solely on report text. Agreement with radiologist-assigned reference categories was assessed using exact agreement, Cohen's κ, and class-wise metrics. Analyses were performed overall, by zone (peripheral vs transition), and using collapsed risk strata (1-2/3/4-5). Discordant cases were reviewed to identify error mechanisms and severity. Human interobserver agreement, intra-model reproducibility, temporal stability, and a paired model-version sensitivity analysis comparing ChatGPT-4o with GPT-5.2 were also evaluated.

RESULTS

Overall exact agreement was 72.9% (κ = 0.538; macro-F1 = 61.2%), with a systematic tendency toward overcalling. Agreement was higher in the peripheral zone than in the transition zone (κ = 0.476 vs 0.077, reference PI-RADS 3-5). PI-RADS 3 showed the lowest precision and recall, with frequent bidirectional misclassification. Collapsing categories improved agreement (κ = 0.610). Incorrect diffusion-weighted imaging subscores were the most common error mechanism, with zone-specific differences. Clinically high-impact downgrades of PI-RADS 4-5 to 1-2 were rare (1.6%). Human interobserver agreement was excellent (κ = 0.916-0.967). GPT-5.2 outperformed ChatGPT-4o in paired analyses but produced invalid outputs in a minority of cases.
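The agreement analysis described above (exact agreement, Cohen's κ, and collapsed risk strata 1-2/3/4-5) can be sketched as follows. This is an illustration under stated assumptions: it uses unweighted κ, the abstract does not say whether weighting was applied, the `collapse` and `cohens_kappa` helpers are hypothetical names, and the labels are toy values rather than study data.

```python
from collections import Counter


def collapse(category: int) -> str:
    """Map a PI-RADS category (1-5) to the collapsed strata 1-2 / 3 / 4-5."""
    if category <= 2:
        return "1-2"
    if category == 3:
        return "3"
    return "4-5"


def cohens_kappa(a: list, b: list) -> float:
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of marginal frequencies summed over labels.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy labels for illustration only; these are NOT data from the study.
reference = [1, 2, 3, 3, 4, 5, 4, 2]
predicted = [2, 2, 4, 3, 4, 4, 5, 3]

raw_kappa = cohens_kappa(reference, predicted)
collapsed_kappa = cohens_kappa(
    [collapse(c) for c in reference],
    [collapse(c) for c in predicted],
)
print(raw_kappa, collapsed_kappa)
```

Collapsing merges adjacent categories, so disagreements such as 4 versus 5 no longer count as errors, which is consistent with the higher κ the abstract reports after collapsing.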

CONCLUSION

Text-only large language models can infer radiologist-assigned PI-RADS v2.1 categories from real-world prostate MRI reports with moderate agreement, but performance is zone dependent and limited around PI-RADS 3, particularly in the transition zone. These models are best suited as supervised tools for quality control rather than autonomous decision-making.

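Key summary

This study evaluated the feasibility and limitations of using large language models (LLMs) to infer PI-RADS v2.1 categories from prostate MRI report text alone. Semi-structured reports covering 1,205 lesions from 1,118 patients, with explicit PI-RADS elements removed, were given to ChatGPT-4o, and agreement with radiologist-assigned categories was analyzed at the lesion level and by zone. Overall exact agreement was moderate at 72.9% (κ = 0.538). Agreement was markedly higher in the peripheral zone (κ = 0.476) than in the transition zone (κ = 0.077), and PI-RADS 3 showed the lowest precision and recall. The authors conclude that the model is better suited as a supervised quality-control tool than for autonomous decision-making.

Added: 2026-04-21 16:11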

10. Towards Automated FIGO Staging in Radiology: The Role of LLMs in Cervical and Endometrial Cancer.

2026-04 | Academic radiology | ⭐ Q1 | DOI 10.1016/j.acra.2026.01.024

RATIONALE AND OBJECTIVE

Staging gynecological malignancies is a complex process, and radiologists should be familiar with the evolution of FIGO staging criteria. Large Language Models (LLMs) offer potential to support radiologists by automating classification tasks from free-text MRI reports.

METHODS

We conducted a retrospective study using two curated datasets of pelvic MRI reports from patients with cervical (n = 261, FIGO 2018) and endometrial cancer (n = 555, FIGO 2023). A general-purpose LLM (Cohere Command-A) was evaluated under three prompting strategies (zero-shot, guided, and chain-of-thought [CoT]), using exact stage accuracy, an ordinal FIGO distance metric, and the rate of severe errors. The Cohere Command-A model was chosen for its long-context reasoning, instruction-following capabilities, reproducible fixed version, and secure handling of sensitive clinical data. While alternative LLMs (eg, GPT-4o, Gemini, Llama-3, DeepSeek) could offer complementary insights, access, resources, and compliance constraints limited broader comparisons.

RESULTS

For cervical cancer, CoT prompting achieved the highest accuracy (80.5%) and the lowest FIGO distance, with 23 severe misclassifications (≥2-stage deviation), outperforming guided and zero-shot prompting. For endometrial cancer, all strategies performed appropriately, with CoT again yielding the best results (accuracy, 90.6%) and the lowest number of severe misclassifications (37 cases), compared with guided and zero-shot prompting. In a small subset of cases with no agreement between any prompting strategy and the reference label, manual review showed that only a minority presented potentially suboptimal annotations, suggesting that CoT-based predictions may also help flag doubtful reports.

CONCLUSION

The LLMs used demonstrated strong performance in automatically assigning FIGO stages for cervical and endometrial cancers from MRI reports. Their integration could reduce workload and improve consistency in staging. Further validation is needed before clinical implementation.

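Key summary

This study evaluated whether large language models (LLMs) can automate FIGO staging from pelvic MRI reports of patients with cervical and endometrial cancer. Three prompting strategies (zero-shot, guided, and chain-of-thought [CoT]) were applied with the Cohere Command-A model to MRI reports from 261 cervical cancer cases (FIGO 2018) and 555 endometrial cancer cases (FIGO 2023), comparing stage accuracy and severe misclassifications. CoT prompting performed best, with accuracies of 80.5% for cervical cancer and 90.6% for endometrial cancer. The results suggest that LLMs are promising for automating FIGO staging from radiology reports, although further validation is needed before clinical use.

Added: 2026-04-21 16:11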

🔄 Run history

Run at	Source	Hits	New	Status
2026-04-26 00:00	LitReview	1		completed
2026-04-21 16:11	litreview:seed	77	77	seed-completed

📊 Overview

🔬 Prostate MRI Clinical Trials

Status: completed
Papers: 101
New:
Abstracts: 100.0%
Search source: PubMed
Fetcher: LitReview

🔍 Search query

"prostatic neoplasms"[mesh] AND "magnetic resonance imaging"[mesh] AND "clinical trial"[pt]
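A query like the one above can be submitted to PubMed through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL and performs no network call. It is an assumption about how such a query could be issued, not a description of how the LitReview fetcher works, and `build_esearch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query string copied from the dashboard's "Search query" field.
QUERY = (
    '"prostatic neoplasms"[mesh] AND "magnetic resonance imaging"[mesh] '
    'AND "clinical trial"[pt]'
)


def build_esearch_url(term: str, retmax: int = 10) -> str:
    """Build an esearch URL that asks PubMed for up to `retmax` matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


url = build_esearch_url(QUERY)
print(url)
```

`urlencode` percent-encodes the quotes and brackets, so the query can be passed as written in the dashboard.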

📑 Recent papers

Showing latest 10 papers

1. Comprehensive Evaluation of Targeted and Perilesional Biopsy in Biopsy-Naïve Patients With Prostate Positive Magnetic Resonance Imaging: PERI-PRO Noninferiority Randomized Controlled Trial.

2026-04 | The Journal of urology | ⭐ Q1 | DOI 10.1097/ju.0000000000004863

OBJECTIVE

Combined targeted and systematic biopsy (CTSBx) has been the standard scheme for patients with visible suspicious lesions on MRI in recent years. The 2024 European Association of Urology guideline recommended targeted and perilesional biopsy (TPLBx) for the diagnosis of patients with MRI-visible suspicious lesions. This randomized controlled trial aims to comprehensively evaluate the efficacy and safety profiles of the TPLBx and CTSBx schemes.

METHODS

A single-center noninferiority randomized controlled trial consecutively enrolled 380 biopsy-naïve patients (CTSBx: n = 190, TPLBx: n = 190) with a single unilateral suspicious lesion on prostate MRI from June 2024 to November 2024. The noninferiority margin was -15%. All biopsies were undertaken transrectally through the cognitive fusion technique. The primary outcome was Grade Group (GG) ≥ 2 cancer (GG ≥ 2-PCa) detection rate.

RESULTS

The GG ≥ 2-PCa (58% vs 58%, risk difference [RD]: 0.53% [95% CI: -9.4% to 11%]) and GG ≥ 3-PCa (30% vs 30%, RD: 0.53% [95% CI: -8.7% to 9.7%]) detection rates of TPLBx were noninferior to those of CTSBx (P < .001). There was no significant difference in PCa and GG1-PCa detection rates between the 2 groups (P > .050). The complication rate of the TPLBx group was significantly lower than that of the CTSBx group (Clavien-Dindo scale ≥ 1: 62% vs 74%, P = .023), especially for bleeding-related complications (rectal bleeding: 34% vs 48%, P = .003; hematuria: 39% vs 56%, P < .001) and rectal pain (25% vs 34%, P = .018). TPLBx significantly shortened the procedure time and reduced pathology costs (P < .001).
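The noninferiority claim follows from comparing the lower bound of each confidence interval with the prespecified margin of -15%. A minimal sketch of that comparison is shown below. It assumes the risk difference is TPLBx minus CTSBx, which the abstract implies but does not state, and `is_noninferior` is a hypothetical helper name.

```python
MARGIN = -15.0  # noninferiority margin in percentage points, from METHODS


def is_noninferior(ci_lower: float, margin: float = MARGIN) -> bool:
    """Noninferiority holds when the CI lower bound lies above the margin."""
    return ci_lower > margin


# (risk difference, CI lower, CI upper) in percentage points, from RESULTS.
# Assumed direction: TPLBx minus CTSBx.
endpoints = {
    "GG >= 2": (0.53, -9.4, 11.0),
    "GG >= 3": (0.53, -8.7, 9.7),
}

for name, (rd, lower, upper) in endpoints.items():
    verdict = "noninferior" if is_noninferior(lower) else "not shown"
    print(f"{name}: RD {rd} [{lower}, {upper}] vs margin {MARGIN} -> {verdict}")
```

Both lower bounds (-9.4 and -8.7) sit above -15, which matches the noninferiority conclusion reported in the abstract.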

CONCLUSION

For patients with a single unilateral suspicious lesion on prostate MRI, TPLBx achieved noninferior detection of clinically significant PCa and better safety than the CTSBx scheme.

TRIAL REGISTRATION

ClinicalTrials.gov Identifier: NCT06482658.

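Key summary

This single-center noninferiority randomized controlled trial compared the diagnostic efficacy and safety of targeted and perilesional biopsy (TPLBx) with conventional combined targeted and systematic biopsy (CTSBx) in biopsy-naïve patients with a single unilateral suspicious lesion on prostate MRI. From June to November 2024, 380 patients were randomized (190 per arm); the primary outcome was the detection rate of Grade Group ≥ 2 prostate cancer. TPLBx was noninferior to CTSBx for detection of Grade Group ≥ 2 cancer (risk difference 0.53%, 95% CI -9.4% to 11%) and Grade Group ≥ 3 cancer. It also had a significantly lower complication rate (Clavien-Dindo ≥ 1: 62% vs 74%, P = .023), with less rectal bleeding, hematuria, and rectal pain, as well as shorter procedure time and lower pathology costs.

Added: 2026-04-05 16:08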

2. Four-Year Feasibility and Safety Results of a Phase 1/2 Single-Arm Prospective Clinical Trial of Stereotactic Magnetic Resonance-Guided Adaptive Radiation Therapy for Metachronous Oligometastatic Abdominopelvic Lymph Node and Soft Tissue Metastases.

2026-04 | International journal of radiation oncology, biology, physics | ⭐ Q1 | DOI 10.1016/j.ijrobp.2026.01.002

OBJECTIVE

Metachronous oligometastases may represent a favorable disease state for local therapy after prior curative treatment. Stereotactic Magnetic Resonance-Guided Adaptive Radiation Therapy (SMART) provides precise targeting of nodal and soft tissue metastases. The primary objective was to assess the feasibility and safety of SMART for abdominopelvic metachronous oligometastases. Secondary objectives included assessing rates of toxicities and evaluating local control (LC).

METHODS AND MATERIALS

Ten patients were enrolled with solid tumor metachronous abdominopelvic nodal or soft tissue metastases, ≤7 cm in maximal diameter, and ≤3 sites of active disease. All patients received 40 Gy in 5 fractions. Acute toxicities were graded per Common Terminology Criteria for Adverse Events v5 per-protocol follow-up over 1 year. Late toxicities and clinical outcomes were elucidated by chart review. LC, distant progression-free survival, and overall survival were analyzed using the Kaplan-Meier Method.

RESULTS

Eight patients with prostate cancer and 2 with renal cell carcinoma were enrolled in the study. All patients were successfully treated with SMART per-protocol without complications. The median follow-up after SMART was 4.22 years. Three patients experienced acute grade 1 toxicities; there were no higher grade or late toxicities. Among these 10 patients, 4-year LC and overall survival were both 90%, and 4-year distant progression-free survival was 20%. Two patients (1 prostate cancer, 1 renal cell carcinoma) remain with no evidence of disease, each at over 4 years following SMART and without receiving further systemic or local therapies.

CONCLUSION

With 4 years median follow-up, this small prospective trial reports low toxicity, supporting the feasibility of SMART metastasis-directed therapy for metachronous oligometastases with minimal risk of acute or late toxicity.

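Key summary

This prospective single-arm trial evaluated the feasibility and safety of stereotactic magnetic resonance-guided adaptive radiation therapy (SMART), delivered as 40 Gy in 5 fractions, in 10 patients with solid tumors and metachronous oligometastases in abdominopelvic lymph nodes or soft tissue. All patients completed treatment per protocol. Acute grade 1 toxicity occurred in 3 patients, with no higher-grade acute or late toxicity, and 4-year overall survival and local control were both 90%. With long-term follow-up (median 4.22 years), the low toxicity profile and good local control support the feasibility of SMART for metachronous oligometastases.

Added: 2026-04-05 16:08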

3Artificial Intelligence 3D Augmented Reality-guided Robotic Prostatectomy Versus Cognitive MRI Intervention: Results of the Prospective Randomized RIDERS Trial.

2026-03European urology⭐ Q1DOI 10.1016/j.eururo.2025.09.4172

BACKGROUND AND OBJECTIVE

Three-dimensional (3D) augmented reality (AR) and artificial intelligence (AI) technologies have recently been introduced to enhance guidance during robot-assisted radical prostatectomy (RARP). By overlaying virtual and real-time images, this approach helps accurately localize hidden lesions during surgery, enabling the execution of tailored procedures. This study aimed to evaluate whether 3D-AI-AR guidance reduces positive surgical margins (PSMs) compared with standard two-dimensional (2D) magnetic resonance imaging (MRI)-based interventions.

METHODS

In this prospective, multicenter randomized controlled trial (NCT06318559), 133 patients with extracapsular extension or bulging at preoperative MRI were enrolled and randomized (2:1) to either 2D MRI-guided (n = 84) or 3D-AI-AR-guided RARP (n = 49). All the patients underwent nerve-sparing RARP. Intraoperative selective biopsies were then performed at the level of the preserved neurovascular bundle (NVB): cognitive in the MRI group and AR guided in the 3D group. The primary outcomes included PSM rate. Prostate-specific antigen (PSA) levels, continence, and potency recovery were assessed during the 12 mo of follow-up. The use of postoperative radiotherapy was recorded. Biochemical recurrence (BCR) was defined as PSA >0.4 ng/ml. All the analyses were conducted with SAS Statistics Software v.9.4.

KEY FINDINGS AND LIMITATIONS

Baseline and intraoperative characteristics were similar between the groups. While PSMs on prostate surface were comparable (p = 0.8), 3D-guided excisional biopsies had a significantly higher positivity rate (52% vs 13%; p = 0.001), allowing an improved margin control. The 3D group had a lower overall PSM rate (22% vs 39%; p = 0.047), required less postoperative RT (18% vs 35%; p = 0.046), and showed higher continence at 12 mo (91% vs 71%; p = 0.03). Potency and BCR rates were similar.

CONCLUSION

The execution of a 3D-AI-AR-guided biopsy at the level of preserved NVBs during nerve-sparing RARP allows correct identification of the tumor with subsequent improvement of margin control. Longer follow-up is required to assess the functional and long-term oncological outcomes of this approach.

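Key summary

This study randomized 133 patients with prostate cancer in a 2:1 ratio to 2D MRI-based guidance or 3D artificial intelligence augmented reality (AI-AR) guidance. Selective biopsies were performed at the level of the preserved neurovascular bundle (NVB) during nerve-sparing robot-assisted prostatectomy to assess whether positive surgical margins (PSMs) were reduced. The overall PSM rate was significantly lower with 3D-AI-AR guidance than with 2D MRI guidance (22% vs 39%), the need for postoperative radiotherapy was lower, and continence at 12 months was significantly higher (91% vs 71%), whereas potency and biochemical recurrence rates did not differ. The findings suggest that AI-AR-guided biopsy contributes to improved margin control.

Added: 2026-04-05 16:08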

4Primary Noncontrast Magnetic Resonance Imaging for Prostate Cancer Screening: A Randomized Clinical Trial (PROSA).

2026-03European urology⭐ Q1DOI 10.1016/j.eururo.2025.11.024

BACKGROUND AND OBJECTIVE

Prostate-specific antigen (PSA)-based screening for prostate cancer (PCa) has limited accuracy, and it is linked to overdiagnosis. The PROSA trial aimed to evaluate whether a contrast-free biparametric magnetic resonance imaging (bpMRI)-first screening strategy improves the detection of clinically significant PCa (csPCa) as the primary outcome. The secondary outcomes included overall PCa detection, benefit-harm metrics, and cost effectiveness from a health care payer perspective.

METHODS

This single-center, randomized controlled trial enrolled 816 asymptomatic men aged 49-69 yr (≥40 yr with a PCa family history). Participants were randomized into two arms: arm A underwent bpMRI regardless of the PSA levels; arm B received bpMRI only if PSA ≥3 ng/ml (or 2.5 ng/ml with a family history). Men with Prostate Imaging Reporting and Data System score ≥3 were directed to a targeted biopsy. Imaging and pathology assessors were blinded; csPCa is defined as International Society of Urological Pathology grade group ≥2. The primary outcomes included csPCa detection, benefit-harm metrics, and cost effectiveness from a health care payer perspective.

KEY FINDINGS AND LIMITATIONS

Among 759 randomized men, biopsy and csPCa detection rates were higher in arm A (10.8% and 4.6%, respectively) than in arm B (5.2% and 1.8%, respectively), with a relative risk of 2.6 (95% confidence interval 1.1-6.1; p = 0.05) for the csPCa detection rate. Benefit-harm metrics favored the MRI-first strategy, showing higher grade selectivity (1.89 vs 1.75), biopsy efficiency (0.74 vs 0.54), and biopsy avoidance (23.1 vs 11.9). No serious adverse event was recorded. The MRI-first strategy yielded an incremental cost-effectiveness ratio of €2201.75 per csPCa case detected. Limitations include single-round design and short follow-up.
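As a quick arithmetic check on the PROSA figures, the reported relative risk can be approximated from the two rounded csPCa detection rates quoted in the abstract (4.6% and 1.8%). This is only a sketch: the function name is hypothetical, and the confidence interval cannot be reproduced here because the abstract does not give per-arm counts.

```python
def relative_risk(rate_exposed: float, rate_control: float) -> float:
    """Return the ratio of two event rates (a risk ratio)."""
    if rate_control <= 0:
        raise ValueError("control rate must be positive")
    return rate_exposed / rate_control


# Rounded csPCa detection rates quoted in the abstract, in percent.
arm_a_rate = 4.6  # bpMRI for everyone (MRI-first)
arm_b_rate = 1.8  # bpMRI only above the PSA threshold

rr = relative_risk(arm_a_rate, arm_b_rate)
print(f"approximate relative risk: {rr:.1f}")
```

Because the inputs are already rounded, the result should be read as consistent with the published value rather than as an exact recomputation.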

CONCLUSION

In this randomized screening trial, a contrast-free MRI-first pathway improved csPCa detection, enhanced benefit-harm metrics, and showed favorable cost effectiveness.

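Key summary

The PROSA trial evaluated how a screening strategy that applies noncontrast biparametric MRI (bpMRI) first, rather than relying on PSA-based selection, affects detection of clinically significant prostate cancer (csPCa). Among 759 randomized men, csPCa detection was higher in the MRI-first arm than in the PSA-gated arm (4.6% vs 1.8%), with a relative risk of 2.6 (95% CI 1.1-6.1; p = 0.05).

Added: 2026-04-05 16:08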

5Two Versus Five-Fraction Magnetic Resonance-Guided Adaptive Radiotherapy with DOminant-TArgeted Boost in Localized Prostate Cancer (DOTA-2): Interim Acute Toxicity Analysis of the Phase II Randomised Trial.

2026-03Clinical oncology (Royal College of Radiologists (Great Britain))⭐ Q1DOI 10.1016/j.clon.2025.104029

OBJECTIVE

DOminant-TArgeted Boost in Localized Prostate Cancer (DOTA-2) is a phase II randomised controlled trial comparing two ultra-hypofractionated radiotherapy regimens with dominant intraprostatic lesion (DIL) boost: 26 Gy/2F with 32 Gy to the DIL vs 36.25 Gy/5F with 40 Gy to the DIL, without androgen deprivation therapy (ADT), for prostate cancer.

METHODS

Patients with low- to favourable-intermediate-risk prostate cancer were randomly assigned to receive either 2 fractions or 5 fractions. Magnetic resonance-guided adaptive radiotherapy (MRgART) was delivered using the Unity® MR-Linac with the adapt-to-shape workflow for every fraction. The primary endpoint was cumulative grade ≥2 acute genitourinary (GU) and gastrointestinal (GI) toxicity. Secondary endpoints included quality of life in the urinary and sexual domains. An interim analysis of acute GU and GI toxicities was conducted on the first 22 patients from the total planned cohort of 44.

RESULTS

Patients were randomly assigned to either the 2-fraction (N = 10) or 5-fraction stereotactic body radiotherapy (SBRT) (N = 12), stratified by risk group, prostate volume, and DIL location. The median follow-up time was 16 weeks. The cumulative worst acute grade ≥2 GU toxicity was reported in 2/10 (20%) patients in the 2-fraction group vs 4/12 (33.3%) in the 5-fraction group (P = 0.48), with no cases of grade ≥3 acute GU toxicity. No grade ≥2 acute GI toxicity was observed in either arm. The two groups had no significant difference in International Prostate Symptom Score (IPSS) and International Index of Erectile Function (IIEF-5) scores.
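The abstract does not say which test produced P = 0.48 for the grade ≥2 GU toxicity comparison (2/10 vs 4/12). As an illustration only, the sketch below applies an uncorrected Pearson chi-square test to that 2x2 table, which happens to give a value close to the reported one. The choice of test is an assumption, and other tests (for example Fisher's exact test) give different values for counts this small.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: 2-fraction arm, 5-fraction arm. Columns: toxicity, no toxicity.
chi2, p_value = pearson_chi2_2x2(2, 8, 4, 8)
print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

With only 22 patients the expected cell counts are small, so any such test is a rough guide.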

CONCLUSION

Two-fraction SBRT with a DIL boost, delivered using MRgART without ADT, demonstrated acceptable acute GU and GI toxicity in this interim analysis, suggesting the feasibility of continuing the investigation.

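Key summary

This randomized two-arm trial compared 2-fraction (26 Gy/2F, 32 Gy to the DIL) and 5-fraction (36.25 Gy/5F, 40 Gy to the DIL) ultra-hypofractionated radiotherapy in patients with prostate cancer treated without androgen deprivation therapy. Treatment used magnetic resonance-guided adaptive radiotherapy (MRgART) on the Unity® MR-Linac with the adapt-to-shape workflow, and the primary endpoint was cumulative grade ≥2 acute genitourinary (GU) and gastrointestinal (GI) toxicity. In the interim analysis, grade ≥2 acute GU toxicity occurred in 20% (2/10) of the 2-fraction group and 33.3% (4/12) of the 5-fraction group, with no significant difference. Neither arm had grade ≥2 GI toxicity, and urinary symptom (IPSS) and erectile function (IIEF-5) scores did not differ. Two-fraction SBRT with a DIL boost, delivered by MRgART without ADT, therefore appears acceptable in terms of acute toxicity, which supports continuing the trial.

Added: 2026-04-05 16:08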

6Assessing Quality and Adherence to PI-RADSv2.1 Minimum Technical Standards of Prostate MRI in NRG-GU005.

2026-02Journal of magnetic resonance imaging : JMRI⭐ Q1DOI 10.1002/jmri.70142

BACKGROUND

Multi-parametric MRI (mpMRI) datasets often vary between sites due to differences in acquisition protocols.

OBJECTIVE

Evaluate adherence of a multi-site mpMRI dataset to the minimum technical standards (MTS) of PI-RADSv2.1.

STUDY TYPE: Prospective.

SUBJECTS: Six hundred patients (Age (years): ≤ 49 = 0.8%, 50-59 = 10.7%, 60-69 = 47.0%, ≥ 70 = 41.5%) with intermediate-risk prostate cancer (PCa) imaged across 124 institutions prior to radiotherapy.

FIELD STRENGTH/SEQUENCE: 3T, 1.5T, and 1.16T, T2-weighted (T2w): fast spin-echo, diffusion-weighted imaging (DWI): single-shot echo-planar imaging, and dynamic contrast-enhanced (DCE): T1-weighted 3D fast spoiled gradient echo.

ASSESSMENT: Scanner vendors included Siemens, GE, Philips, Toshiba, and Hitachi. Degree of adherence to PI-RADSv2.1 was determined as the proportion of datasets that met MTS. Mean and standard deviation of parameter values were calculated where applicable. Prostate imaging quality (PI-QUAL)v2 scores were assigned by one of three observers in 491 datasets. Evaluation of DICOM metadata consistency was performed.

STATISTICAL TESTS: Fisher's exact test to assess changes in MTS adherence over time and by field strength; Harrell's C-index to compare MTS adherence to PI-QUAL score. A p value of < 0.001 is considered statistically significant after Bonferroni correction.

RESULTS

Eighty-two percent of MTS showed greater than 75% adherence. Low adherence was found in the in-plane dimension (frequency-encoding direction) for T2w images (57%, mean = 0.45 ± 0.16 mm) and field of view (FOV) for DW images (62%, mean = 22.67 ± 4.70 cm). Only 50% of datasets used the recommended high b value image to compute the apparent diffusion coefficient map. Adherence improved significantly over time for one T2w and two DWI parameters; the adherence of FOV improved significantly at 3T for T2w and DWI sequences. C-index values for two T2w and two DWI parameters demonstrated a relationship between PI-RADS MTS and PI-QUAL score. Ten percent of anonymized datasets were stripped of some sequence information.

DATA CONCLUSION

Results show promise for mpMRI standardization in characterization of PCa and identify key parameters that remain variable across datasets and institutions.

EVIDENCE LEVEL: 1. TECHNICAL EFFICACY: Stage 2. TRIAL REGISTRATION: ClinicalTrials.gov: NCT03367702.

Variability in the way MRI scans are performed at different institutions and with different types of MRI scanners can make it difficult to obtain consistent results. We examined the MRI scan parameters of a large, multi-institutional dataset to determine how well they follow the guidelines outlined in the Prostate Imaging-Reporting Data System (PI-RADS)v2.1. We found most of the parameters showed high adherence to PI-RADSv2.1. Further examination of those parameters with lower adherence may provide insight that could be beneficial to future efforts to standardize the way in which MRI scans are performed.

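Key summary

This prospective analysis evaluated how well multi-institutional prostate multiparametric MRI data meet the PI-RADS v2.1 minimum technical standards (MTS), using 600 patients with intermediate-risk prostate cancer imaged at 124 institutions. Overall, 82% of parameters showed greater than 75% adherence, but adherence was low for some key parameters, such as the in-plane dimension of T2-weighted images (57%) and the field of view of diffusion-weighted images (62%), and only 50% of datasets used the recommended high b value to compute the ADC map. These results suggest that mpMRI standardization is progressing overall, while highlighting the need to improve parameters that remain highly variable.

Added: 2026-04-05 16:08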

7Diagnostic Accuracy of Fully Hybrid PET/MRI with [68Ga]Ga-PSMA-11 and [68Ga]Ga-RM2 in Detecting Primary Prostate Cancer: A Phase 2 Trial with Histology as Gold Standard.

2026-02Journal of nuclear medicine : official publication, Society of Nuclear Medicine⭐ Q1DOI 10.2967/jnumed.125.269782

The primary aim of this study was to compare the diagnostic accuracy of [68Ga]Ga-PSMA-11 PET, [68Ga]Ga-RM2 PET, and multiparametric MRI (mpMRI) for the detection of primary prostate cancer (PCa) using histopathology as the reference. The secondary aims of the study were to assess the agreement among imaging modalities and identify noninvasive biomarkers for the diagnosis and risk stratification of patients.

METHODS

Forty-two patients with biopsy-confirmed, high-risk PCa were enrolled in this single-center, prospective, phase 2 clinical trial between September 2020 and May 2023 at San Raffaele hospital. All patients underwent [68Ga]Ga-PSMA-11 PET/MRI with mpMRI, and 36 had additional imaging with [68Ga]Ga-RM2 PET/MRI. All patients were included in the patient-level T staging analysis. Twenty-five patients were treated with radical prostatectomy with extended lymphadenectomy and considered for N staging analysis. Sixteen patients underwent all imaging and surgical procedures needed for coregistration between imaging and histology and were included in the lesion-based analysis for T staging. Two expert nuclear medicine physicians reviewed [68Ga]Ga-PSMA-11 and [68Ga]Ga-RM2 PET images with knowledge of the patients' available clinical and imaging information. mpMRI was interpreted as the standard of care by 2 expert radiologists using Prostate Imaging Reporting and Data System, version 2, criteria. Peripheral whole-blood samples were collected at the time of patient's enrollment to assess their association with lymph node involvement on histology.

RESULTS

In the patient-based analysis, [68Ga]Ga-PSMA-11 PET and mpMRI identified at least 1 intraprostatic lesion in all patients, whereas [68Ga]Ga-RM2 PET results were negative in 3 of 36 patients. The lesion-level analysis performed in 16 patients showed that, in this cohort, the dominant intraprostatic lesion was always detected by [68Ga]Ga-RM2 PET, whereas both [68Ga]Ga-PSMA-11 PET and mpMRI missed it, reporting a false-positive finding elsewhere. For N staging analysis, [68Ga]Ga-PSMA-11 PET had the highest sensitivity among the investigated imaging modalities (sensitivity, 0.375). Blood analysis showed that a higher fraction of polymorphonuclear-myeloid-derived suppressor cells (MDSCs) over monocytic MDSCs was significantly associated with lymph node involvement on histology (P = 0.0285).

CONCLUSION

All imaging modalities showed high sensitivity for the preoperative detection of primary PCa, but only [68Ga]Ga-RM2 PET correctly identified the dominant lesion in all patients who underwent lesion-based subanalysis. The identification of lymph node involvement remains challenging, with [68Ga]Ga-PSMA-11 PET reaching a sensitivity of only 0.375. In this regard, the polymorphonuclear MDSC-to-monocytic MDSC ratio may represent a valuable biologic marker of lymph node involvement in patients with high-risk PCa and warrants further investigation.

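Key summary

Using histopathology as the reference, this prospective phase 2 trial compared the diagnostic accuracy of [68Ga]Ga-PSMA-11 PET, [68Ga]Ga-RM2 PET, and multiparametric MRI (mpMRI) for primary prostate cancer in 42 patients with high-risk disease enrolled between September 2020 and May 2023. All patients underwent [68Ga]Ga-PSMA-11 PET/MRI with mpMRI, and 36 also underwent [68Ga]Ga-RM2 PET/MRI. Twenty-five patients were included in the N staging analysis after radical prostatectomy, and 16 in the lesion-level imaging-histology correlation.

At the patient level, [68Ga]Ga-PSMA-11 PET and mpMRI detected at least one lesion in every patient, whereas [68Ga]Ga-RM2 PET was negative in 3 of 36. At the lesion level, [68Ga]Ga-RM2 PET detected the dominant lesion in all cases, whereas [68Ga]Ga-PSMA-11 PET and mpMRI showed false-positive findings elsewhere. For N staging, [68Ga]Ga-PSMA-11 PET had the highest sensitivity (0.375), and blood analysis showed that the proportion of polymorphonuclear myeloid-derived suppressor cells was significantly associated with lymph node involvement. All three modalities were therefore highly sensitive for primary prostate cancer but differed in lesion localization and in predicting lymph node involvement, which suggests distinct clinical roles.

Added: 2026-04-05 16:08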

8PSMA-Directed PET/MRI Enables Noninvasive Diagnosis and Prognosis in Patients with Increased PSA Levels: Results from the Prospective Randomized RAPID Trial.

2026-02Journal of nuclear medicine : official publication, Society of Nuclear Medicine⭐ Q1DOI 10.2967/jnumed.125.270404

Systematic transrectal ultrasound-guided biopsy lacks accuracy in the primary diagnosis of prostate cancer (PCa) and causes side effects. We investigated prostate-specific membrane antigen (PSMA)-targeted PET/MRI as a less-invasive alternative for biopsy guidance and risk assessment.

METHODS

The RAPID study was a randomized, controlled, single-center, open-label phase 3 trial comparing the diagnostic efficacy of 68Ga-PSMA-11 PET/MRI with systematic transrectal ultrasound-guided prostate biopsy. In total, 220 men with suspected PCa were randomized to either a standard (random 12-core biopsy; RB) group or an image-guided biopsy (IGB) group. Biopsy, prostatectomy histology, and follow-up visits served as references.

RESULTS

PET/MRI prospectively predicted 91 of 113 histologically verified tumors, corresponding to a sensitivity of 80.5% and a positive predictive value of 84.3%. Among tumors characterized as ISUP GG of 3 or greater (n = 60), PSMA PET/MRI prospectively detected 95% (n = 57). The IGB group demonstrated slightly higher sensitivity, specificity, positive predictive value, and negative predictive value compared with the RB group (79.3%, 94.7%, 85.2%, 92.2% vs. 74.2%, 88.0%, 71.9%, 89.2%). Seventy-nine patients were eligible for a direct IGB and RB subanalysis, with IGB detecting 15 additional cases. PET/MRI showed high specificity (94%) and negative prediction (86%) for tumor aggressiveness. In a median follow-up period of 3 y, an aggressive course of disease was detected in 25 of 199 patients. RB correlation identified 24 patients with an ISUP GG of 3 or greater with aggressive disease development during follow-up, compared with 23 patients identified by PET/MRI. Negative prediction of both methods was comparably high at 99%; however, PET/MRI overestimated fewer patients (21) as aggressive compared with RB (34).
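The sensitivity and positive predictive value quoted for RAPID follow directly from the counts in the abstract. The sketch below recomputes them. The number of false-positive calls (17) is not stated in the abstract and is an assumption here, chosen because 91 / (91 + 17) is the only nearby ratio that rounds to the reported 84.3%. Function and variable names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of verified tumors that imaging detected."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of positive imaging calls that were verified tumors."""
    return true_pos / (true_pos + false_pos)


verified_tumors = 113   # histologically verified tumors (from the abstract)
true_positives = 91     # tumors predicted by PET/MRI (from the abstract)
false_negatives = verified_tumors - true_positives

# Assumption: implied by the reported PPV, not stated in the abstract.
assumed_false_positives = 17

sens = sensitivity(true_positives, false_negatives)
ppv = positive_predictive_value(true_positives, assumed_false_positives)
print(f"sensitivity = {sens:.1%}, PPV = {ppv:.1%}")
```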

CONCLUSION

PSMA-targeted PET/MRI-guided biopsy is a reliable, less invasive method for detecting and characterizing PCa in a cohort with moderately increased PSA values, potentially reducing unnecessary biopsies and providing a reliable prognosis of the course of disease. These results support the integration of modern imaging techniques into clinical practice to improve the treatment of PCa.

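Key summary

This prospective phase 3 trial randomized 220 men with elevated PSA to compare 68Ga-PSMA-11 PET/MRI-based image-guided biopsy with conventional ultrasound-guided 12-core biopsy in terms of diagnostic accuracy and prognostic value. PET/MRI detected 91 of 113 histologically verified tumors, for a sensitivity of 80.5% and a positive predictive value of 84.3%. It detected 95% of high-grade tumors (ISUP GG ≥3) and showed a specificity of 94% and a negative predictive value of 86% for tumor aggressiveness, and it overestimated fewer patients as aggressive than 12-core biopsy. These results suggest that PSMA-targeted PET/MRI is less invasive, may reduce unnecessary biopsies, and supports the integration of modern imaging techniques into prostate cancer care.

Added: 2026-04-05 16:08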

9 68Ga-PSMA-11 in Staging of Unfavorable Intermediate- and High-Risk Prostate Cancer Reduces Indication for Noncurative Prostatectomy: A Prospective, Multicenter, IAEA Study.

2026-01Journal of nuclear medicine : official publication, Society of Nuclear Medicine⭐ Q1DOI 10.2967/jnumed.125.270537

Accurate staging of unfavorable intermediate- or high-risk prostate cancer (PCa) is essential for treatment decisions. Conventional imaging often fails to detect lymph node, bone, and visceral metastases, and for this purpose 68Ga-prostate-specific membrane antigen (PSMA)-11 PET/CT is clinically used. This prospective, multicenter, International Atomic Energy Agency-supported trial evaluated the accuracy of 68Ga-PSMA-11 PET/CT for initial staging compared with MRI and histopathology and the impact of 68Ga-PSMA-11 PET/CT on determining surgical eligibility.

METHODS

In a prospective, international study supported by the International Atomic Energy Agency, 775 patients with high-risk or unfavorable intermediate-risk PCa from 12 centers across 11 countries (including low-, middle-, and high-income settings), scheduled for radical prostatectomy based on conventional imaging (including bone scanning and pelvic MRI), underwent 68Ga-PSMA-11 PET/CT before treatment. PET and MRI findings were compared with radical prostatectomy histopathology, and the impact of PET on radical prostatectomy was assessed.

RESULTS

68Ga-PSMA-11 PET/CT detected metastatic disease (M1) in 20.4% of cases, altering management and preventing prostatectomy in 24.0%. The accuracy for seminal vesicle invasion was 90.1% for 68Ga-PSMA-11 PET/CT versus 57.3% for MRI, and for lymph node metastases it was 91.1% for 68Ga-PSMA-11 PET/CT versus 69.7% for MRI. In 13.1% of patients (78/593), there were discordant results between 68Ga-PSMA-11 PET/CT and histopathology. 68Ga-PSMA-11 PET/CT had false-negative lymph node findings in 8.6% of cases, with the most clinically significant being 4.5% of patients incorrectly staged as N0. False-positive lymph node findings at 68Ga-PSMA-11 PET/CT occurred in 4.5% of patients.

CONCLUSION

68Ga-PSMA-11 PET/CT significantly improves staging accuracy, reducing the indication for prostatectomy and impacting treatment decisions. These findings, from a broad international cohort including low-, middle-, and high-income countries, support the global adoption of 68Ga-PSMA-11 PET/CT into standard staging protocols for high-risk PCa.

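Key summary

This prospective study of 775 patients at 12 centers in 11 countries compared the initial staging accuracy of 68Ga-PSMA-11 PET/CT with that of MRI in high-risk or unfavorable intermediate-risk prostate cancer, using surgical histopathology as the reference, and assessed its effect on eligibility for radical prostatectomy. 68Ga-PSMA-11 PET/CT detected metastatic disease in 20.4% of cases and prevented prostatectomy in 24.0%. Its accuracy exceeded that of MRI for seminal vesicle invasion (90.1% vs 57.3%) and for lymph node metastases (91.1% vs 69.7%), although results were discordant with histopathology in 13.1% of patients, including false-negative (8.6%) and false-positive (4.5%) lymph node findings. These results indicate that 68Ga-PSMA-11 PET/CT substantially improves staging accuracy and reduces the indication for noncurative prostatectomy, supporting its adoption into standard staging protocols for high-risk prostate cancer worldwide.

Added: 2026-04-05 16:08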

10Comparing Regional Saturation Biopsy and Targeted Biopsy: Is Perilesional Biopsy Necessary for High-Prostate-Specific Antigen Patients?

2026-01Annals of surgical oncology⭐ Q1DOI 10.1245/s10434-025-17980-9

BACKGROUND

Previous studies have indicated that regional saturation prostate biopsy (RSB) is more effective than targeted biopsy (TB) or systematic biopsy (SB) for patients with prostate-specific antigen (PSA) levels between 4 and 20 ng/mL. However, its efficacy in patients with PSA levels ≥ 20 ng/mL remains unclear.

PATIENTS AND METHODS

In this prospective, single-center, randomized controlled trial, we enrolled patients with PSA levels greater than 20 ng/mL and suspicious magnetic resonance imaging (MRI) findings from January 2021 to August 2023. The participants were randomized to undergo RSB or TB, and SB was also performed. The primary endpoint was the detection rate of clinically significant prostate cancer (csPCa), defined as an International Society of Urological Pathology (ISUP) grade ≥ 2.

RESULTS

RSB detected csPCa more frequently than did TB (90.2% versus 82.9%, p = 0.01) and SB (90.2% versus 82.5%, p = 0.01). Supplementary SB did not increase csPCa detection in the RSB group but did increase it in the TB group. Subgroup analysis revealed that RSB was particularly effective for patients with PSA levels between 20 and 50 ng/mL, prostate imaging-reporting and data system (PI-RADS) score of 3 lesions, prostate volume (PV) > 45 mL, and PSA density (PSAD) < 1.0 ng/mL/cc. However, the single-center design limits the generalizability of our findings.

CONCLUSION

Our trial suggests that RSB is superior to TB in detecting significant prostate cancers among patients with high PSA levels (≥ 20 ng/mL). Notably, perilesional biopsy is crucial for those with PSA levels between 20 and 50 ng/mL, larger PV, low PSAD, and low PI-RADS scores, enhancing csPCa detection.

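Key summary

This prospective, single-center randomized controlled trial enrolled patients with PSA levels of 20 ng/mL or higher and suspicious MRI findings, compared regional saturation biopsy (RSB) with targeted biopsy (TB), and used the detection rate of clinically significant prostate cancer (csPCa, ISUP ≥ 2) as the primary endpoint. RSB detected csPCa significantly more often than TB (90.2% vs 82.9%, p = 0.01) and than systematic biopsy (90.2% vs 82.5%, p = 0.01). The advantage was most pronounced in patients with PSA 20-50 ng/mL, PI-RADS 3 lesions, prostate volume > 45 mL, or PSA density < 1.0 ng/mL/cc, suggesting that perilesional biopsy is crucial for csPCa detection in these subgroups.

Added: 2026-04-05 16:08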

🔄 Run history

Run at	Source	Hits	New	Status
2026-04-26 00:00	LitReview			completed
2026-04-19 00:00	LitReview			completed
2026-04-14 14:30	LitReview			completed
2026-04-14 14:29	LitReview			completed
2026-04-14 14:28	LitReview			error
2026-04-14 11:57	LitReview			error
2026-04-12 00:00	LitReview			completed
2026-04-05 16:08	litreview:seed	101	101	seed-completed
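A run-history table like the one above can be summarized with a few lines of standard-library Python. This is a sketch that assumes the table is available as tab-separated text with the same column names; the `RUN_HISTORY` constant is a copy of the rows shown here, not an export format the dashboard is known to provide.

```python
import csv
import io
from collections import Counter

# Copy of the run-history rows above, as tab-separated text.
RUN_HISTORY = """Run at\tSource\tHits\tNew\tStatus
2026-04-26 00:00\tLitReview\t\t\tcompleted
2026-04-19 00:00\tLitReview\t\t\tcompleted
2026-04-14 14:30\tLitReview\t\t\tcompleted
2026-04-14 14:29\tLitReview\t\t\tcompleted
2026-04-14 14:28\tLitReview\t\t\terror
2026-04-14 11:57\tLitReview\t\t\terror
2026-04-12 00:00\tLitReview\t\t\tcompleted
2026-04-05 16:08\tlitreview:seed\t101\t101\tseed-completed
"""

rows = list(csv.DictReader(io.StringIO(RUN_HISTORY), delimiter="\t"))
status_counts = Counter(row["Status"] for row in rows)

# Rows with an empty Hits cell are runs that reported no hit count.
runs_with_hits = [row for row in rows if row["Hits"]]

print(dict(status_counts))
print([row["Run at"] for row in runs_with_hits])
```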

📊 Overview

🔬 Radiology LLM Personalized Reporting

completed

Papers

New

100.0%

Abstracts

Search source

LitReview

Fetcher

📄 Latest report 📚 Full archive

🔍 Search query

(radiology[tiab] OR imaging[tiab]) AND ("large language model*"[tiab] OR LLM[tiab]) AND (personalized[tiab] OR personalization[tiab] OR style[tiab] OR adaptation[tiab]) AND (reporting[tiab] OR impression[tiab] OR conclusion[tiab])
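The query above has a regular shape: groups of title/abstract terms joined by OR, with the groups joined by AND. A minimal sketch of assembling such a string from term lists follows. The helper names are hypothetical and are not part of LitWatch or any PubMed client; only the `[tiab]` field tag and the boolean operators come from the query itself.

```python
def tiab(term: str) -> str:
    """Tag one term for title/abstract search, quoting multi-word phrases."""
    body = f'"{term}"' if " " in term else term
    return f"{body}[tiab]"


def any_of(terms: list[str]) -> str:
    """Join terms with OR inside parentheses."""
    return "(" + " OR ".join(tiab(t) for t in terms) + ")"


def all_of(groups: list[list[str]]) -> str:
    """Join OR-groups with AND."""
    return " AND ".join(any_of(group) for group in groups)


groups = [
    ["radiology", "imaging"],
    ["large language model*", "LLM"],
    ["personalized", "personalization", "style", "adaptation"],
    ["reporting", "impression", "conclusion"],
]

query = all_of(groups)
print(query)
```

Keeping the term groups as data makes it easier to compare the narrower and broader queries used for different topics.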

📑 Recent papers

Showing latest 10 papers

1Exploring Full-cycle DeepSeek-assisted Case-based Learning in Undergraduate Radiology Education: A Respiratory System Example.

2026-04Academic radiology⭐ Q1DOI 10.1016/j.acra.2026.03.026

RATIONALE AND OBJECTIVE

To evaluate the application of DeepSeek-assisted case-based learning (CBL) in a respiratory radiology course across the full instructional cycle, including preparation, implementation, and evaluation.

METHODS

This prospective, single-center study was conducted in 2025 and involved third-year medical undergraduates. CBL Preparation: Six cases were retrieved from the Hospital Information System (HIS), and six generated via DeepSeek-R1. Preparation times were recorded and compared. CBL Implementation: Students were assigned to either a DeepSeek-assisted group or a traditional CBL group, with discussion time recorded for each subgroup. CBL Evaluation: Teaching effectiveness was evaluated through test scores and questionnaires. Subsequently, DeepSeek-R1 provided personalized feedback to students based on their individual scores.

RESULTS

A total of 200 students (mean age 21.02 ± 0.89 years, 94 males) participated. DeepSeek-generated cases required significantly less preparation time than HIS-retrieved cases (p = 0.016). During implementation, the DeepSeek group spent less discussion time than the traditional group (p = 0.026). The DeepSeek-assisted group achieved greater test score improvements compared to the traditional group (p < 0.05). Questionnaire responses indicated higher self-directed learning, greater interest in radiology, improved learning efficiency, and lower perceived learning burden in the DeepSeek-assisted group (p < 0.05). Additionally, personalized feedback generated by DeepSeek was qualitatively reviewed by the radiology teaching department and considered educationally useful.

CONCLUSION

This study demonstrates that DeepSeek-assisted CBL effectively supports respiratory radiology education throughout the entire course process (preparation, implementation, and evaluation) by enhancing efficiency, boosting student interest and engagement, improving performance, and providing valuable post-class feedback.

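Key summary

This study evaluated the full cycle of DeepSeek-assisted case-based learning (CBL) for teaching respiratory radiology to medical undergraduates. Compared with the conventional approach, the DeepSeek-assisted approach shortened case preparation time, improved students' learning efficiency and test scores, and increased their interest in learning. The personalized feedback generated by DeepSeek was also judged educationally useful, which points to its potential as a tool for medical education.

Added: 2026-04-12 00:00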

2Augmenting Large Language Model With Prompt Engineering and Supervised Fine-Tuning in Non-Small Cell Lung Cancer Tumor-Node-Metastasis Staging: Framework Development and Validation.

2026-04JMIR AI⭐ Q1DOI 10.2196/77988

BACKGROUND

Accurate tumor node metastasis (TNM) staging is fundamental for treatment planning and prognosis in non-small cell lung cancer (NSCLC). However, its complexity poses significant challenges. Traditional rule-based natural language processing methods are constrained by their reliance on manually crafted rules and are susceptible to inconsistencies in clinical reporting.

OBJECTIVE

This study aimed to develop and validate a robust, accurate, and operationally efficient artificial intelligence framework for the TNM staging of NSCLC by strategically enhancing a large language model, GLM-4-Air (general language model), through advanced prompt engineering and supervised fine-tuning (SFT).

METHODS

We constructed a curated dataset of 492 deidentified real-world medical imaging reports, with TNM staging annotations rigorously validated by senior physicians according to the AJCC (American Joint Committee on Cancer) 8th edition guidelines. The GLM-4-Air model was systematically optimized via a multi-phase process: iterative prompt engineering incorporating chain-of-thought reasoning and domain knowledge injection for all staging tasks, followed by parameter-efficient SFT using low-rank adaptation for the reasoning-intensive primary tumor characteristics (T) and regional lymph node involvement (N) staging tasks. The final hybrid model was evaluated on a completely held-out test set (black-box) and benchmarked against GPT-4o using standard metrics, statistical tests, and a clinical impact analysis of staging errors.

RESULTS

The optimized hybrid GLM-4-Air model demonstrated reliable performance. It achieved higher staging accuracies on the black-box test set: 92% (95% CI 0.850-0.959) for T, 86% (95% CI 0.779-0.915) for N, 92% (95% CI 0.850-0.959) for distant metastasis status (M), and 90% for overall clinical staging; by comparison, GPT-4o attained 87% (95% CI 0.790-0.922), 70% (95% CI 0.604-0.781), 78% (95% CI 0.689-0.850), and 80%, respectively. The model's robustness was further evidenced by its macro-average F1-scores of 0.914 (T), 0.815 (N), and 0.831 (M), consistently surpassing those of GPT-4o (0.836, 0.620, and 0.698). Analysis of confusion matrices confirmed the model's proficiency in identifying critical staging features while effectively minimizing false negatives. Crucially, the clinical impact assessment showed a substantial reduction in severe category I errors, which are defined as misclassifications that could significantly influence subsequent clinical decisions. Our model committed 0 category I errors in M staging and fewer category I errors in T and N staging. Furthermore, the framework demonstrated practical deployability, achieving efficient inference on consumer-grade hardware (eg, 4 RTX 4090 GPUs) with latencies suitable and acceptable for clinical workflows.
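The abstract reports accuracies together with 95% confidence intervals but does not state the test-set size or the interval method. As an illustration, the sketch below computes a Wilson score interval. With an assumed test set of 100 reports, it gives bounds close to those quoted for T staging (0.850-0.959) and N staging (0.779-0.915). Both the sample size and the choice of the Wilson method are assumptions inferred from the numbers, not facts from the paper.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumption: 100 held-out reports; the abstract does not give this number.
ASSUMED_TEST_SET_SIZE = 100

low_t, high_t = wilson_interval(92, ASSUMED_TEST_SET_SIZE)  # T staging, 92%
low_n, high_n = wilson_interval(86, ASSUMED_TEST_SET_SIZE)  # N staging, 86%
print(f"T: {low_t:.3f}-{high_t:.3f}  N: {low_n:.3f}-{high_n:.3f}")
```

Intervals this wide are a reminder that differences of a few percentage points between models on a test set of this assumed size carry considerable uncertainty.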

CONCLUSION

The proposed hybrid framework, integrating structured prompt engineering and applying SFT to reasoning-heavy tasks (T/N), enables the GLM-4-Air model to serve as a highly accurate, clinically reliable, and cost-efficient solution for automated NSCLC TNM staging. This work demonstrates the efficacy and potential of a domain-optimized smaller model compared with an off-the-shelf generalist model, holding promise for enhancing diagnostic standardization in resource-aware health care environments.

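Key summary

This study developed and validated a framework that automates TNM staging of non-small cell lung cancer (NSCLC) by combining prompt engineering with supervised fine-tuning (SFT) of the GLM-4-Air model. Evaluated on a dataset of real-world clinical reports, the model achieved higher staging accuracy and F1-scores than GPT-4o and, in particular, reduced the errors most likely to affect clinical decisions. With its accuracy and efficient inference, the framework shows promise as a practical tool for supporting standardized staging in clinical settings.

Added: 2026-04-04 06:48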

3Automating the segmentation, date extraction, and classification of multi-report PDFs in outside medical records using optical character recognition and generative artificial intelligence.

2026-04JAMIA open⭐ Q1DOI 10.1093/jamiaopen/ooag027

OBJECTIVE

Patients referred for specialized care often arrive with outside medical records (OMRs) compiled into multi-report PDFs that include imaging, pathology, and clinical notes in unstructured formats. Reviewing these records is time consuming and mentally taxing, increasing the risk of delayed care, clinician frustration, and missed information affecting quality of care. This study aimed to automate the segmentation, classification, and date extraction of scanned OMRs, with a focus on records relevant to breast cancer care.

METHODS

We used optical character recognition (OCR) to extract machine-readable text from 1303 scanned PDF documents from 116 distinct external institutions. Gemini 1.5, a large language model (LLM), was then used to segment multi-report files into individual documents, classify them into clinically meaningful categories such as mammograms and pathology reports, and extract study dates to build diagnostic timelines. Document categories were informed by clinical workflows in a breast cancer center.

RESULTS

The system achieved an F1 score of 0.95 for segmentation, 0.96 for classification, and 0.90 for date extraction. In a pilot of 45 records reviewed by clinicians, only 2 classification errors and 1 date error were reported. Clinicians estimated that the tool reduced OMR review time by 40%, improved workflow efficiency, and increased satisfaction.

CONCLUSION

Our findings demonstrate that combining OCR with LLMs can significantly enhance the processing of unstructured medical records, reducing manual burden and supporting timely clinical decision-making.

CONCLUSION

This study demonstrates the successful application of OCR and LLMs for organizing scanned OMRs within a specialty clinic. By automating a previously manual process, the approach supports scalable review of incoming outside records and has potential for adaptation to other clinical workflows. Future work will focus on evaluating the system across additional specialties and institutions.

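Key summary

This study developed and evaluated a system that combines optical character recognition (OCR) with a large language model (LLM) to automate segmentation, classification, and date extraction for outside medical record (OMR) PDFs. Tested in a breast cancer care setting, the system achieved high accuracy (F1 scores of 0.90 to 0.96), and clinicians estimated that it reduced record review time by 40% and improved workflow efficiency. By automating the processing of unstructured medical records, the approach can help speed up clinical decision-making and reduce the workload of medical staff.

Added: 2026-04-04 03:13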

4. AI-assisted tumor board decision-making in pancreatic oncology.

2026-03-20 · BMC medical informatics and decision making · ⭐ Q1 · DOI 10.1186/s12911-026-03444-x

BACKGROUND

Pancreatic cancer requires nuanced, multidisciplinary treatment planning typically conducted within tumor boards. While Large Language Models (LLMs) have shown capabilities in medical reasoning, their ability to approximate complex, integrative decision-making in oncology remains underexplored.

METHODS

This study evaluated the performance of LLaMA 3.3 (70b) in predicting tumor board decisions for newly diagnosed pancreatic cancer patients. Clinical documentation (including free-text imaging reports, pathology findings, and patient history) from 42 first-diagnosis cases discussed in a real-world tumor board was collected. The model was tasked with predicting one of three treatment options: surgical resection (SURG), neoadjuvant chemotherapy (NEO), or palliative therapy (PALL). Four prompting strategies were evaluated: zero-shot, advanced (adv.) zero-shot, Chain-of-Thought (CoT), and few-shot prompting. Performance was assessed using accuracy, micro- and macro-averaged F1 scores, and category-specific recall.

RESULTS

The advanced zero-shot and CoT strategies achieved the highest overall accuracy of 78.6% and a micro-averaged F1 score of 0.786. However, this performance was driven primarily by the correct classification of majority classes (SURG and PALL). Crucially, both high-accuracy strategies failed to identify any of the neoadjuvant therapy candidates (Recall NEO = 0.00; 0/7 cases), systematically misclassifying them as palliative or surgical. While few-shot prompting improved the detection of neoadjuvant cases (Recall NEO = 1.00), it introduced substantial noise, reducing overall accuracy to 56.7%. LLaMA 3.3 (70b) demonstrates high concordance with tumor board decisions for clear-cut surgical or palliative cases but exhibits a critical systematic failure in identifying candidates for neoadjuvant therapy. The high global accuracy masks a significant safety limitation regarding the recognition of complex, intermediate-stage patients.

CONCLUSION

These findings suggest that current LLMs may approximate majority-class decisions but risk overlooking curative treatment pathways in nuanced scenarios, necessitating rigorous oversight and specific adaptation before clinical consideration.

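Key summary

This study evaluated the performance of the LLaMA 3.3 model in predicting multidisciplinary tumor board decisions for patients with pancreatic cancer. The model was highly accurate for surgical and palliative treatment decisions but showed a systematic limitation in identifying candidates for neoadjuvant chemotherapy. Even when overall accuracy is high, there is a substantial risk of missing the treatment pathway for complex, intermediate-stage patients, so refinement of the model and strict specialist oversight are essential before clinical application.

Added: 2026-04-04 06:48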
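To illustrate how an overall accuracy of 78.6% can coexist with a neoadjuvant recall of 0.00, the sketch below uses a hypothetical confusion matrix. Only the totals (42 cases, 7 NEO cases, 33 correct predictions) follow the abstract; the split between SURG and PALL and the placement of individual errors are assumptions made for illustration, and all function names are invented here.

```python
# Rows are tumor board decisions, columns are model predictions (SURG, NEO, PALL).
# HYPOTHETICAL counts: only the totals (42 cases, 7 NEO, 33 correct) follow the
# abstract; the SURG/PALL split and the placement of errors are assumed.
LABELS = ["SURG", "NEO", "PALL"]
CONFUSION = [
    [17, 0, 1],  # SURG (18 cases, assumed)
    [3, 0, 4],   # NEO (7 cases, none predicted as NEO)
    [1, 0, 16],  # PALL (17 cases, assumed)
]


def class_counts(cm, k):
    """Return (TP, FP, FN) for class index k."""
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp
    fn = sum(cm[k]) - tp
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); taken as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(cm):
    """Share of cases on the diagonal of the confusion matrix."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))


def micro_f1(cm):
    """Pool TP/FP/FN over all classes before computing F1."""
    totals = [sum(c) for c in zip(*(class_counts(cm, k) for k in range(len(cm))))]
    return f1(*totals)


def macro_f1(cm):
    """Average the per-class F1 scores, weighting every class equally."""
    return sum(f1(*class_counts(cm, k)) for k in range(len(cm))) / len(cm)


def recall(cm, k):
    """TP / (TP + FN) for class index k."""
    tp, _, fn = class_counts(cm, k)
    return tp / (tp + fn) if tp + fn else 0.0


print(f"accuracy   = {accuracy(CONFUSION):.3f}")
print(f"micro F1   = {micro_f1(CONFUSION):.3f}")
print(f"macro F1   = {macro_f1(CONFUSION):.3f}")
print(f"recall NEO = {recall(CONFUSION, LABELS.index('NEO')):.2f}")
```

In single-label classification the micro-averaged F1 equals accuracy, which is why the abstract reports 78.6% and 0.786 side by side. The macro average weights the missed NEO class equally and therefore drops well below the accuracy figure.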

5. Automated detection of primary soft tissue sarcomas of the extremities using artificial intelligence and ChatGPT.

2026-03-16 · Frontiers in oncology · 🔷 Q2 · DOI 10.3389/fonc.2026.1674509

OBJECTIVE

Developing effective Convolutional Neural Networks (CNN) for soft tissue sarcoma detection often requires numerous iterations and adjustments, demanding specialized IT (Information Technology) skills. This study aims to use ChatGPT 4 to simplify CNN adaptation, reducing the need for specialized IT skills while enabling efficient exploration of training configurations to enhance diagnostic accuracy.

METHODS

This study leveraged a preexisting Artificial Intelligence (AI) model adapted using a preexisting Convolutional Neural Network (CNN). The study involved 54 participants diagnosed with primary soft tissue sarcomas in the extremities and possessing complete Magnetic Resonance Imaging (MRI) datasets. AI adaptations and programming were conducted using TensorFlow and verified with ChatGPT. Model training used a patient-level split of 70% training, 15% validation, and 15% test data and ran for eight epochs.

RESULTS

The adapted CNN model demonstrated significant improvement across various MRI sequences, achieving high accuracy levels (up to 98.5%) and excellent sensitivity and specificity rates. The model performed robustly in differentiating tumor presence in MR images, with test accuracies as high as 93.9%. The inclusion of a Gradient-weighted Class Activation Mapping (Grad-CAM) heat map and probability scores in the diagnostic outputs further enhanced interpretative capabilities.

CONCLUSION

This study highlights the potential of AI, particularly CNNs, in the early and accurate detection of soft tissue sarcomas, underscoring the technology's adaptability across different imaging modalities. The integration of large language models like ChatGPT into the model adaptation process emphasizes the reduced need for specialized IT skills, making advanced diagnostic tools more accessible and potentially improving diagnostic accuracy and patient outcomes in radiology and oncology.

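Key summary

This study presented a method that uses ChatGPT to efficiently optimize a convolutional neural network (CNN) model without specialized IT skills and to improve the accuracy of MRI-based diagnosis of soft tissue sarcomas of the extremities. The optimized CNN model achieved accuracy of up to 98.5% with excellent sensitivity and specificity, and Grad-CAM provided visual interpretability. The findings suggest that building AI models with the help of large language models can improve diagnostic efficiency and clinical accessibility in radiology and oncology.

Added: 2026-04-04 06:48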
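The patient-level 70/15/15 split mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_patients`, the fixed seed, the rounding rule, and the patient IDs are all assumptions. The point it shows is that patients, not individual images, are assigned to subsets, so no patient contributes images to more than one subset.

```python
import random


def split_patients(patient_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle unique patient IDs and cut them into train/validation/test groups.

    Splitting by patient (rather than by image) keeps all images of one
    patient in a single subset. The test group receives the remainder
    after the train and validation sizes have been rounded.
    """
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * fractions[0])
    n_val = round(len(ids) * fractions[1])
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# 54 hypothetical patient identifiers, matching the cohort size in the abstract.
patients = [f"P{i:02d}" for i in range(1, 55)]
train, val, test = split_patients(patients)
print(len(train), len(val), len(test))
```

With 54 patients and this rounding rule the groups contain 38, 8, and 8 patients; images would then be routed to a subset according to the patient they belong to.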

6. Extraction of distant recurrence sites for breast cancer patients from free-text clinical notes using large language models.

2026-03 · Journal of biomedical informatics · ⭐ Q1 · DOI 10.1016/j.jbi.2026.105032

OBJECTIVE

Accurate documentation of distant recurrence sites in breast cancer is essential for evaluating treatment effectiveness and outcomes research. However, such information is embedded in unstructured clinical notes, making manual abstraction labor-intensive. Large language models (LLMs) offer a scalable solution for extracting complex information from heterogeneous clinical narratives; however, generic LLMs often lack the specialized clinical reasoning needed for accurate interpretation of oncologic documentation. This study aims to develop an efficient LLM-based framework to automatically extract distant recurrence sites from free-text documentation.

MATERIALS & METHODS

We used clinical notes, pathology and radiology reports from recurrent breast cancer patients at Mayo Clinic (n = 766) for model development and evaluated generalizability on internal hold-out samples (n = 112) and an external Stanford Medicine cohort (n = 110). For cross-disease domain adaptation, we further validated on prostate cancer patients (n = 49). Our proposed framework employs BioLinkBERT, a pretrained language model (PLM) backbone, with weak supervision and an epoch-wise entropy optimization to address limited labeled data and class imbalance across recurrence sites. The fine-tuned model was compared against state-of-the-art models, including Llama2-7B, Llama-3-8B and MedAlpaca, using precision, recall, and F1-score.

RESULTS

The fine-tuned model outperformed generic and domain-specific LLM baselines, with notable gains in identifying multi-site distant recurrence. In-domain validation showed consistent F1-score improvement (average 0.78), particularly for rare recurrence sites. The model also demonstrated strong performance on the external Stanford cohort and on prostate cancer, achieving F1-score of 0.83 and 0.93, respectively.

CONCLUSION

This study presents an efficient, weakly supervised LLM framework that accurately extracts metastatic recurrence sites, reducing reliance on manual chart review. The results demonstrate that relatively small LLMs, optimized with domain-aware weak supervision, can outperform larger models for complex oncologic information extraction. The model is released as a platform-independent Docker image to support seamless cancer registry integration.

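Key summary

This study developed a weakly supervised LLM framework based on BioLinkBERT to automatically extract distant recurrence sites from the unstructured clinical records of breast cancer patients. The model outperformed existing LLMs on internal and external validation datasets and was particularly accurate in identifying multi-site and rare recurrence sites. It is expected to reduce the burden of manual chart review and improve the efficiency of cancer registries.

Added: 2026-04-04 03:13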

7. Automated Report Generation in Ophthalmology: Integrating Artificial Intelligence, Multimodal Imaging, and Clinical Data.

2026-02-19 · Ophthalmology and therapy · ⭐ Q1 · DOI 10.1007/s40123-026-01316-1

Artificial intelligence (AI) has emerged as a transformative force in ophthalmology, enabling automated, accurate, and efficient clinical reporting. This review summarizes recent advances in AI-driven report generation, emphasizing the integration of multimodal imaging and clinical data. Deep learning and natural language processing (NLP) models can synthesize information from diverse sources-including fundus photography, optical coherence tomography, fluorescein angiography, and patient records-to generate structured, interpretable, and personalized diagnostic reports. Such systems enhance diagnostic precision, streamline workflow, and reduce interobserver variability. We outline the technological foundations underlying these systems, including convolutional and transformer-based architectures, self-supervised and multimodal learning, and large language models. Representative applications in diabetic retinopathy, glaucoma, cataract, and age-related macular degeneration are discussed, highlighting their clinical value and emerging real-world deployment. Persistent challenges-including data heterogeneity, model interpretability, ethical governance, and clinical integration-are critically reviewed. Finally, we explore future directions such as real-time AI-assisted reporting, predictive and personalized analytics, and global scalability across healthcare ecosystems. Multimodal, explainable, and clinically integrated AI systems hold promise to redefine ophthalmic diagnostics and improve both clinician efficiency and patient outcomes.

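Key summary

This review examined recent developments in automated diagnostic report generation systems in ophthalmology that use deep learning and natural language processing to integrate multimodal imaging and clinical data. It found that such AI-based systems can improve diagnostic accuracy, reduce interobserver variability, and streamline clinical workflows. For successful adoption in clinical practice, however, data heterogeneity must first be addressed, model interpretability secured, and ethical governance established.

Added: 2026-04-04 06:48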

8. Differences and Trends of Artificial Intelligence in Medical Education: A Comparative Bibliometric Analysis Between China and the International Community.

2026-01-31 · Advances in medical education and practice · 🔷 Q2 · DOI 10.2147/amep.s573537

OBJECTIVE

This study aims to explore the application of artificial intelligence in medical education by comparing research hotspots and evolutionary trends between China and the international community, ultimately proposing informed educational practices and policy recommendations.

METHODS

Literature was retrieved from the core collections of CNKI and Web of Science for the period 2014-2024, limited to article and review publications. After applying a unified Boolean search strategy and deduplication, the data were analyzed using CiteSpace 6.4.R1 to examine publication trends, collaboration networks, keyword co-occurrence/clustering/burst detection, and co-citation patterns.

RESULTS

A total of 379 Chinese and 552 English records were included. Publications surged after 2018 and peaked during 2023-2024. International hotspots centered on machine learning, deep learning, and large language models for simulation-based training and clinical reasoning; Chinese studies focused on "New Medical Sciences", VR/AR, and medical imaging. The emergence of generative artificial intelligence and multimodal large models has become a new frontier in artificial intelligence research within global medical education from 2023 to 2024.

CONCLUSION

This study is based on a comparison of two databases to reveal the hotspots and differences in artificial intelligence and medical education research between China and the international research community. It not only compensates for the time lag of existing research, but also proposes three major trends driven by artificial intelligence in the development of medical education (generative AI, personalized learning, immersive experience). A complementary pattern exists between technology-driven and scenario-driven orientations. We recommend integrating AI literacy and ethics into curricula, establishing Generative-AI teaching/assessment guidelines, and building cross-institutional, yearly knowledge-map monitoring for sustainable innovation in medical education.

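Key summary

Drawing on literature from 2014 to 2024, this study compared research trends in artificial intelligence (AI) in medical education between China and the international community. International research concentrated on clinical reasoning education using machine learning and large language models, whereas Chinese research emphasized virtual and augmented reality (VR/AR) and medical imaging technology. To support innovation in medical education, the authors recommend establishing guidelines for the educational use of generative AI and integrating AI literacy and ethics education into curricula.

Added: 2026-04-04 06:48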

9. Radiology Board-Style Examinations and LLMs: A Scoping Review of Model Performance.

2026-01-29 · Journal of the American College of Radiology : JACR · ⭐ Q1 · DOI 10.1016/j.jacr.2026.01.017

BACKGROUND

Large language models (LLMs) are increasingly being evaluated for their ability to answer official radiology board-style examination questions. Understanding their accuracy, limitations, and potential applications in education is essential for assessing their utility in the field.

METHODS

A scoping review was conducted in October 2025 across PubMed, Scopus, and Web of Science, following Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Studies were included if they evaluated LLMs on official radiology board-style examination questions. After screening 205 unique records, 29 studies met the inclusion criteria. Data were extracted on study characteristics, including LLM type and version, input modality, language, examination type, answer format, comparison with humans, and reported outcomes.

RESULTS

The reviewed studies evaluated multiple LLMs, predominantly Chat Generative Pre-trained Transformer (GPT)-based models (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o), as well as Claude, Gemini, Llama 3, and Mixtral. Text-only evaluations generally yielded higher accuracy (≈65%-90%) compared with multimodal tasks (45%-89%). GPT-4 and its variants consistently outperformed earlier versions, occasionally exceeding average human performance. Open-source models such as Llama 3 70B and Mixtral achieved comparable results to proprietary models, offering advantages in local deployment and privacy. Few studies directly compared LLM performance with human radiologists.

CONCLUSION

LLMs demonstrate promising performance in answering text-based radiology board-style examination questions, particularly GPT-4-based models. Nevertheless, significant limitations persist in multimodal tasks and complex reasoning scenarios.

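Key summary

This scoping review analyzed the performance of large language models (LLMs) on radiology board-style examination questions, systematically examining 29 studies. GPT-4-based models were highly accurate on text-based questions and at times exceeded average human scores, but they still showed significant limitations in multimodal tasks and complex reasoning scenarios. Before LLMs are used in education, further validation is needed to overcome these technical constraints and to strengthen clinical judgment.

Added: 2026-04-04 06:48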

10. Comparing the performance of radiomics, nomograms, machine learning, and large language models in predicting 28-day mortality in severe community-acquired pneumonia patients.

2026-01-19 · Frontiers in immunology · ⭐ Q1 · DOI 10.3389/fimmu.2025.1679496

BACKGROUND

Severe community-acquired pneumonia (SCAP) is a significant global health challenge due to its high mortality. Despite advances, early diagnosis and effective management remain critical. Tools like radiomics analyze imaging data for risk assessment, while machine learning and nomograms aid in personalized treatment. Large language models (LLMs) enhance clinical decision-making by analyzing data and supporting care strategies. This study integrates these methods to predict 28-day mortality in SCAP patients.

METHODS

A cohort of 599 patients diagnosed with severe community-acquired pneumonia (SCAP), including 316 males and 283 females, was enrolled from Shanghai East Hospital and Xiamen Humanity Hospital. High-resolution lung CT scans were used to segment three-dimensional regions of interest, from which 1,050 radiomic features were extracted. The dataset was divided into a training set (80%) and an independent test set (20%), and k-fold cross-validation was applied to optimize model performance. To address class imbalance, the SMOTE oversampling technique was employed. The study integrated radiomics, nomograms, seven machine learning models, and five LLMs to predict the 28-day mortality risk in SCAP patients. SHAP values were utilized to enhance the interpretability of feature contributions. In addition, the study integrated the prior knowledge provided by LLMs, processed through an embedding layer, with data-driven feature learning in the main network, and dynamically fused their outputs using a bias network with a gating mechanism, thereby improving the accuracy and interpretability of LLMs in predicting 28-day mortality risk for SCAP patients.

RESULTS

Key predictors of 28-day mortality included inflammatory markers, cytokines, age, CRP, and oxygenation index. Clinical-Radiomics models achieved strong accuracy (AUC 0.92). Machine learning models, particularly XGBoost (AUC 0.90), were highly effective, with SHAP analysis emphasizing radscore's importance. LLMs such as ChatGPT also performed well (AUC 0.78), showcasing the potential of integrating clinical, radiomic, and AI-driven approaches.

CONCLUSION

This study demonstrates the effectiveness of radiomics, machine learning, and LLMs to predict SCAP outcomes. Models like XGBoost achieved superior accuracy, while SHAP analysis improved interpretability. These advancements highlight the potential for enhanced SCAP prognosis and personalized care strategies.

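Key summary

This study evaluated models that integrate radiomic features, machine learning, and large language models (LLMs) to predict 28-day mortality in 599 patients with severe community-acquired pneumonia (SCAP). The model combining clinical information with radiomic features achieved a high AUC of 0.92, and machine learning models such as XGBoost showed strong predictive performance. In particular, fusing the prior knowledge of LLMs with data-driven learning improved both predictive accuracy and interpretability, and is expected to contribute to personalized treatment strategies for SCAP patients.

Added: 2026-04-04 06:48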

🔄 Run history

Run at	Source	Hits	New	Status
2026-04-19 00:00	LitReview	1		completed
2026-04-14 14:29	LitReview			completed
2026-04-14 14:28	LitReview			completed
2026-04-14 14:28	LitReview			error
2026-04-14 11:57	LitReview			error
2026-04-12 00:00	LitReview	1	1	completed
2026-04-05 00:00	LitReview	2		completed
2026-04-04 07:56	pubmed:seed	58		seed-completed

🔬 Ureteral Stone Detection on KUB

completed

Papers

New

100.0%

Abstracts

All Sources

Search source

LitReview

Fetcher

🔍 Search query

("ureteral stone*"[tiab] OR "ureteral calculus"[tiab] OR "ureteral calculi"[tiab] OR urolithiasis[tiab]) AND (KUB[tiab] OR "kidney, ureter, and bladder"[tiab] OR "kidney ureter bladder"[tiab] OR "abdominal radiograph*"[tiab] OR "plain radiograph*"[tiab] OR radiograph*[tiab] OR x-ray[tiab]) AND ("artificial intelligence"[tiab] OR "deep learning"[tiab] OR "machine learning"[tiab] OR "neural network*"[tiab] OR "computer-aided diagnosis"[tiab] OR AI[tiab] OR CNN[tiab])
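The query above consists of three OR-groups joined by AND, with every term restricted to the title/abstract field. A minimal sketch of assembling such a query from term lists follows. The helper names `tiab` and `build_query` are hypothetical and are not taken from LitWatch or from any PubMed client library; the quoting rule (quote any term containing a space) is an assumption that happens to reproduce the query shown.

```python
def tiab(term):
    """Tag a term for title/abstract search, quoting multi-word phrases."""
    return f'"{term}"[tiab]' if " " in term else f"{term}[tiab]"


def build_query(groups):
    """OR the terms inside each group, then AND the parenthesized groups."""
    clauses = ["(" + " OR ".join(tiab(t) for t in group) + ")" for group in groups]
    return " AND ".join(clauses)


# Term groups copied from the query above: condition, imaging modality, method.
STONE_TERMS = ["ureteral stone*", "ureteral calculus", "ureteral calculi", "urolithiasis"]
IMAGING_TERMS = [
    "KUB", "kidney, ureter, and bladder", "kidney ureter bladder",
    "abdominal radiograph*", "plain radiograph*", "radiograph*", "x-ray",
]
AI_TERMS = [
    "artificial intelligence", "deep learning", "machine learning",
    "neural network*", "computer-aided diagnosis", "AI", "CNN",
]

query = build_query([STONE_TERMS, IMAGING_TERMS, AI_TERMS])
print(query)
```

Keeping the terms in lists makes it easier to add a synonym to one group without disturbing the parentheses of the others.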

📑 Recent papers

Showing latest 10 papers

1. The diagnostic performance of machine learning based detection of urinary tract stones: a systematic review and meta-analysis.

2026-04 · European journal of radiology · ⭐ Q1 · DOI 10.1016/j.ejrad.2026.112836

BACKGROUND

Urolithiasis is a prevalent urological condition, and Non-Contrast Computed Tomography (NCCT) is the gold standard for diagnosis. In recent years, there has been growing interest in investigating machine learning (ML)-based detection of urolithiasis and the wider potential of AI in urology.

OBJECTIVE

To synthesise the diagnostic accuracy of ML-based urinary tract stone (UTS) detection on NCCT and in externally validated cohorts.

METHODS

We performed a systematic review and bivariate meta-analysis of studies evaluating ML for detecting urinary stones. We used QUADAS-2 to assess the risk of bias. Subgroup analyses examined performance by model type, classification task, stone site, dataset source, and CT orientation. Bivariate meta-regression was performed to further explore heterogeneity. Publication bias was assessed using Deeks' test. The study was prospectively registered in Prospero (CRD42024542409).

RESULTS

Forty-five studies were included qualitatively. Twenty-four studies (49,277 test images) provided extractable 2 × 2 data for meta-analysis. For NCCT (10 studies), pooled sensitivity was 96% (95% CI 92-98%) and pooled specificity was 98% (95% CI 97-99%). In externally validated NCCT cohorts (4 studies; 1,056 images), pooled sensitivity was 95% (95% CI 92-97%) and pooled specificity was 96% (95% CI 70-100%). Subgroup performance remained high, but heterogeneity persisted; meta-regression found stone site contributed to variability (p = 0.014), while other moderators were not significant. Deeks' test showed no small-study effects (p = 0.571).

CONCLUSION

ML models show high image-level diagnostic performance for stone detection on NCCT and may support radiologists as decision support tools. Translation is limited by heterogeneity and limited external validation. Future studies should move beyond detection-alone tasks towards clinically meaningful outputs that are actionable for radiologists and downstream clinicians, including urologists and nephrologists.

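Key summary

This systematic review and meta-analysis evaluated the diagnostic performance of machine learning (ML) models for detecting urinary tract stones on non-contrast CT (NCCT). ML models showed high sensitivity (96%) and specificity (98%) for NCCT-based stone detection and maintained strong performance in externally validated cohorts. However, heterogeneity related to stone site remains, and future research should produce outputs that contribute meaningfully to clinical decision-making.

Added: 2026-04-12 00:00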

2. Classification of urinary stones using near-infrared spectroscopy and chemometrics: A promising method for intraoperative application.

2025-04-01 · Analytica chimica acta · ⭐ Q1 · DOI 10.1016/j.aca.2025.344007

In low-invasive surgical treatment of urolithiasis, there is a need for an analytical method to determine the chemical composition of urinary stones in real-time mode, i.e., intraoperatively. While a thorough phase analysis can be done after the surgery, preliminary information about a target stone would be helpful for the specialists for choosing an optimal strategy of treatment and giving some immediate dietary or drug prescriptions to a patient. Near-infrared spectroscopy (NIRS) is a good candidate for such a method that can provide immediate results without obligatory sample preparation. Fiber optic probes, often used for acquiring near-infrared spectra, are compatible with surgical instrumentation. Chemometric algorithms can successfully resolve the complexity of NIR spectra, which consist of overlapped signals. For the first time, we applied NIRS in diffuse reflectance mode to classify three major types of urinary stones: oxalates, urates, and phosphates. To imitate the real conditions of a surgery, the NIR spectra were acquired not only under ambient conditions but also in saline medium. A trained and optimized multinomial classifier (Error Correcting Output Codes) showed an acceptable precision and recall for an independent validation dataset. Even considering the strong absorbance of saline, the calculated geometric mean was 94 %, 87 %, and 71 % for oxalates, urates, and phosphates, respectively. A first real-time approbation during a real surgery (percutaneous nephrolithotomy) demonstrated a compatibility of the suggested approach with the surgical protocols and a good agreement of the acquired NIR spectra and the results of reference X-ray phase analysis.

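Key summary

This study developed a classification model that combines near-infrared spectroscopy (NIRS) with chemometric algorithms to analyze the chemical composition of urinary stones in real time during surgery. Even in saline, the model classified oxalate, urate, and phosphate stones with geometric means of 94%, 87%, and 71%, respectively, and a trial during an actual percutaneous nephrolithotomy showed compatibility with surgical protocols and good agreement with reference X-ray phase analysis. By providing immediate information on stone composition during surgery, the method is expected to support optimal treatment planning and patient-specific prescriptions.

Added: 2026-04-05 07:09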

3. A Pilot Study on Using an Artificial Intelligence Algorithm to Identify Urolith Composition through Abdominal Radiographs in the Dog.

2025-03 · Veterinary radiology & ultrasound : the official journal of the American College of Veterinary Radiology and the International Veterinary Radiology Association · ⭐ Q1 · DOI 10.1111/vru.70012

In small animal practice, patients often present with urinary lithiasis, and prediction of urolith composition is essential to determine the appropriate treatment. Through abdominal radiographs, the composition of mineral radiopaque uroliths can be determined by considering many different factors; this can be complex and, as such, tailor-made for the use of artificial intelligence (AI). The Minnesota Urolith Center partnered with Hill's Pet Nutrition to develop a deep learning AI algorithm (CALCurad) within a smartphone application called the MN Urolith Application that allows for the preliminary assessment of urolith composition. The algorithm provides the probability of a urolith being composed of struvite from an image taken of an abdominal radiograph. This pilot study evaluates the accuracy of the CALCurad in the context of clinical practice. A sample population of 139 dogs was considered, and the results obtained by the CALCurad were compared with the results obtained by infrared spectroscopy analysis. Agreement between the application and quantitative analyses was 81.3%. These results suggest that the CALCurad can effectively be used to predict urolith composition in dogs, helping the clinician to decide between medical and surgical management of the patient. The use of the CALCurad is an example of the usefulness of AI in helping veterinarians make clinical decisions in patient care.

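Key summary

This study evaluated the clinical accuracy of an artificial intelligence algorithm (CALCurad) that predicts urolith composition from canine abdominal radiographs. In 139 dogs, comparison with infrared spectroscopy analysis showed agreement of 81.3%. The algorithm could therefore serve as a useful aid in clinical practice, rapidly predicting urolith composition to help decide between medical and surgical management.

Added: 2026-04-05 07:09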

4. Does Deep Learning Reconstruction Improve Ureteral Stone Detection and Subjective Image Quality in the CT Images of Patients with Metal Hardware?

2025-02-11 · Journal of endourology · ⭐ Q1 · DOI 10.1089/end.2024.0666

BACKGROUND

Diagnosing ureteral stones with low-dose CT in patients with metal hardware can be challenging because of image noise. The purpose of this study was to compare ureteral stone detection and image quality of low-dose and conventional CT scans with and without deep learning reconstruction (DLR) and metal artifact reduction (MAR) in the presence of metal hip prostheses.

METHODS

Ten urinary system combinations with 4 to 6 mm ureteral stones were implanted into a cadaver with bilateral hip prostheses. Each set was scanned under two different radiation doses (conventional dose [CD] = 115 mAs and ultra-low dose [ULD] = 6.0 mAs). Two scans were obtained for each dose as follows: one with and another without DLR and MAR. Two blinded radiologists ranked each image in terms of artifact, image noise, image sharpness, overall quality, and diagnostic confidence. Stone detection accuracy at each setting was calculated.

RESULTS

ULD with DLR and MAR improved subjective image quality in all five domains (p < 0.05) compared with ULD. In addition, the subjective image quality for ULD with DLR and MAR was greater than the subjective image quality for CD in all five domains (p < 0.05). Stone detection accuracy of ULD improved with the application of DLR and MAR (p < 0.05). Stone detection accuracy of ULD with DLR and MAR was similar to CD (p > 0.25).

CONCLUSION

DLR with MAR may allow the application of low-dose CT protocols in patients with hip prostheses. Application of DLR and MAR to ULD provided a stone detection accuracy comparable with CD, reduced radiation exposure by 94.8%, and improved subjective image quality.

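Key summary

This study evaluated how deep learning reconstruction (DLR) and metal artifact reduction (MAR) affect ureteral stone detection and image quality in low-dose CT of patients with metal joint prostheses. Applying DLR and MAR to ultra-low-dose CT reduced radiation exposure by 94.8% relative to the conventional dose while keeping stone detection accuracy comparable and improving subjective image quality. These techniques may therefore be a useful option for applying low-dose CT protocols in patients with metal hardware.

Added: 2026-04-05 07:09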
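The reported 94.8% reduction is consistent with the ratio of the two tube current-time settings given in the methods (115 mAs versus 6.0 mAs). The short check below assumes that the reduction was derived from those mAs values alone, which the abstract does not state explicitly; the function name is illustrative.

```python
def percent_reduction(reference, reduced):
    """Relative reduction of `reduced` with respect to `reference`, in percent."""
    return (reference - reduced) / reference * 100


CONVENTIONAL_DOSE_MAS = 115.0  # conventional dose (CD) setting from the abstract
ULTRA_LOW_DOSE_MAS = 6.0       # ultra-low dose (ULD) setting from the abstract

reduction = percent_reduction(CONVENTIONAL_DOSE_MAS, ULTRA_LOW_DOSE_MAS)
print(f"{reduction:.1f}%")  # prints 94.8%
```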

5. Is Kidney-Ureter-Bladder Radiography Still a Helpful Tool to Address Acute Ureteral Colic in Emergency Settings?

2025 · Cureus · DOI 10.7759/cureus.90365

Background This study aims to assess the reliability of kidney-ureter-bladder (KUB) radiography as a triage tool in acute ureteral colic (AUC). It also compares KUB with non-contrast computed tomography (NCCT) with respect to stone characteristics and clinical outcomes. Methodology A retrospective cohort study recruited patients who had proven ureteric stones on NCCT. A blinded review of KUB and NCCT was performed to identify the following variables in both tests: site, ureteric stone maximum diameter, and stone density. The correlation between KUB radiography and NCCT was then assessed. Intermethod reliability was used to measure the degree to which test scores are consistent when the methods or instruments employed vary. Results One hundred fifty-one patients were included, of whom 75 (50%) had negative KUB and positive NCCT results for ureteric stones based on the blinded review. Lower ureteral calculi were found to be the most common location in both KUB (n = 49, 65%) and NCCT images (n = 81, 54%). The median stone diameters of KUB and NCCT were 5 (3-8) mm and 6 (4-9) mm, respectively. Hounsfield unit densities of more than 630 were found in 86 (57%) patients, and radiopaque stones were found in 76 (50%) patients. There was moderate and significant concordance (Cohen's kappa = 0.520) between NCCT and KUB regarding stone location (P < 0.01). There was a strong concordance (Cohen's kappa = 0.804) between NCCT and KUB in detecting ureteric stone maximum diameter (P < 0.01). Stone density was weakly correlated between KUB and NCCT (Cohen's kappa = 0.254) (P = 0.001). Thirty-four cases (45%) of negative KUB results required surgical intervention (SI). Sepsis (n = 5, 15%) and acute kidney injury (n = 23, 68%) were the main indications for SI in negative KUB and positive NCCT ureteric stones. Conclusions KUB radiography should not be used as a triage tool in AUC due to potentially harmful outcomes. However, KUB radiography can be reliably used during follow-up, as there is a strong correlation between KUB radiography and NCCT for KUB-detectable ureteric stones.

🇰🇷 Key summary

This study, on KUB (kidney-ureter-bladder) radiography in patients with acute ureteral colic

Added: 2026-04-05 07:09

6. Harnessing Artificial Intelligence to Predict Spontaneous Stone Passage: Development and Testing of a Machine Learning-Based Calculator

2025 · Journal of Endourology · ⭐ Q1 · DOI 10.1089/end.2024.0755

Objective: We sought to use artificial intelligence (AI) to develop and test calculators to predict spontaneous stone passage (SSP) using radiographical and clinical data. Methods: Consecutive patients with solitary ureteral stones ≤10 mm on CT were prospectively enrolled and managed according to American Urological Association guidelines. The first 70% of patients were placed in the "training group" and used to develop the calculators. The latter 30% were enrolled in the "testing group" to externally validate the calculators. Exclusion criteria included contraindication to trial of SSP, ureteral stent, and anatomical anomaly. Demographic, clinical, and radiographical data were obtained and fed into machine learning (ML) platforms. SSP was defined as passage of stone without intervention. Calculators were derived from data using multivariate logistic regression. Discrimination, calibration, and clinical utility/net benefit of the developed models were assessed in the validation cohort. Receiver operating characteristic curves were constructed to measure their discriminative ability. Results: Fifty-one percent of 131 "training" patients spontaneously passed their stones. Passed stones were significantly closer to the bladder (8.6 vs 11.8 cm, p = 0.01) and smaller in length, width, and height. Two ML calculators were developed, one supervised machine learning (SML) and the other unsupervised machine learning (USML), and compared to an existing tool Multi-centre Cohort Study Evaluating the role of Inflammatory Markers In Patients Presenting with Acute Ureteric Colic (MIMIC). The SML calculator included maximum stone width (MSW), ureteral diameter above the stone (UDA), and distance from ureterovesical junction to bottom of stone and had an area under the curve (AUC) of 0.737 upon external validation of 58 "test" patients. Parameters selected by USML included MSW, UDA, and use of an anticholinergic, and it had an AUC of 0.706. 
The MIMIC calculator's AUC was 0.588 (0.489-0.686). Conclusion: We used AI to develop calculators that outperformed an existing tool and can help providers and patients make a better-informed decision for the treatment of ureteral stones.

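🇰🇷 Key summary

This study developed and validated machine learning-based models for predicting spontaneous stone passage in patients with ureteral stones of 10 mm or less, using clinical and imaging data. The supervised (SML) and unsupervised (USML) machine learning models showed better predictive performance (AUC 0.737 and 0.706, respectively) than the existing MIMIC tool. The AI models can therefore serve as a useful tool to support clinical decision-making when choosing treatment for patients with ureteral stones.

Added: 2026-04-05 07:09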

7. CT-based AI model for predicting therapeutic outcomes in ureteral stones after single extracorporeal shock wave lithotripsy through a cohort study.

2024-10-01 · International journal of surgery (London, England) · DOI 10.1097/js9.0000000000001820

OBJECTIVE

Exploring the efficacy of an artificial intelligence (AI) model derived from the analysis of computed tomography (CT) images to precisely forecast the therapeutic outcomes of singular-session extracorporeal shock wave lithotripsy (ESWL) in the management of ureteral stones.

METHODS

A total of 317 patients diagnosed clinically with ureteral stones were included in this investigation. Unenhanced CT was administered to the participants within the initial fortnight preceding the inaugural ESWL. The internal cohort consisted of 250 individuals from a local healthcare facility, whereas the external cohort comprised 67 participants from another local medical institution. The proposed framework comprises three main components: an automated semantic segmentation model developed using 3D U-Net, a feature extractor that integrates radiomics and autoencoder techniques, and an ESWL efficacy prediction model trained with various machine learning algorithms. All participants underwent thorough postoperative follow-up examinations 4 weeks hence. The efficacy of ESWL was defined by the absence of stones or residual fragments measuring ≤2 mm in KUB X-ray assessments. Model stability and generalizability were judiciously validated through a fivefold cross-validation approach and a multicenter external test strategy. Moreover, Shapley Additive Explanations (SHAP) values for individual features were computed to elucidate the nuanced contributions of each feature to the model's decision-making process.

RESULTS

The semantic segmentation model the authors constructed exhibited an average Dice coefficient of 0.88±0.08 on the external testing set. ESWL classifiers built using Support Vector Machine (SVM), Random Forest (RF), XGBoost (XB), and CatBoost (CB) achieved AUROC values of 0.78, 0.84, 0.85, and 0.90, respectively, on the internal validation set. For the external testing set, SVM, RF, XB, and CB predicted ESWL with AUROC values of 0.68, 0.79, 0.80, and 0.83, respectively, with the last one being the optimal algorithm. The radiomics features and auto-encoder features made significant contributions to the decision-making process of the classification model.

CONCLUSION

This investigation unmistakably underscores the remarkable predictive prowess exhibited by a scrupulously crafted AI model using CT images to precisely anticipate the therapeutic results of a singular session of ESWL for ureteral stones.

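🇰🇷 Key summary

This study developed an AI model that combines 3D U-Net-based image segmentation with radiomics features and an autoencoder to predict the success of single-session extracorporeal shock wave lithotripsy (ESWL) in patients with ureteral stones. In multicenter external validation, the CatBoost algorithm showed the best predictive performance with an AUROC of 0.83, suggesting that a CT image-based AI model is clinically useful for precisely predicting ESWL treatment outcomes.

Added: 2026-04-05 07:09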

8. Management of urinary stones: state of the art and future perspectives by experts in stone disease.

2024-06-27 · Archivio italiano di urologia, andrologia : organo ufficiale [di] Societa italiana di ecografia urologica e nefrologica · DOI 10.4081/aiua.2024.12703

OBJECTIVE

To present state of the art on the management of urinary stones from a panel of globally recognized urolithiasis experts who met during the Experts in Stone Disease Congress in Valencia in January 2024. Options of treatment: The surgical treatment modalities of renal and ureteral stones are well defined by the guidelines of international societies, although for some index cases more alternative options are possible. For 1.5 cm renal stones, both m-PCNL and RIRS have proven to be valid treatment alternatives with comparable stone-free rates. The m-PCNL has proven to be more cost effective and requires a shorter operative time, while the RIRS has demonstrated lower morbidity in terms of blood loss and shorter recovery times. SWL has proven to be less effective at least for lower calyceal stones but has the highest safety profile. For a 6mm obstructing stone of the pelviureteric junction (PUJ) stone, SWL should be the first choice for a stone less than 1 cm, due to less invasiveness and lower risk of complications although it has a lower stone free-rate. RIRS has advantages in certain conditions such as anticoagulant treatment, obesity, or body deformity. Technical issues of the surgical procedures for stone removal: In patients receiving antithrombotic therapy, SWL, PCN and open surgery are at elevated risk of hemorrhage or perinephric hematoma. URS, is associated with less morbidity in these cases. An individualized combined evaluation of risks of bleeding and thromboembolism should determine the perioperative thromboprophylactic strategy. Pre-interventional urine culture and antibiotic therapy are mandatory although UTI treatment is becoming more challenging due to increasing resistance to routinely applied antibiotics. The use of an intrarenal urine culture and stone culture is recommended to adapt antibiotic therapy in case of postoperative infectious complications. 
Measurements of temperature and pressure during RIRS are vital for ensuring patient safety and optimizing surgical outcomes although techniques of measurements and methods for data analysis are still to be refined. Ureteral stents were improved by the development of new biomaterials, new coatings, and new stent designs. Topics of current research are the development of drug eluting and bioresorbable stents. Complications of endoscopic treatment: PCNL is considered the most invasive surgical option. Fever and sepsis were observed in 11 and 0.5% and need for transfusion and embolization for bleeding in 7 and 0.4%. Major complications, as colonic, splenic, liver, gall bladder and bowel injuries are quite rare but are associated with significant morbidity. Ureteroscopy causes less complications, although some of them can be severe. They depend on high pressure in the urinary tract (sepsis or renal bleeding) or application of excessive force to the urinary tract (ureteral avulsion or stricture). Diagnostic work up: Genetic testing consents the diagnosis of monogenetic conditions causing stones. It should be carried out in children and in selected adults. In adults, monogenetic diseases can be diagnosed by systematic genetic testing in no more than 4%, when cystinuria, APRT deficiency, and xanthinuria are excluded. A reliable stone analysis by infrared spectroscopy or X-ray diffraction is mandatory and should be associated to examination of the stone under a stereomicroscope. The analysis of digital images of stones by deep convolutional neural networks in dry laboratory or during endoscopic examination could allow the classification of stones based on their color and texture. Scanning electron microscopy (SEM) in association with energy dispersive spectrometry (EDS) is another fundamental research tool for the study of kidney stones. 
The combination of metagenomic analysis using Next Generation Sequencing (NGS) techniques and the enhanced quantitative urine culture (EQUC) protocol can be used to evaluate the urobiome of renal stone formers. Twenty-four hour urine analysis has a place during patient evaluation together with repeated measurements of urinary pH with a digital pH meter. Urinary supersaturation is the most comprehensive physicochemical risk factor employed in urolithiasis research. Urinary macromolecules can act as both promoters or inhibitors of stone formation depending on the chemical composition of urine in which they are operating. At the moment, there are no clinical applications of macromolecules in stone management or prophylaxis. Patients should be evaluated for the association with systemic pathologies. PROPHYLAXIS: Personalized medicine and public health interventions are complementary to prevent stone recurrence. Personalized medicine addresses a small part of stone patients with a high risk of recurrence and systemic complications requiring specific dietary and pharmacological treatment to prevent stone recurrence and complications of associated systemic diseases. The more numerous subjects who form one or a few stones during their entire lifespan should be treated by modifications of diet and lifestyle. Primary prevention by public health interventions is advisable to reduce prevalence of stones in the general population. Renal stone formers at "high-risk" for recurrence need early diagnosis to start specific treatment. Stone analysis allows the identification of most "high-risk" patients forming non-calcium stones: infection stones (struvite), uric acid and urates, cystine and other rare stones (dihydroxyadenine, xanthine). Patients at "high-risk" forming calcium stones require a more difficult diagnosis by clinical and laboratory evaluation. Particularly, patients with cystinuria and primary hyperoxaluria should be actively searched. 
FUTURE RESEARCH: Applications of artificial intelligence are promising for automated identification of ureteral stones on CT imaging, prediction of stone composition and 24-hour urinary risk factors from demographics and clinical parameters, assessment of stone composition by evaluation of endoscopic images, and prediction of outcomes of stone treatments. The synergy between urologists, nephrologists, and scientists in basic kidney stone research will enhance the depth and breadth of investigations, leading to a more comprehensive understanding of kidney stone formation.

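🇰🇷 Key summary

This study summarizes the state of the art and future perspectives in urinary stone management from a 2024 expert meeting, presenting the optimal surgical treatments (m-PCNL, RIRS, SWL, and others) according to stone size and patient condition, together with management strategies for patients on antithrombotic therapy. It also highlights strategies to prevent recurrence through precision medicine, including more precise stone analysis, use of genetic testing, and AI-based diagnosis and outcome prediction, as well as directions for future research.

Added: 2026-04-05 07:09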

9. Identification of kidney stones in KUB X-ray images using VGG16 empowered with explainable artificial intelligence

2024 · Scientific Reports · ⭐ Q1 · DOI 10.1038/s41598-024-56478-4

A kidney stone is a solid formation that can lead to kidney failure, severe pain, and reduced quality of life from urinary system blockages. While medical experts can interpret kidney-ureter-bladder (KUB) X-ray images, specific images pose challenges for human detection, requiring significant analysis time. Consequently, developing a detection system becomes crucial for accurately classifying KUB X-ray images. This article applies a transfer learning (TL) model with a pre-trained VGG16 empowered with explainable artificial intelligence (XAI) to establish a system that takes KUB X-ray images and accurately categorizes them as kidney stones or normal cases. The findings demonstrate that the model achieves a testing accuracy of 97.41% in identifying kidney stones or normal KUB X-rays in the dataset used. VGG16 model delivers highly accurate predictions but lacks fairness and explainability in their decision-making process. This study incorporates the Layer-Wise Relevance Propagation (LRP) technique, an explainable artificial intelligence (XAI) technique, to enhance the transparency and effectiveness of the model to address this concern. The XAI technique, specifically LRP, increases the model's fairness and transparency, facilitating human comprehension of the predictions. Consequently, XAI can play an important role in assisting doctors with the accurate identification of kidney stones, thereby facilitating the execution of effective treatment strategies.

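🇰🇷 Key summary

This study built a system that automatically classifies kidney stones on KUB X-ray images using a VGG16 transfer learning model, achieving a high accuracy of 97.41%. It also introduced an explainable artificial intelligence (XAI) technique based on Layer-Wise Relevance Propagation (LRP) to visualize the basis of the model's decisions, thereby securing diagnostic confidence and transparency for clinicians.

Added: 2026-04-05 07:09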

10. Machine learning and deep learning-based approach in smart healthcare: Recent advances, applications, challenges and opportunities

2024 · AIMS Public Health · ⭐ Q1 · DOI 10.3934/publichealth.2024004

In recent years, machine learning (ML) and deep learning (DL) have been the leading approaches to solving various challenges, such as disease predictions, drug discovery, medical image analysis, etc., in intelligent healthcare applications. Further, given the current progress in the fields of ML and DL, there exists the promising potential for both to provide support in the realm of healthcare. This study offered an exhaustive survey on ML and DL for the healthcare system, concentrating on vital state of the art features, integration benefits, applications, prospects and future guidelines. To conduct the research, we found the most prominent journal and conference databases using distinct keywords to discover scholarly consequences. First, we furnished the most current along with cutting-edge progress in ML-DL-based analysis in smart healthcare in a compendious manner. Next, we integrated the advancement of various services for ML and DL, including ML-healthcare, DL-healthcare, and ML-DL-healthcare. We then offered ML and DL-based applications in the healthcare industry. Eventually, we emphasized the research disputes and recommendations for further studies based on our observations.

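🇰🇷 Key summary

This study systematically reviewed recent trends and applications of machine learning and deep learning technologies used for disease prediction, drug discovery, medical image analysis, and other tasks in smart healthcare. Drawing on major academic databases, it analyzed the benefits of technical integration and current research challenges, and presented research directions and guidelines for the future development of smart healthcare systems.

Added: 2026-04-05 07:09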

🔄 Run history

Run at	Source	Hits	New	Status
2026-04-26 00:00	LitReview			completed
2026-04-19 00:00	LitReview			completed
2026-04-14 14:29	LitReview			completed
2026-04-14 14:29	LitReview			completed
2026-04-14 14:28	LitReview			error
2026-04-14 11:57	LitReview			error
2026-04-12 00:00	LitReview	1	1	completed
2026-04-05 07:09	LitReview	35	31	completed
