Using BERT models for breast cancer diagnosis from Turkish radiology reports

dc.contributor.author: Hepsag, Pinar Uskaner
dc.contributor.author: Ozel, Selma Ayse
dc.contributor.author: Dalci, Kubilay
dc.contributor.author: Yazici, Adnan
dc.date.accessioned: 2025-01-06T17:44:40Z
dc.date.available: 2025-01-06T17:44:40Z
dc.date.issued: 2024
dc.description.abstract: Diagnostic radiology is concerned with obtaining images of the internal organs using radiological imaging procedures. These images are then interpreted by a diagnostic radiologist, who produces a textual report that assists in the diagnosis of illness or injury. Early detection of certain illnesses, particularly cancer, is critical, and the reports produced by diagnostic radiologists play a key role in this process. To develop models for the early detection of cancer, text classification techniques can be applied to radiological reports. However, this process requires access to a dataset of radiology reports, which is not widely available. Currently, radiology report datasets exist for high-resource languages such as English and Dutch, but not for low-resource languages such as Turkish. This article describes the collection of a mammography report dataset for Turkish, consisting of 62 reports from real patients that were manually labeled by an expert for diagnosing breast cancer. Basic machine learning models were applied to this dataset using pre-trained BERT, DistilBERT, and an ensemble learning hard voting approach. The results showed that BERT on Turkish achieved the best performance, with a 91% F1-score. Hard Voting, which combined the results of BERTTurkish, BERTClinical, and BERTMultilingual, achieved the highest F1-score of 93%. The results show that BERT and Hard Voting outperform the other machine learning models for breast cancer diagnosis from Turkish radiology reports.
dc.description.sponsorship: Scientific Research Project Unit of Cukurova University [FDK-2016-6931]; Nazarbayev University (Kazakhstan) Faculty-development competitive research [FY2019-FGP-1-STEMM]
dc.description.sponsorship: This work was supported by Scientific Research Project Unit of Cukurova University [grant number FDK-2016-6931]; and Nazarbayev University (Kazakhstan) Faculty-development competitive research [grant number FY2019-FGP-1-STEMM]
dc.identifier.doi: 10.1007/s10579-023-09669-w
dc.identifier.endpage: 1012
dc.identifier.issn: 1574-020X
dc.identifier.issn: 1574-0218
dc.identifier.issue: 3
dc.identifier.scopus: 2-s2.0-85161493134
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 981
dc.identifier.uri: https://doi.org/10.1007/s10579-023-09669-w
dc.identifier.uri: https://hdl.handle.net/20.500.14669/3147
dc.identifier.volume: 58
dc.identifier.wos: WOS:001004065600001
dc.identifier.wosquality: Q3
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Springer
dc.relation.ispartof: Language Resources and Evaluation
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_20241211
dc.subject: Turkish dataset
dc.subject: Breast cancer
dc.subject: Contextualized word embeddings
dc.subject: Radiology reports
dc.subject: Machine learning
dc.title: Using BERT models for breast cancer diagnosis from Turkish radiology reports
dc.type: Article
