Similarity Detection between Turkish Text Documents with Distance Metrics

dc.authorid Ozel, Selma Ayse/0000-0001-9201-6349
dc.contributor.author Kaya Keles, Mumine
dc.contributor.author Ozel, Selma Ayse
dc.date.accessioned 2025-01-06T17:37:30Z
dc.date.available 2025-01-06T17:37:30Z
dc.date.issued 2017
dc.description 2017 International Conference on Computer Science and Engineering (UBMK) -- OCT 05-08, 2017 -- Antalya, TURKEY
dc.description.abstract The aim of this study is to compare the performance of various distance metrics and to determine the most appropriate methods for detecting similarities among text documents written in Turkish. Computing the similarity between text documents is the basic step of plagiarism detection and of text mining tasks such as author detection, text classification and clustering. Plagiarism detection and text mining applications can therefore be made more successful by using the distance metrics identified by the results of this study. For this purpose, text chunks of different lengths are selected as the experimental dataset. Preprocessing methods are then applied to this dataset, creating new experimental scenarios by removing stopwords, removing Turkish characters, and stemming words with Zemberek. The experimental results show that the preprocessing phase increases the accuracy of similarity detection; stemming with Zemberek, in particular, increases the success rate. In all cases, the Cosine Similarity method is observed to be more successful than the other distance metrics because it produces more realistic results.
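The preprocessing and similarity steps named in the abstract can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. "Removing Turkish characters" is assumed here to mean mapping ç, ğ, ı, ö, ş, ü (and their capitals) to ASCII equivalents. The stopword list, the helper names (`preprocess`, `cosine_similarity`, `document_similarity`) and the sample sentences are hypothetical. Zemberek stemming is omitted because Zemberek is a Java library, and raw term-frequency vectors are assumed since the abstract does not state the weighting scheme.

```python
import math
import re
from collections import Counter

# Assumption: "removing Turkish characters" = folding Turkish-specific
# letters to their closest ASCII equivalents.
TURKISH_TO_ASCII = str.maketrans("çÇğĞıİöÖşŞüÜ", "cCgGiIoOsSuU")

# Hypothetical, deliberately tiny stopword list (already ASCII-folded);
# a real experiment would use a full Turkish stopword list.
STOPWORDS = {"ve", "bir", "bu", "da", "de", "ile", "icin"}


def preprocess(text: str) -> list[str]:
    """Fold Turkish letters, lowercase, tokenize and drop stopwords.

    Stemming (done with Zemberek in the study) is not reproduced here.
    """
    # Fold before lowercasing so "İ" does not become "i" + combining dot.
    folded = text.translate(TURKISH_TO_ASCII).lower()
    tokens = re.findall(r"\w+", folded)
    return [t for t in tokens if t not in STOPWORDS]


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    # Counter returns 0 for missing terms, so absent words add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar
    return dot / (norm_a * norm_b)


def document_similarity(doc1: str, doc2: str) -> float:
    """Preprocess both documents, then compare their term frequencies."""
    return cosine_similarity(Counter(preprocess(doc1)), Counter(preprocess(doc2)))


if __name__ == "__main__":
    d1 = "Bu çalışma Türkçe metinler için benzerlik ölçümü yapar."
    d2 = "Türkçe metinler ve benzerlik ölçümü üzerine bir çalışma."
    print(round(document_similarity(d1, d2), 3))
```

After preprocessing, the two sample sentences share five of their six remaining tokens, giving a cosine of 5/6. Because cosine similarity depends only on the direction of the vectors, a text chunk and a repeated copy of it score 1.0, which makes the measure insensitive to chunk length.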
dc.description.sponsorship IEEE Adv Technol Human, Istanbul Teknik Univ, Gazi Univ, Atilim Univ, TBV, Akdeniz Univ, Tmmob Bilgisayar Muhendisleri Odasi
dc.identifier.endpage 321
dc.identifier.isbn 978-1-5386-0930-9
dc.identifier.startpage 316
dc.identifier.uri https://hdl.handle.net/20.500.14669/2248
dc.identifier.wos WOS:000426856900059
dc.identifier.wosquality N/A
dc.indekslendigikaynak Web of Science
dc.language.iso tr
dc.publisher IEEE
dc.relation.ispartof 2017 International Conference on Computer Science and Engineering (UBMK)
dc.relation.publicationcategory Conference Item - International - Institutional Faculty Member
dc.rights info:eu-repo/semantics/closedAccess
dc.snmz KA_20241211
dc.subject Document similarity
dc.subject Turkish texts
dc.subject Distance metrics
dc.subject Zemberek
dc.subject Cosine similarity
dc.title Similarity Detection between Turkish Text Documents with Distance Metrics
dc.type Conference Object
