Similarity detection between Turkish text documents with distance metrics

dc.contributor.authorKaya, Mümine Keleş
dc.contributor.authorÖzel, Selma Ayşe
dc.date.accessioned2025-01-06T17:29:47Z
dc.date.available2025-01-06T17:29:47Z
dc.date.issued2017
dc.description2nd International Conference on Computer Science and Engineering, UBMK 2017 -- 5 October 2017 through 8 October 2017 -- Antalya -- 132116
dc.description.abstractThis study compares the performance of various distance metrics to determine which are most appropriate for detecting similarity between text documents written in Turkish. Computing the similarity between text documents is the basic step of plagiarism detection and of text mining tasks such as author detection, text classification, and clustering, so applications in these areas should benefit from using the distance metrics identified by the results of this study. Text chunks of different lengths were selected as the experimental dataset. Preprocessing methods were then applied to the dataset, creating distinct experimental scenarios by removing stopwords, removing Turkish characters, and stemming words with Zemberek. The experimental results show that preprocessing increases the accuracy of similarity detection, and stemming with Zemberek in particular increases the success rate. In all cases, Cosine Similarity was more successful than the other distance metrics because it produced more realistic results. © 2017 IEEE.
dc.identifier.doi10.1109/UBMK.2017.8093399
dc.identifier.endpage321
dc.identifier.isbn978-153860930-9
dc.identifier.scopus2-s2.0-85040604751
dc.identifier.startpage316
dc.identifier.urihttps://doi.org/10.1109/UBMK.2017.8093399
dc.identifier.urihttps://hdl.handle.net/20.500.14669/1351
dc.indekslendigikaynakScopus
dc.language.isotr
dc.publisherInstitute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof2nd International Conference on Computer Science and Engineering, UBMK 2017
dc.relation.publicationcategoryConference Item - International - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzKA_20241211
dc.subjectCosine Similarity
dc.subjectDistance metrics
dc.subjectDocument similarity
dc.subjectTurkish texts
dc.subjectZemberek
dc.titleSimilarity detection between Turkish text documents with distance metrics
dc.title.alternativeUzaklık Ölçütleri ile Türkçe Metin Belgeleri Arasındaki Benzerliğin Belirlenmesi
dc.typeConference Object
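
The pipeline outlined in the abstract (preprocessing followed by Cosine Similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The stopword list, the function names, and the use of raw term-frequency vectors are assumptions made for illustration, and stemming with Zemberek is left out because its API is not described in this record.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption; the list
# used in the study is not given in this record).
STOPWORDS = {"ve", "ile", "bir", "bu", "de", "da"}

# Map lowercase Turkish-specific letters to ASCII counterparts.
TR_TO_ASCII = str.maketrans("çğıöşü", "cgiosu")


def turkish_lower(text):
    """Lowercase with Turkish casing rules for dotted and dotless i."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokenize(text, remove_stopwords=True, fold_turkish=True):
    """Split text into lowercase word tokens with optional preprocessing."""
    words = re.findall(r"\w+", turkish_lower(text))
    if remove_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    if fold_turkish:
        words = [w.translate(TR_TO_ASCII) for w in words]
    return words


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two raw term-frequency vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Example sentences invented for illustration only.
doc_a = "Öğrenciler kitap okuyor"
doc_b = "Öğrenci kitap okudu"
print(cosine_similarity(tokenize(doc_a), tokenize(doc_b)))
```

In this example only "kitap" matches exactly, because the other words differ in their suffixes. A stemmer that reduces inflected forms to a common root would raise the score, which is consistent with the abstract's observation that stemming increases the success rate.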
