Experimental Comparison of Pre-Trained Word Embedding Vectors of Word2Vec, Glove, FastText for Word Level Semantic Text Similarity Measurement in Turkish

dc.authoridTulu, Cagatay Neftali/0000-0002-4462-3707
dc.contributor.authorTulu, Cagatay Neftali
dc.date.accessioned2025-01-06T17:44:19Z
dc.date.available2025-01-06T17:44:19Z
dc.date.issued2022
dc.description.abstractThis study experimentally evaluates the word vectors produced by three widely used embedding methods, Word2Vec, Glove, and FastText, for word-level semantic text similarity in Turkish. Three benchmark datasets, SimTurk, AnlamVer, and RG65_Turkce, are used for the evaluation. In the comparative analysis, the Turkish word vectors produced with Glove and FastText achieved better correlation on word-level semantic similarity. The Turkish word coverage of FastText is also ahead of the other two methods, since only a limited number of out-of-vocabulary (OOV) words were observed in the FastText experiments. In addition, FastText and Glove vectors achieved high Spearman correlation values on the SimTurk and AnlamVer datasets, both of which were prepared and evaluated entirely by native Turkish speakers. This suggests that these two datasets better represent the Turkish language in terms of morphology and inflection.
dc.identifier.doi10.12913/22998624/152453
dc.identifier.endpage156
dc.identifier.issn2080-4075
dc.identifier.issn2299-8624
dc.identifier.issue4
dc.identifier.scopus2-s2.0-85137902877
dc.identifier.scopusqualityQ3
dc.identifier.startpage147
dc.identifier.urihttps://doi.org/10.12913/22998624/152453
dc.identifier.urihttps://hdl.handle.net/20.500.14669/3002
dc.identifier.volume16
dc.identifier.wosWOS:000890170500002
dc.identifier.wosqualityN/A
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherLublin Univ Technology, Poland
dc.relation.ispartofAdvances in Science and Technology-Research Journal
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_20241211
dc.subjectsemantic word similarity
dc.subjectword embeddings
dc.subjectTurkish NLP
dc.titleExperimental Comparison of Pre-Trained Word Embedding Vectors of Word2Vec, Glove, FastText for Word Level Semantic Text Similarity Measurement in Turkish
dc.typeArticle
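
The evaluation summarized in the abstract (scoring word pairs with embedding vectors, counting OOV words, and reporting Spearman correlation against human ratings) can be illustrated with a minimal sketch. It is not the author's code: the use of cosine similarity, the choice to skip OOV pairs, the function names, and the toy vectors and ratings are all assumptions made for illustration, and real experiments would load pre-trained Word2Vec, Glove, or FastText vectors and one of the benchmark datasets instead.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (sx * sy)


def evaluate(vectors, pairs):
    """Return (Spearman correlation, number of pairs skipped as OOV)."""
    model_scores, human_scores, oov = [], [], 0
    for w1, w2, human in pairs:
        if w1 not in vectors or w2 not in vectors:
            oov += 1  # assumption: pairs with an unknown word are skipped
            continue
        model_scores.append(cosine(vectors[w1], vectors[w2]))
        human_scores.append(human)
    return spearman(model_scores, human_scores), oov


# Hypothetical 2-dimensional vectors and human ratings, for illustration only.
toy_vectors = {
    "kedi": [1.0, 0.0],
    "köpek": [0.9, 0.1],
    "araba": [0.0, 1.0],
    "otomobil": [0.05, 0.95],
    "elma": [0.5, 0.5],
}
toy_pairs = [
    ("kedi", "köpek", 8.5),
    ("araba", "otomobil", 9.5),
    ("kedi", "araba", 1.0),
    ("elma", "kedi", 3.0),
    ("kedi", "uçak", 2.0),  # "uçak" has no vector, so this pair counts as OOV
]

rho, oov_count = evaluate(toy_vectors, toy_pairs)
print(f"Spearman correlation: {rho:.3f}, OOV pairs skipped: {oov_count}")
```

Skipping OOV pairs is only one possible policy. Assigning such pairs a default similarity would change the reported correlation, so comparisons between embedding methods are meaningful only when the same policy is applied to each.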
