Applying Natural Language Processing for detecting malicious patterns in Android applications


Date

2021

Journal Title

Journal ISSN

Volume Title

Publisher

Elsevier Sci Ltd

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

As malicious code grows in quantity and sophistication, it is becoming more difficult to discover and analyze. Modern NLP (Natural Language Processing) techniques have significantly improved and are being used in practice to accomplish various tasks. Recently, many research works have applied NLP to find malicious patterns in Android and Windows apps. In this paper, we exploit this fact and apply NLP techniques to an intermediate representation (MAIL, Malware Analysis Intermediate Language) of Android apps to build a similarity index model named SIMP. We use SIMP to find malicious patterns in Android apps. MAIL provides control flow patterns that enhance malware analysis and make the code accessible to NLP techniques for checking semantic similarities. To apply NLP, we treat a MAIL program as one document. The control flow patterns in this program, when divided into specific blocks (words), become sentences. We apply TF-IDF and Bag-of-Words over these control flow patterns to build SIMP. Our proposed model, when tested with real malware and benign Android apps using different validation methods, achieved an MCC (Matthews Correlation Coefficient) > 0.94 between the true and predicted values. This indicates that the model can classify a new sample as either malware or benign with a high success rate. (c) 2021 Elsevier Ltd. All rights reserved.
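
The idea in the abstract of treating a program as a document of control flow pattern tokens, weighting those tokens with TF-IDF, and scoring the resulting predictions with MCC can be illustrated with a minimal Python sketch. This is not the authors' SIMP implementation: the token names, the three sample "documents", the smoothed IDF formula, and the use of cosine similarity are all assumptions chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF vectors (dict: token -> weight)."""
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each token once per document
    # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical pattern sequences; the token names are placeholders.
known_malware = ["ASSIGN", "CALL", "JUMP_C", "CALL", "HALT"]
sample_a = ["ASSIGN", "CALL", "JUMP_C", "CALL", "ASSIGN"]
sample_b = ["LIBCALL", "ASSIGN", "LIBCALL", "CONTROL", "LIBCALL"]

vecs = tfidf_vectors([known_malware, sample_a, sample_b])
sim_a = cosine(vecs[0], vecs[1])  # shares most patterns with the known sample
sim_b = cosine(vecs[0], vecs[2])  # shares only one pattern
print(f"similarity to sample_a: {sim_a:.3f}")
print(f"similarity to sample_b: {sim_b:.3f}")
```

In this sketch a sample whose pattern sequence overlaps heavily with a known malicious one scores a higher similarity. MCC ranges from -1 to 1, where 1 means every prediction matches the true label, so the reported value above 0.94 indicates close agreement between predicted and true labels.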

Description

Keywords

Natural language processing, Android applications, Control flow patterns, Intermediate language, Malicious patterns

Source

Forensic Science International-Digital Investigation

WoS Q Value

Q4

Scopus Q Value

Q1

Volume

39

Issue

Citation