Deep learning for effective Android malware detection using API call graph embeddings

Pektas, Abdurrahman; ACARMAN, TANKUT

doi:10.1007/s00500-019-03940-5

Pektas A., ACARMAN T.

SOFT COMPUTING, vol.24, no.2, pp.1027-1043, 2020 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 24 Issue: 2
Publication Date: 2020
DOI: 10.1007/s00500-019-03940-5
Journal Name: SOFT COMPUTING
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
Pages: pp.1027-1043
Keywords: Android malware, Deep learning, Graph embedding, Hyper-parameter tuning, API call graph
Galatasaray University Affiliated: Yes

Abstract

The high penetration of Android applications, along with their malicious variants, calls for efficient and effective malware detection methods to secure the mobile platform. An API call sequence derived from the API call graph structure can model application behavior accurately. Behaviors are extracted by following the API call graph, its branching, and the order of calls. However, identifying similarities between graphs with graph matching algorithms for classification is slow and difficult to adapt to a new domain, and the results may be inaccurate. In this study, the authors use the API call graph as a representation of all possible execution paths that malware can follow at runtime. The API call graphs are embedded into a low-dimensional numeric feature vector, which is fed to a deep neural network. Similarity detection for each binary function is then trained and tested effectively. The study also focuses on maximizing network performance by evaluating different embedding algorithms and tuning various network configuration parameters to find the best combination of hyper-parameters and reach the highest value of the statistical metrics. Experimental results show that the presented malware classification achieves 98.86% accuracy, 98.65% F-measure, 98.47% recall, and 98.84% precision.
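
To make the pipeline in the abstract more concrete, the following minimal Python sketch enumerates the execution paths of a toy API call graph and turns them into a fixed-length numeric vector. Everything in it is an assumption made for illustration: the graph contents, the function names `enumerate_paths` and `embed_paths`, and in particular the vector itself, which is a plain normalized edge-count vector standing in for the learned graph embedding algorithms the study evaluates. It is not the authors' method.

```python
from collections import Counter

# Hypothetical toy call graph (caller -> callees); not taken from the paper.
CALL_GRAPH = {
    "onCreate": ["getDeviceId", "openConnection"],
    "getDeviceId": ["sendTextMessage"],
    "openConnection": ["write"],
    "sendTextMessage": [],
    "write": [],
}


def enumerate_paths(graph, entry):
    """Yield every acyclic call path that starts at `entry` (depth-first)."""
    stack = [(entry, [entry])]
    while stack:
        node, path = stack.pop()
        # Skip callees already on the path so cycles cannot loop forever.
        callees = [c for c in graph.get(node, []) if c not in path]
        if not callees:
            yield path
        for callee in callees:
            stack.append((callee, path + [callee]))


def embed_paths(paths, vocabulary):
    """Map call paths to a fixed-length vector of normalized edge counts.

    Stand-in for a learned graph embedding: one dimension per call edge
    listed in `vocabulary`, valued by that edge's share of all edges seen.
    """
    counts = Counter()
    for path in paths:
        counts.update(zip(path, path[1:]))
    total = sum(counts.values()) or 1
    return [counts[edge] / total for edge in vocabulary]


# Fixed edge ordering so every application maps to the same dimensions.
EDGE_VOCABULARY = sorted(
    (caller, callee)
    for caller, callees in CALL_GRAPH.items()
    for callee in callees
)

if __name__ == "__main__":
    paths = list(enumerate_paths(CALL_GRAPH, "onCreate"))
    print(paths)
    print(embed_paths(paths, EDGE_VOCABULARY))
```

A vector of this kind could serve as the input layer of a classifier. The study instead compares dedicated embedding algorithms and tunes the network's hyper-parameters, which this sketch does not attempt.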