Deep learning for effective Android malware detection using API call graph embeddings


Pektas A., ACARMAN T.

SOFT COMPUTING, cilt.24, sa.2, ss.1027-1043, 2020 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 24 Sayı: 2
  • Basım Tarihi: 2020
  • Doi Numarası: 10.1007/s00500-019-03940-5
  • Dergi Adı: SOFT COMPUTING
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Sayfa Sayıları: ss.1027-1043
  • Anahtar Kelimeler: Android malware, Deep learning, Graph embedding, Hyper-parameter tuning, API call graph
  • Galatasaray Üniversitesi Adresli: Evet

Özet

High penetration of Android applications along with their malicious variants requires efficient and effective malware detection methods to build mobile platform security. API call sequence derived from API call graph structure can be used to model application behavior accurately. Behaviors are extracted by following the API call graph, its branching, and order of calls. But identification of similarities in graphs and graph matching algorithms for classification is slow, complicated to be adopted to a new domain, and their results may be inaccurate. In this study, the authors use the API call graph as a graph representation of all possible execution paths that a malware can track during its runtime. The embedding of API call graphs transformed into a low dimension numeric vector feature set is introduced to the deep neural network. Then, similarity detection for each binary function is trained and tested effectively. This study is also focused on maximizing the performance of the network by evaluating different embedding algorithms and tuning various network configuration parameters to assure the best combination of the hyper-parameters and to reach at the highest statistical metric value. Experimental results show that the presented malware classification is reached at 98.86% level in accuracy, 98.65% in F-measure, 98.47% in recall and 98.84% in precision, respectively.