Proposal of n-gram Based Algorithm for Malware Classification

Pektas A., Eris M., Acarman T.

5th International Conference on Emerging Security Information, Systems and Technologies (SECURWARE), Nice, France, 21 - 27 August 2011, pp.14-18

  • Publication Type: Conference Paper / Full-Text Paper
  • City of Publication: Nice
  • Country of Publication: France
  • Page Numbers: pp.14-18


Obfuscation techniques degrade the n-gram features extracted from the binary form of malware. This study presents a methodology for classifying malware instances using the n-gram features of their disassembled code. The statistical method uses these n-gram features to assign each instance to its family. An n-gram is a fixed-size sliding window over a byte array, where n is the size of the window. The contribution of the method is that it represents each malware subfamily with a single vector, called the subfamily centroid. Using only one vector per subfamily for classification reduces the dimension of the n-gram space. Experiments are performed on a fairly large data set, collected through Computer Emergency Response Team (CERT) activities at the National Research Institute of Electronics and Cryptology, to illustrate the effectiveness of the proposed malware classification methodology.
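
The sliding-window and centroid ideas described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state how the centroid is computed or which similarity measure is used, so the averaging of relative frequencies and the cosine similarity below are assumptions, and all function names (`ngram_counts`, `normalize`, `centroid`, `cosine`, `classify`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_counts(tokens, n):
    """Slide a window of size n over the sequence and count each n-gram."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def normalize(counts):
    """Turn raw counts into relative frequencies (empty input gives {})."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def centroid(profiles):
    """Assumption: the centroid is the mean of the members' profiles."""
    acc = Counter()
    for profile in profiles:
        for gram, freq in profile.items():
            acc[gram] += freq
    return {gram: total / len(profiles) for gram, total in acc.items()}


def cosine(a, b):
    """Assumption: cosine similarity between two sparse n-gram vectors."""
    dot = sum(v * b.get(gram, 0.0) for gram, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(profile, centroids):
    """Return the label of the centroid most similar to the profile."""
    return max(centroids, key=lambda label: cosine(profile, centroids[label]))


# Made-up opcode sequences standing in for disassembled samples.
training = {
    "family_a": [["push", "mov", "call", "ret"], ["push", "mov", "call", "pop"]],
    "family_b": [["xor", "xor", "jmp", "xor"], ["xor", "jmp", "xor", "jmp"]],
}
centroids = {
    label: centroid([normalize(ngram_counts(seq, 2)) for seq in samples])
    for label, samples in training.items()
}
unknown = normalize(ngram_counts(["push", "mov", "call", "add"], 2))
predicted = classify(unknown, centroids)
```

Each sample is compared against one vector per subfamily rather than against every training sample, which is the reduction the abstract refers to. The same functions accept a `bytes` object in place of an opcode list, since the window only requires a sliceable sequence.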