A Comparative Approach for Multiclass Text Analysis


Franko S., PARLAK İ. B.

6th International Symposium on Digital Forensic and Security (ISDFS), Antalya, Türkiye, 22-25 March 2018, pp. 61-66

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/isdfs.2018.8355325
  • City of Publication: Antalya
  • Country of Publication: Türkiye
  • Page Numbers: pp. 61-66
  • Keywords: information model, language representation, naive bayes, maximum entropy
  • Affiliated with Galatasaray University: Yes

Abstract

This paper presents a multiclass text analysis for the classification of Spanish documents. Although Spanish is one of the most widely spoken languages, its text classification has not yet been addressed across different problems in multiclass analysis. Two machine learning techniques, Naive Bayes and Maximum Entropy, were used. The corpus was created with 10 different categories. Smoothing parameters and three different document models were integrated into the study. In the comparative analysis, optimal parameters were determined from their effect on accuracy, precision, and recall. Maximum Entropy was found to be the better technique, although both techniques proved suitable for multiclass classification.
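
To make the role of a smoothing parameter concrete, the following is a minimal sketch of one of the two techniques named in the abstract, Naive Bayes, together with macro-averaged precision and recall. It is not the authors' implementation. The abstract does not specify the document models or the smoothing scheme, so the sketch assumes a multinomial bag-of-words model with additive smoothing (parameter `alpha`). The function names and the tiny two-category Spanish corpus are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive smoothing.

    Assumption: whitespace tokenization and alpha > 0 (alpha = 0 would
    require log(0) for words unseen in a class).
    """
    vocab = {w for d in docs for w in d.split()}
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d.split())

    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for y, n_docs in Counter(labels).items():
        model["log_prior"][y] = math.log(n_docs / len(docs))
        # Smoothed denominator: class token count plus alpha per vocabulary word.
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        model["log_lik"][y] = {
            w: math.log((word_counts[y][w] + alpha) / total) for w in vocab
        }
    return model


def predict_nb(model, doc):
    """Return the class with the highest log posterior score."""
    scores = {}
    for y, log_prior in model["log_prior"].items():
        score = log_prior
        for w in doc.split():
            if w in model["vocab"]:  # out-of-vocabulary words are ignored
                score += model["log_lik"][y][w]
        scores[y] = score
    return max(scores, key=scores.get)


def macro_precision_recall(y_true, y_pred):
    """Average per-class precision and recall over the classes in y_true."""
    classes = sorted(set(y_true))
    precisions, recalls = [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Hypothetical toy corpus (the paper's corpus has 10 categories).
train_docs = [
    "el equipo gana el partido",
    "gol en el partido final",
    "el banco sube la tasa",
    "la bolsa y el banco caen",
]
train_labels = ["deportes", "deportes", "economia", "economia"]

model = train_nb(train_docs, train_labels, alpha=1.0)
print(predict_nb(model, "el partido de hoy"))
print(predict_nb(model, "el banco central"))
```

A parameter study in the spirit of the abstract would repeat the training step for several values of `alpha` and compare accuracy, precision, and recall on held-out documents. Maximum Entropy is not sketched here.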