A Comparative Approach for Multiclass Text Analysis

Franko S., PARLAK İ. B.

6th International Symposium on Digital Forensic and Security (ISDFS), Antalya, Türkiye, 22 - 25 March 2018, pp.61-66, (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI Number: 10.1109/isdfs.2018.8355325
City of Publication: Antalya
Country of Publication: Türkiye
Page Numbers: pp.61-66
Keywords: information model, language representation, naive bayes, maximum entropy
Affiliated with Galatasaray Üniversitesi: Yes

Abstract

This paper presents a multiclass text analysis for the classification of Spanish documents. Although Spanish is one of the most widely spoken languages, text classification has not yet been carried out for different problems in multiclass analysis. Two machine learning techniques, Naive Bayes and Maximum Entropy, were used. The corpus was created with 10 different categories. Smoothing parameters and three different document models were integrated into the study. In the comparative analysis, optimal parameters were determined from their effect on accuracy, precision, and recall. Maximum Entropy was found to be the best technique, although both techniques were suitable for multiclass classification.
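
As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of a multinomial Naive Bayes classifier with an additive smoothing parameter, evaluated by accuracy and macro-averaged precision and recall. It is not the authors' implementation: the function names (`train_nb`, `predict_nb`, `macro_precision_recall`), the value `alpha=1.0`, the bag-of-words document model, and the three-category toy corpus are all assumptions made for the example. The Maximum Entropy classifier is not shown.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing.

    docs is a list of token lists, labels the matching class names.
    alpha is the smoothing parameter (an assumed value, not from the paper).
    """
    vocab = {w for tokens in docs for w in tokens}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)

    model = {}
    for y, n_docs in class_docs.items():
        denom = sum(word_counts[y].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(docs))
        log_lik = {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab}
        model[y] = (log_prior, log_lik)
    return model


def predict_nb(model, tokens):
    """Return the class with the highest log-score; unseen words are ignored."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(log_lik[w] for w in tokens if w in log_lik)
    return max(model, key=score)


def macro_precision_recall(gold, pred):
    """Average per-class precision and recall over all classes."""
    classes = sorted(set(gold) | set(pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Hypothetical toy corpus with three categories (the paper uses 10).
train_docs = [
    ["gol", "partido", "equipo"],
    ["equipo", "liga", "gol"],
    ["banco", "mercado", "euro"],
    ["euro", "inflacion", "mercado"],
    ["eleccion", "gobierno", "voto"],
    ["gobierno", "ley", "voto"],
]
train_labels = ["deportes", "deportes", "economia", "economia", "politica", "politica"]

test_docs = [["gol", "liga"], ["mercado", "banco"], ["voto", "ley"]]
test_labels = ["deportes", "economia", "politica"]

model = train_nb(train_docs, train_labels, alpha=1.0)
predictions = [predict_nb(model, tokens) for tokens in test_docs]
accuracy = sum(g == p for g, p in zip(test_labels, predictions)) / len(test_labels)
precision, recall = macro_precision_recall(test_labels, predictions)
print(predictions, accuracy, precision, recall)
```

Re-running `train_nb` with different `alpha` values and comparing the resulting metrics on held-out documents is one simple way to study how sensitive the classifier is to the smoothing parameter.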