Classification of cohesin family using class specific motifs

Eser E. M., ARSLAN R. B., Sezerman U. O.

2013 8th International Symposium on Health Informatics and Bioinformatics, HIBIT 2013, Ankara, Türkiye, 25-27 September 2013 (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
DOI Number: 10.1109/hibit.2013.6661687
City of Publication: Ankara
Country of Publication: Türkiye
Keywords: Class-specific motifs, Cohesin, N-gram, Protein Classification, Reduced amino acid alphabets
Affiliated with Galatasaray Üniversitesi: Yes

Abstract

Motif extraction from protein sequences has long been a challenging task in bioinformatics. Class-specific motifs, which occur frequently in one class but only rarely in the others, can be used for highly accurate classification of protein sequences. In this study, we present a new scoring-based method for class-specific n-gram motif selection using reduced amino acid alphabets. Cohesin protein sequences, which interact with dockerin modules to assemble the cellulosome, a multi-enzyme complex that degrades cellulose (the most abundant organic polymer), are used for class-specific motif selection, and the selected motifs are then given to the J48 and SVM algorithms as features. Classification results are examined across various n-gram sizes, reduced amino acid alphabets, and numbers of features. The best result, a training accuracy of 98.61% and a test accuracy of 94.54%, was obtained using the Gbmr14 alphabet, 5 features per family, 4-gram motifs, and the J48 algorithm. The proposed technique can be generalized to other protein families. © 2013 IEEE.
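
The pipeline outlined in the abstract (reduce the alphabet, extract n-grams, score them per class, keep the top few as features) can be illustrated with a minimal Python sketch. The abstract does not give the scoring function or the Gbmr14 grouping, so everything specific here is an assumption: `TOY_GROUPS` is a hypothetical reduced alphabet, not Gbmr14, and the score (frequency in the target class minus the highest frequency in any other class) is a simple stand-in, not the authors' method. All function names are illustrative.

```python
from collections import Counter

# Hypothetical reduced alphabet (NOT the Gbmr14 grouping): every residue in a
# group is replaced by the first letter of that group.
TOY_GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
REDUCE = {aa: group[0] for group in TOY_GROUPS for aa in group}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet; unknown residues become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def ngrams(seq, n):
    """Return the set of distinct n-grams occurring in a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def class_frequencies(sequences_by_class, n):
    """For each class, map each n-gram to the fraction of sequences containing it."""
    freqs = {}
    for label, seqs in sequences_by_class.items():
        counts = Counter()
        for seq in seqs:
            counts.update(ngrams(reduce_sequence(seq), n))
        freqs[label] = {gram: c / len(seqs) for gram, c in counts.items()}
    return freqs


def select_motifs(sequences_by_class, n=4, per_class=5):
    """Keep the top-scoring n-grams for each class.

    Assumed score: frequency in the class minus the highest frequency
    of the same n-gram in any other class.
    """
    freqs = class_frequencies(sequences_by_class, n)
    selected = {}
    for label, own in freqs.items():
        scores = {}
        for gram, freq in own.items():
            other = max(
                (freqs[o].get(gram, 0.0) for o in freqs if o != label),
                default=0.0,
            )
            scores[gram] = freq - other
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda g: (-scores[g], g))
        selected[label] = ranked[:per_class]
    return selected


def feature_vector(seq, motifs):
    """Binary presence/absence vector of the selected motifs in one sequence."""
    if not motifs:
        return []
    present = ngrams(reduce_sequence(seq), len(motifs[0]))
    return [1 if m in present else 0 for m in motifs]
```

With made-up toy sequences such as `{"A": ["AAAAGG", "AAAAKK"], "B": ["DDDDGG", "DDDDKK"]}`, `select_motifs(..., n=4, per_class=1)` keeps one motif per class, and `feature_vector` turns each sequence into a 0/1 vector over those motifs. Such vectors could then be passed to a decision tree or SVM implementation of the reader's choice.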