Challenges and Opportunities Related to Data Drift Problem in Sentiment Analysis (Duygu Analizinde Veri Kaymasi Problemine Dair Zorluklar ve Firsatlar)


ÇETİN U., Aslantas S., Gundogmus Y. E.

8th International Conference on Computer Science and Engineering, UBMK 2023, Burdur, Türkiye, 13 - 15 September 2023, pp.86-90

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/ubmk59864.2023.10286687
  • City of Publication: Burdur
  • Country of Publication: Türkiye
  • Pages: pp.86-90
  • Keywords: data drift, natural language processing, sentiment analysis
  • Affiliated with Galatasaray Üniversitesi: Yes

Abstract

Transformer-based BERT models can achieve high accuracy in sentiment analysis. However, these models also face some problems. This study specifically examines the problem of data drift in sentiment analysis. Data drift arises when training data and test data have different properties. For the study, new datasets consisting of tweets and literary texts were created, and the performance of different BERT models on these new datasets was examined. A BERT model trained on educational tweets is shown to underperform on political/commercial tweets. The same BERT model trained on educational tweets is shown to perform no better than a random model on literary texts. For better results, we recommend using a combination of small, industry/domain-specific imitative BERT models.
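
As a rough illustration of what "different properties" between training and test data can mean, the sketch below compares the word distributions of two text collections using the Jensen-Shannon divergence. This is not the method used in the study: the choice of metric, the function names, and the toy corpora are all assumptions made for illustration only.

```python
import math
from collections import Counter


def word_distribution(texts):
    """Return a unigram probability distribution over lowercase whitespace tokens."""
    counts = Counter(token for text in texts for token in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2): 0 for identical, 1 for disjoint distributions."""
    divergence = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mw)
    return divergence


# Hypothetical toy corpora, invented for illustration (not the study's datasets).
train_texts = ["the exam schedule was announced today", "students liked the new course"]
same_domain = ["the new exam was announced", "students liked the course schedule"]
other_domain = ["moonlight drifted over silent hills", "her sorrow echoed through empty halls"]

train_dist = word_distribution(train_texts)
drift_same = js_divergence(train_dist, word_distribution(same_domain))
drift_other = js_divergence(train_dist, word_distribution(other_domain))
print(f"same domain: {drift_same:.3f}, other domain: {drift_other:.3f}")
```

A larger divergence suggests that the test texts differ more from the training texts, which is one simple signal that a model trained on the first collection may not transfer well to the second.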