Challenges and Opportunities Related to Data Drift Problem in Sentiment Analysis (Duygu Analizinde Veri Kaymasi Problemine Dair Zorluklar ve Firsatlar)


ÇETİN U., Aslantas S., Gundogmus Y. E.

8th International Conference on Computer Science and Engineering, UBMK 2023, Burdur, Turkey, 13 - 15 September 2023, pp.86-90

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/ubmk59864.2023.10286687
  • City: Burdur
  • Country: Turkey
  • Page Numbers: pp.86-90
  • Keywords: data drift, natural language processing, sentiment analysis
  • Galatasaray University Affiliated: Yes

Abstract

Transformer-based BERT models can achieve high accuracy in sentiment analysis. However, these models also face some problems. This study specifically examines the problem of data drift in sentiment analysis. Data drift arises when the training data and the test data have different properties. For this study, new datasets consisting of tweets and literary texts were created, and the performance of different BERT models on these new datasets was examined. A BERT model trained on educational tweets is shown to underperform on political/commercial tweets. The same BERT model trained on educational tweets is shown to perform no better than a random model on literary texts. For better results, we recommend using a combination of small, imitative, industry/domain-specific BERT models.
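
The idea that training and test data can "have different properties" can be made concrete with a simple measurement. The sketch below is not the method used in the study; it assumes that drift can be roughly approximated by the Jensen-Shannon divergence between the word distributions of two corpora. The function names and the toy sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def unigram_distribution(texts):
    """Return a word -> relative frequency mapping for a list of texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range 0..1) between two word distributions."""
    vocab = set(p) | set(q)
    # Mixture distribution over the joint vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical toy corpora, NOT the datasets created in the study.
train_edu = ["the exam results were great", "our school lecture was boring"]
test_edu = ["the lecture was great", "the exam was boring"]
test_literary = ["the moon wept over silent hills", "her sorrow drifted like mist"]

p_train = unigram_distribution(train_edu)
drift_same = js_divergence(p_train, unigram_distribution(test_edu))
drift_other = js_divergence(p_train, unigram_distribution(test_literary))
print(f"same domain: {drift_same:.3f}, other domain: {drift_other:.3f}")
```

On these toy inputs the literary corpus shares almost no vocabulary with the educational training text, so its divergence is larger than that of the same-domain test text. A lexical measure like this only hints at drift; it says nothing by itself about how much a given BERT model's accuracy would drop.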