Evaluation of Cosine Similarity Feature for Named Entity Recognition on Tweets


Büyüktopaç O., ACARMAN T.

6th International Conference on Man-Machine Interactions, ICMMI 2019, Cracow, Poland, 2-3 October 2019, vol. 1061, pp. 125-135

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume Number: 1061
  • DOI Number: 10.1007/978-3-030-31964-9_12
  • City of Publication: Cracow
  • Country of Publication: Poland
  • Pages: pp. 125-135
  • Keywords: Classification, Cosine similarity, Information extraction, Named entity recognition, Twitter
  • Affiliated with Galatasaray University: Yes

Abstract

In this paper, we present Named Entity Recognition (NER) as a multi-class system and evaluate baseline classifiers together with technical features extracted from tweet datasets. First, we describe the procedure for converting raw tweet data so that it is compatible with the presented data model, and we study three different datasets. The first dataset is well known for benchmarking purposes; the other two were collected in the wild using the Twitter search API and given keywords. Then, we describe the feature vector, which consists of 9 technical features. To reach higher values of the statistical metrics for the multi-class NER system, we examine classifier performance under different combinations of features. Finally, we examine the impact of the cosine-similarity-to-class-centroid feature on the performance of the classifiers and present the highest F1 score reached with a particular set of features.
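
The abstract does not say how the cosine-similarity-to-class-centroid feature is computed. As a rough illustration only, one common reading is: average the training vectors of each class to get a centroid, then use the cosine similarity between a token's vector and each centroid as extra features. The sketch below follows that reading; the function names, the two-dimensional toy vectors, and the `PERSON`/`LOCATION` labels are all assumptions made for the example and are not taken from the paper.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_centroids(vectors, labels):
    """Mean vector of the training examples that belong to each class."""
    grouped = defaultdict(list)
    for vec, label in zip(vectors, labels):
        grouped[label].append(vec)
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def centroid_similarity_features(vec, centroids):
    """One cosine-similarity value per class, ordered by sorted class label."""
    return [cosine_similarity(vec, centroids[label]) for label in sorted(centroids)]


# Hypothetical toy data: four training vectors in two assumed classes.
train_vectors = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
train_labels = ["PERSON", "PERSON", "LOCATION", "LOCATION"]

centroids = class_centroids(train_vectors, train_labels)
features = centroid_similarity_features([1.0, 1.0], centroids)
```

In a setup like this, the values in `features` would be appended to a token's other features before training a classifier. Whether the paper builds its centroids and vectors this way is not stated in the abstract.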