6th International Conference on Man-Machine Interactions, ICMMI 2019, Cracow, Poland, 2-3 October 2019, vol. 1061, pp. 125-135
In this paper, we present Named Entity Recognition (NER) as a multi-class system and evaluate baseline classifiers along with the technical features extracted from tweet datasets. First, we describe the procedure for converting raw tweet data so that it is compatible with the presented data model, and we study three datasets. The first is a well-known benchmark dataset; the other two were collected in the wild using the Twitter search API with given keywords. We then describe the feature vector, which consists of 9 technical features. To reach higher values of the statistical metrics for the multi-class NER system, we examine classifier performance under different combinations of features. Finally, we examine the impact of the cosine-similarity-to-class-centroid feature on classifier performance and present the highest F1 score reached with a particular set of features.
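The cosine-similarity-to-class-centroid feature mentioned above can be illustrated with a minimal sketch. The abstract does not specify how tokens are vectorized or how the centroids are built, so everything here is an assumption made for illustration: the toy 2-dimensional vectors, the hypothetical labels `PERSON` and `LOCATION`, and the helper names are not taken from the paper.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_centroids(samples):
    """Map each class label to the centroid of its training vectors."""
    by_label = defaultdict(list)
    for vector, label in samples:
        by_label[label].append(vector)
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def centroid_similarity_features(vector, centroids):
    """One feature per class: cosine similarity of vector to that centroid."""
    return {label: cosine_similarity(vector, c) for label, c in centroids.items()}


# Toy training data (hypothetical vectors and labels, for illustration only).
train = [
    ([1.0, 0.0], "PERSON"),
    ([0.8, 0.2], "PERSON"),
    ([0.0, 1.0], "LOCATION"),
    ([0.2, 0.8], "LOCATION"),
]
centroids = class_centroids(train)
features = centroid_similarity_features([0.9, 0.1], centroids)
```

In this sketch the similarity values would be appended to the remaining technical features before training a classifier. Whether the paper uses one similarity per class or a single value is not stated in the abstract.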