A Feature Based Simple Machine Learning Approach with Word Embeddings to Named Entity Recognition on Tweets


Taspinar M., GANİZ M. C., ACARMAN T.

22nd International Conference on Applications of Natural Language to Information Systems (NLDB), Liege, Belgium, 21-23 June 2017, vol. 10260, pp. 254-259

  • Volume number: 10260
  • DOI: 10.1007/978-3-319-59569-6_30
  • City of publication: Liege
  • Country of publication: Belgium
  • Page numbers: pp. 254-259

Abstract

Named Entity Recognition (NER) is a well-studied domain in Natural Language Processing. Traditional NER systems, such as the Stanford NER system, achieve high performance on formal, grammatically well-structured texts. However, when these systems are applied to informal and noisy texts, which contain mixed language with emoticons or abbreviations, their results degrade significantly. We attempt to fill this gap by developing a NER system that uses novel term features, including Word2vec-based features, together with a machine-learning-based classifier. We describe the features and the Word2vec implementation used in our solution and report the results obtained by our system. The system is efficient and scalable in terms of classification time complexity, and it shows promising results that could potentially be improved with larger training sets or with semi-supervised classifiers.
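
The abstract does not list the actual features, so the following is only a minimal sketch of the general idea of combining hand-crafted term features with a word-embedding vector for each token of a tweet. Everything in it is an assumption made for illustration: the function names, the five surface features, and the three-dimensional toy embedding table, which stands in for vectors that a trained Word2vec model would supply. It is not the authors' feature set or implementation.

```python
from typing import Dict, List

# Hypothetical toy embeddings (assumption). A real system would look up
# vectors from a Word2vec model trained on a large tweet corpus.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "istanbul": [0.9, 0.1, 0.0],
    "in": [0.0, 0.2, 0.1],
}
DIM = 3  # dimensionality of the toy vectors


def surface_features(token: str) -> List[float]:
    """Illustrative orthographic features; not taken from the paper."""
    return [
        float(token[:1].isupper()),               # starts with a capital letter
        float(token.isupper()),                   # written entirely in capitals
        float(token.startswith("@")),             # user mention
        float(token.startswith("#")),             # hashtag
        float(any(ch.isdigit() for ch in token)), # contains a digit
    ]


def token_features(
    token: str,
    embeddings: Dict[str, List[float]] = TOY_EMBEDDINGS,
    dim: int = DIM,
) -> List[float]:
    """Concatenate surface features with the token's embedding.

    Out-of-vocabulary tokens get a zero vector, one simple way to cope
    with the noisy vocabulary of tweets.
    """
    key = token.lower().lstrip("#@")
    vector = embeddings.get(key, [0.0] * dim)
    return surface_features(token) + list(vector)


tweet = "omg landed in #Istanbul w/ @ayse :)".split()
features = [token_features(t) for t in tweet]
```

Each row of `features` could then be passed, together with a gold label per token, to any standard classifier. Because every token is represented by a fixed-length vector computed independently, feature extraction is linear in the number of tokens.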