A Feature Based Simple Machine Learning Approach with Word Embeddings to Named Entity Recognition on Tweets


Taspinar M., GANİZ M. C., ACARMAN T.

22nd International Conference on Applications of Natural Language to Information Systems (NLDB), Liege, Belgium, 21 - 23 June 2017, vol.10260, pp.254-259

  • Publication Type: Conference Paper / Full Text
  • Volume: 10260
  • Doi Number: 10.1007/978-3-319-59569-6_30
  • City: Liege
  • Country: Belgium
  • Page Numbers: pp.254-259
  • Galatasaray University Affiliated: Yes

Abstract

Named Entity Recognition (NER) is a well-studied domain in Natural Language Processing. Traditional NER systems, such as the Stanford NER system, achieve high performance on formal and grammatically well-structured texts. However, when these systems are applied to informal and noisy texts, which contain mixed language with emoticons or abbreviations, the results degrade significantly. We attempt to fill this gap by developing an NER system that uses novel term features, including Word2Vec-based features, and a machine-learning-based classifier. We describe the features and the Word2Vec implementation used in our solution and report the results obtained by our system. The system is efficient and scalable in terms of classification time complexity and shows promising results, which could potentially be improved with larger training sets or with the use of semi-supervised classifiers.