A K-Means Algorithm Application on Big Data

Eren B., Karabulut E. C., ALPTEKİN S. E., Alptekin G. I.

World Congress on Engineering and Computer Science, San Francisco, United States Of America, 21 - 23 October 2015, pp.814-818

  • Publication Type: Conference Paper / Full Text
  • City: San Francisco
  • Country: United States Of America
  • Page Numbers: pp.814-818
  • Keywords: Big data, data mining, clustering, K-means algorithm
  • Galatasaray University Affiliated: Yes


As more and more data becomes available due to advances in information and communication technologies, gaining knowledge and insights from this data is replacing decision making based on experience and intuition in organizations. Big data mining can be defined as the capability of extracting useful information from massive and complex datasets or data streams. In this paper, one of the most commonly used data mining algorithms, K-means, is used to extract information from a big dataset. To do so, the MapReduce framework with Hadoop is used. The dataset consists of the results of the social evolution experiment of the MIT Human Dynamics Lab. The aim is to derive meaningful relationships between students' eating habits and their tendency to catch a cold.
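
To illustrate how K-means is commonly expressed in the MapReduce style, the following is a minimal sketch in plain Python. It is not the authors' implementation and does not use Hadoop: the function names (map_phase, reduce_phase, kmeans) and the toy 2-D points are assumptions made for illustration only. The map step assigns each point to its nearest centroid, and the reduce step averages the points grouped under each centroid.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for point in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: dist(point, centroids[i]))
        yield nearest, point


def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for key, point in pairs:
        groups[key].append(point)
    # A centroid that received no points keeps its previous position.
    new_centroids = list(centroids)
    for key, members in groups.items():
        new_centroids[key] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop changing."""
    centroids = list(centroids)
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical toy data, not taken from the MIT dataset.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
result = kmeans(points, [(0, 0), (10, 10)])
```

In a Hadoop job, each iteration would typically run as a separate MapReduce pass, with the current centroids distributed to every mapper; here both phases run in a single process to keep the sketch self-contained.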