Clustering of recreational divers by their health conditions in a database of a citizen science project

Özyiğit T., Yavuz C., Egi S. M., Pieri M., Balestra C., Marroni A.

Undersea and Hyperbaric Medicine, vol.46, no.2, pp.171-183, 2019 (Journal Indexed in SCI Expanded)

  • Publication Type: Article / Abstract
  • Volume: 46 Issue: 2
  • Publication Date: 2019
  • Title of Journal: Undersea and Hyperbaric Medicine
  • Page Numbers: pp.171-183


Copyright © 2019 Undersea & Hyperbaric Medical Society, Inc.

Divers Alert Network Europe has built a large database of dive-related data, collected since 1993 within the scope of the Diving Safety Laboratory citizen science project. The main objectives of this study are to group divers by their health information and to reveal significant differences in diving parameters between the groups, using data mining techniques. Because of the project's data collection methodology, the data were cleaned before clustering to eliminate potential errors arising from inaccurate and missing information. Although 63% of the data were discarded during cleaning, the remaining 1,169 “clean” diver records allowed meaningful clustering with the “two-step” method. Cluster 1 consists of experienced male divers without any health problems; Cluster 2 of male and female divers with health problems and high rates of cigarette smoking; and Cluster 3 of healthy, overweight divers. The clusters differ significantly in dive parameters, including pre- and post-dive conditions, such as exercise level, alcohol consumption, thermal comfort, equipment malfunctions and maximum depth. The study demonstrates the usefulness of citizen science projects; at the same time, data collection methodologies could be improved to reduce errors resulting from inconsistent, inaccurate and missing information. It is hypothesized that, once naturally occurring clusters of divers have been identified, merging this database with other dive accident databases in the future may make it possible to identify risk factors associated with each cluster.
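The abstract names the pipeline only at a high level: clean the records, then cluster them with a “two-step” method. The Python sketch below illustrates that general idea under clearly hypothetical assumptions and is not the authors' procedure. The record fields (age, BMI, cigarettes per day, logged dives), the plausibility bounds, the pre-clustering `threshold`, the fixed cluster count `k` and the function names `clean`, `standardize`, `pre_cluster`, `merge` and `two_step` are all invented for illustration. Distance is Euclidean on z-scored numeric fields, and the second stage uses Ward's merging criterion. This is a simplified stand-in: two-step clustering as implemented in, for example, the TwoStep procedure of IBM SPSS Statistics builds a CF tree in the first stage, can use a log-likelihood distance that also handles categorical variables, and can choose the number of clusters with BIC or AIC.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical record layout: (age, BMI, cigarettes_per_day, logged_dives).
# None marks a missing questionnaire answer. Fields and bounds are
# illustrative assumptions, not the DAN Europe schema.
Record = Sequence[Optional[float]]
Point = Tuple[float, ...]

# Assumed plausibility range per field; values outside it count as entry errors.
BOUNDS = [(16, 90), (15, 50), (0, 80), (0, 10000)]


def clean(records: Sequence[Record]) -> List[Point]:
    """Keep only records with no missing and no out-of-range field."""
    kept = []
    for rec in records:
        if any(v is None for v in rec):
            continue
        if all(lo <= v <= hi for v, (lo, hi) in zip(rec, BOUNDS)):
            kept.append(tuple(float(v) for v in rec))
    return kept


def standardize(rows: List[Point]) -> List[Point]:
    """Z-score each column so that no single field dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]


def pre_cluster(points: List[Point], threshold: float) -> List[list]:
    """Stage 1: one pass that adds each point to the nearest sub-cluster whose
    centroid lies within `threshold`, or opens a new sub-cluster.
    A sub-cluster is summarized as [count, per-field sums]."""
    subs: List[list] = []
    for p in points:
        best, best_d = None, threshold
        for s in subs:
            d = math.dist(p, [x / s[0] for x in s[1]])
            if d <= best_d:
                best, best_d = s, d
        if best is None:
            subs.append([1, list(p)])
        else:
            best[0] += 1
            best[1] = [a + b for a, b in zip(best[1], p)]
    return subs


def merge(subs: List[list], k: int) -> List[list]:
    """Stage 2: agglomerative merging of sub-clusters with Ward's criterion
    (smallest increase in within-cluster sum of squares) until k remain."""
    clusters = [[n, list(sums)] for n, sums in subs]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ni, si), (nj, sj) = clusters[i], clusters[j]
                gap = math.dist([x / ni for x in si], [x / nj for x in sj])
                cost = ni * nj / (ni + nj) * gap ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = [clusters[i][0] + clusters[j][0],
                       [a + b for a, b in zip(clusters[i][1], clusters[j][1])]]
        del clusters[j]
    return clusters


def two_step(records: Sequence[Record], k: int, threshold: float = 1.0) -> List[int]:
    """Clean, standardize, pre-cluster and merge, then label each clean record
    with the index of its nearest final centroid."""
    points = standardize(clean(records))
    centroids = [[x / n for x in sums]
                 for n, sums in merge(pre_cluster(points, threshold), k)]
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


# Toy, made-up records: two obvious groups plus two faulty entries.
divers = [
    (30, 22.0, 0, 500), (32, 23.0, 0, 520), (31, 22.5, 0, 480),
    (55, 30.0, 20, 40), (57, 31.0, 25, 30), (56, 30.5, 22, 35),
    (None, 25.0, 0, 100),  # missing age: dropped by clean()
    (200, 25.0, 0, 100),   # implausible age: dropped by clean()
]
print(len(clean(divers)))     # 6
print(two_step(divers, k=2))  # [0, 0, 0, 1, 1, 1]
```

Fixing `k=2` only suits this toy data; the abstract reports three clusters for the real dataset, and a fuller implementation would select the cluster count with an information criterion instead of hard-coding it. Keeping only counts and sums per sub-cluster is what lets the second stage work on a small summary rather than on every record.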