Leveraging Graph Databases for Enhanced Healthcare Data Management: A Performance Comparison Study


Turhan S. N.

IEEE Big Data, Naples, Italy, 15 - 18 December 2023, pp.5007-5013

  • Publication Type: Conference Paper / Full Text
  • City: Naples
  • Country: Italy
  • Page Numbers: pp.5007-5013
  • Galatasaray University Affiliated: Yes


Health data plays a pivotal role in modern healthcare, guiding patient care, diagnoses, treatments, and outcomes. This extensive data repository encompasses electronic health records, medical imaging, test reports, and administrative information, empowering healthcare practitioners and researchers to make evidence-based decisions that improve patient well-being. Handling health data in this complex landscape presents challenges. While relational databases have historically dominated many industries, including healthcare, alternatives such as graph databases are gaining favor. Because healthcare data is complex and interconnected, it often loses semantic data integrity when modeled in relational databases. In contrast, graph databases have shown remarkable performance with interconnected data. Consequently, there is a belief that modeling health data as a whole on a graph database would produce excellent results. This preliminary study investigates how efficiently graph databases can manage health data by comparing simple data modeling and query performance. The research uses a publicly available dataset from a hospital in the United States. The dataset covers multiple areas, including hospital admissions, diagnoses, laboratory results, and prescription information for patients diagnosed with diabetes. First, this two-dimensional tabular dataset is modeled with an Entity-Relationship Diagram (ERD) and built on a relational database. The ERD is then transformed into a graph database schema and built on a NoSQL graph database system. Both databases are normalized during the modeling process, and they share identical data to ensure consistency in data entry. Queries of varying complexity are then constructed and executed using the query languages of both database management systems. The primary results indicate that Neo4j outperforms PostgreSQL, though slight inconsistencies in data entry were noted.
These results highlight the potential of graph databases in enhancing healthcare data management for better patient care and outcomes.
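The ERD-to-graph transformation described in the abstract can be illustrated with a minimal sketch. It is not the schema or code used in the study: the table names (`patient`, `admission`, `diagnosis`), the columns, the placeholder codes `D1` to `D3`, and all function names are assumptions made up for illustration. An in-memory SQLite database and a plain Python dictionary stand in for PostgreSQL and Neo4j. The sketch shows foreign keys becoming edges, and the same question answered once with a join and once with a traversal.

```python
import sqlite3


def build_relational():
    """Create a tiny, hypothetical relational model (all names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (id INTEGER PRIMARY KEY, gender TEXT);
        CREATE TABLE admission (id INTEGER PRIMARY KEY,
                                patient_id INTEGER REFERENCES patient(id));
        CREATE TABLE diagnosis (id INTEGER PRIMARY KEY,
                                admission_id INTEGER REFERENCES admission(id),
                                code TEXT);
        INSERT INTO patient VALUES (1, 'F'), (2, 'M');
        INSERT INTO admission VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO diagnosis VALUES (100, 10, 'D1'), (101, 11, 'D2'), (102, 12, 'D3');
    """)
    return conn


def to_graph(conn):
    """Turn each foreign key into a directed edge: Patient -> Admission -> Diagnosis."""
    edges = {}
    for adm_id, pat_id in conn.execute("SELECT id, patient_id FROM admission"):
        edges.setdefault(("Patient", pat_id), []).append(("Admission", adm_id))
    for code, adm_id in conn.execute("SELECT code, admission_id FROM diagnosis"):
        edges.setdefault(("Admission", adm_id), []).append(("Diagnosis", code))
    return edges


def diagnoses_sql(conn, patient_id):
    """Relational version: one join between diagnosis and admission."""
    rows = conn.execute(
        "SELECT d.code FROM diagnosis d "
        "JOIN admission a ON a.id = d.admission_id "
        "WHERE a.patient_id = ? ORDER BY d.code",
        (patient_id,),
    )
    return [code for (code,) in rows]


def diagnoses_graph(edges, patient_id):
    """Graph version: follow edges two hops out from the patient node."""
    codes = []
    for admission in edges.get(("Patient", patient_id), []):
        for _label, code in edges.get(admission, []):
            codes.append(code)
    return sorted(codes)


if __name__ == "__main__":
    conn = build_relational()
    graph = to_graph(conn)
    print(diagnoses_sql(conn, 1), diagnoses_graph(graph, 1))
```

Both functions return the same list for a given patient, which mirrors the study's requirement that the two databases hold identical data. The sketch says nothing about relative speed; performance comparisons depend on the actual systems, data volume, and indexing.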