Quantification of Overlapping and Network Complexity in News: Assessment of Top2Vec and Fuzzy Topic Models


PARLAK İ. B., ŞAHİN M. Ş., ACARMAN T., Adel M., Bourennane S.

Applied Sciences (Switzerland), vol.15, no.17, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 15 Issue: 17
  • Publication Date: 2025
  • Doi Number: 10.3390/app15179627
  • Journal Name: Applied Sciences (Switzerland)
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Agricultural & Environmental Science Database, Applied Science & Technology Source, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
  • Keywords: artificial intelligence, knowledge graphs, machine learning, natural language processing, text mining, topic modeling
  • Galatasaray University Affiliated: Yes

Abstract

Topic modeling in digital news faces the dual challenge of thematic overlap and evolving semantic boundaries, especially in morphologically rich languages like Turkish. To address these obstacles, we propose a topic modeling framework enhanced with knowledge graphs that explicitly incorporates uncertainty in topic assignment. We focus on the diversity of Fuzzy Latent Semantic Analysis (FLSA) and compare its performance with Latent Dirichlet Allocation (LDA), BERTopic, and embedding-based Top2Vec on a corpus drawn from two Turkish news agencies. We evaluate each model using standard metrics for topic coherence, diversity, and interpretability. We propose the Shannon entropy of node-degree distributions as a measure of the network complexity of knowledge graphs that encode topic similarity. Our results indicate that FLSA achieves perfect topic diversity (1.000) and improved interpretability over LDA (0.33 vs. 0.09), while also enhancing coherence (0.33 vs. 0.27). Top2Vec demonstrates the strongest coherence (0.81) and interpretability (0.78), with high diversity (0.97), reflecting its capacity to form semantically cohesive clusters. Entropy analysis further shows that FLSA produces the most information-rich topic networks. These findings suggest that fuzzy modeling and embedding-based approaches offer complementary strengths, namely uncertainty-aware flexibility and semantic precision, thereby improving topic discovery in complex, unstructured news environments.
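
The entropy measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes an undirected topic graph given as an edge list and takes P(k) to be the fraction of nodes with degree k; the function name degree_entropy, the base-2 logarithm, and the toy graphs are illustrative assumptions, not the authors' implementation, and nodes with no edges are not represented in this input format.

```python
import math
from collections import Counter


def degree_entropy(edges):
    """Shannon entropy (in bits) of the node-degree distribution.

    `edges` is an iterable of (u, v) pairs for an undirected graph.
    This is an illustrative sketch, not the paper's exact procedure.
    """
    # Count the degree of every node that appears in at least one edge.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if not degree:
        return 0.0

    # P(k): fraction of nodes whose degree equals k.
    n_nodes = len(degree)
    degree_counts = Counter(degree.values())
    return -sum(
        (c / n_nodes) * math.log2(c / n_nodes) for c in degree_counts.values()
    )


# Hypothetical toy topic graphs (node labels are placeholders).
star = [("t0", "t1"), ("t0", "t2"), ("t0", "t3"), ("t0", "t4")]
ring = [("t0", "t1"), ("t1", "t2"), ("t2", "t3"), ("t3", "t0")]

print(degree_entropy(star))
print(degree_entropy(ring))
```

In the ring every node has the same degree, so the degree distribution carries no information and the entropy is zero; the star mixes one hub with four leaves and therefore scores higher. Under this reading, a more heterogeneous degree distribution corresponds to a more complex topic network.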