Quantification of Overlapping and Network Complexity in News: Assessment of Top2Vec and Fuzzy Topic Models

PARLAK, İSMAİL; ŞAHİN, MUSA; ACARMAN, TANKUT; Adel, Mouloud; Bourennane, Salah

doi:10.3390/app15179627

PARLAK İ. B., ŞAHİN M. Ş., ACARMAN T., Adel M., Bourennane S.

Applied Sciences (Switzerland), vol.15, no.17, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 15 Issue: 17
Publication Date: 2025
DOI Number: 10.3390/app15179627
Journal Name: Applied Sciences (Switzerland)
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Agricultural & Environmental Science Database, Applied Science & Technology Source, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
Keywords: artificial intelligence, knowledge graphs, machine learning, natural language processing, text mining, topic modeling
Galatasaray University Affiliated: Yes

Abstract

Topic modeling in digital news faces the dual challenge of thematic overlap and evolving semantic boundaries, especially in morphologically rich languages such as Turkish. To address these obstacles, we propose a topic modeling framework enhanced with knowledge graphs that explicitly incorporates uncertainty in topic assignment. We focus on the diversity of Fuzzy Latent Semantic Analysis (FLSA) and compare its performance with that of Latent Dirichlet Allocation (LDA), BERTopic, and the embedding-based Top2Vec on a corpus drawn from two Turkish news agencies. We evaluate each model using standard metrics for topic coherence, diversity, and interpretability. We propose the Shannon entropy of node-degree distributions to measure the network complexity of knowledge graphs as topic similarity. Our results indicate that FLSA achieves perfect topic diversity (1.000) and improved interpretability over LDA (0.33 vs. 0.09), while also enhancing coherence (0.33 vs. 0.27). Top2Vec demonstrates the strongest coherence (0.81) and interpretability (0.78) with high diversity (0.97), reflecting its capacity to form semantically cohesive clusters. Entropy analysis further shows that FLSA produces the most information-rich topic networks. These findings suggest that fuzzy modeling and embedding-based approaches offer complementary strengths, uncertainty-aware flexibility and semantic precision respectively, thereby improving topic discovery in complex, unstructured news environments.
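
To make two of the quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the knowledge graph is given as an undirected edge list, that entropy is taken in bits over the distribution P(k) of node degrees, and that topic diversity is the fraction of unique words among all topics' top words (a common definition; the abstract does not state which one the paper uses). The function names `degree_entropy` and `topic_diversity` are hypothetical.

```python
import math
from collections import Counter


def degree_entropy(edges):
    """Shannon entropy (in bits) of the node-degree distribution.

    Assumption: `edges` is an iterable of (u, v) pairs of an undirected
    graph. Nodes that appear in no edge are not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if not degree:
        return 0.0
    # P(k) = fraction of nodes whose degree equals k
    nodes_per_degree = Counter(degree.values())
    n = len(degree)
    return -sum((c / n) * math.log2(c / n) for c in nodes_per_degree.values())


def topic_diversity(topics):
    """Fraction of unique words among the top words of all topics.

    Assumption: `topics` is a list of lists of top words, one list per topic.
    A value of 1.0 means no word is shared between topics.
    """
    words = [w for topic in topics for w in topic]
    return len(set(words)) / len(words) if words else 0.0


# Toy inputs (illustrative only, not data from the paper)
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
star_entropy = degree_entropy(star)
ring_entropy = degree_entropy(ring)
```

In this sketch a ring, where every node has the same degree, yields zero entropy, while a star mixes two degree values and yields a positive entropy. Under this reading, a more heterogeneous degree distribution corresponds to a more information-rich topic network.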