6th International Conference on Intelligent Data Science Technologies and Applications, IDSTA 2025, Varna, Bulgaria, 1-4 September 2025, pp. 117-124 (Full-Text Paper)
Traditional company clustering approaches that rely on manually curated taxonomies often fail to capture true business operations, particularly for innovative companies in rapidly evolving markets. This study presents an automated, semantically aware approach for clustering companies based on their actual business operations. The analysis is based on a dataset primarily composed of Turkish and multinational companies, curated from the corporate listings of Kariyer.net. Our methodology employs a multi-agent artificial intelligence (AI) system to enrich company datasets with product descriptions through web-based information retrieval, generates semantic embeddings using transformer models, and then applies dimensionality reduction and clustering analysis. Through systematic evaluation of 14 configurations across different clustering algorithms, distance metrics, and dimensionality reduction techniques, we demonstrate that Density-Based Spatial Clustering of Applications with Noise (DBSCAN) with Euclidean distance on 2D embeddings reduced by t-distributed Stochastic Neighbor Embedding (t-SNE) performs best among the configurations tested. The resulting clusters exhibit strong intra-sector cohesion and meaningful inter-sector relationships, validated through multiple clustering metrics. Our approach produces semantically coherent industry groupings of practical use for business intelligence applications such as salary benchmarking and market segmentation.
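The best-performing configuration described above (t-SNE reduction to 2D followed by DBSCAN with Euclidean distance) can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation. The abstract does not report hyperparameters, so `eps`, `min_samples`, and `perplexity` are assumed values, the helper names are hypothetical, and synthetic Gaussian blobs stand in for the transformer embeddings of company descriptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score


def cluster_embeddings(embeddings, eps=3.0, min_samples=5, perplexity=30.0, seed=0):
    """Reduce embeddings to 2D with t-SNE, then cluster with Euclidean DBSCAN.

    All hyperparameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    coords = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(embeddings)
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean").fit_predict(coords)
    return coords, labels


def silhouette_without_noise(coords, labels):
    """Silhouette score over clustered points only (DBSCAN marks noise as -1).

    Returns None when fewer than two clusters remain, because the
    silhouette score is undefined in that case.
    """
    mask = labels != -1
    if len(set(labels[mask])) < 2:
        return None
    return float(silhouette_score(coords[mask], labels[mask], metric="euclidean"))


# Synthetic stand-in for transformer embeddings (assumption: 200 items, 32 dims).
embeddings, _ = make_blobs(n_samples=200, n_features=32, centers=4, random_state=0)
coords, labels = cluster_embeddings(embeddings)
score = silhouette_without_noise(coords, labels)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "| noise points:", int(np.sum(labels == -1)))
print("silhouette (noise excluded):", score)
```

In practice `eps` would need tuning for each t-SNE run, since the scale of t-SNE coordinates depends on the perplexity and on the data. The silhouette score is only one of several possible validity metrics and is shown here as an example.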