Institute of Information Technology


Scientific works of ITI staff - article

Bibliographic description
Alguliyev, R.M. Ibk-means: An iterative batch k-means algorithm for big data clustering / R.M. Alguliyev, R.M. Aliguliyev, A.M. Bagirov // Kibernetika. - 2025. - N: 4, vol 61. - P. 492-508.
Abstract
Information technologies such as social media, mobile computing, and the industrial Internet of Things (IoT) produce huge amounts of data every day. Powerful knowledge-discovery tools are imperative for dealing with such volumes of data, and clustering methods are among the most important knowledge-discovery techniques. Growth in computational power and algorithmic developments allow clustering problems in large datasets to be solved efficiently and accurately. However, these developments are insufficient for clustering problems in big datasets, because such datasets cannot be processed as a whole due to hardware and computational restrictions. In this paper, an iterative batch k-means (ibk-means) algorithm is proposed that yields good clustering results with low computational cost on big datasets. It is designed to cluster datasets using batch data. The efficiency and accuracy of the proposed algorithm are investigated as functions of the batch size, the number of attributes, and the number of clusters. The algorithm is compared with the classic k-means and mini-batch k-means algorithms using computational results on several real-world datasets, all of which are available from the UCI Machine Learning Repository. The smallest dataset has 500,000 data points and 2 attributes, and the largest contains 43,930,257 data points and 16 attributes. The results demonstrate that the ibk-means algorithm outperforms both the k-means and mini-batch k-means algorithms in both efficiency and accuracy and that it is applicable to the clustering of big datasets. The proposed algorithm provides real-time clustering and may have direct applications in expert and intelligent systems. Furthermore, the results of this paper may guide the design of more accurate and efficient clustering algorithms for big datasets that take available computer resources into account.
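The abstract does not spell out the steps of ibk-means, so the following is only a generic illustration of what clustering "using batch data" can look like, not the authors' procedure. It sketches a naive scheme in which Lloyd iterations run on one batch at a time and each batch is warm-started from the centers left by the previous one. The function names (`lloyd_step`, `batchwise_kmeans`) and the parameters `batch_size`, `n_iter`, and `seed` are hypothetical and chosen for this example only.

```python
import numpy as np


def lloyd_step(batch, centers):
    """One Lloyd iteration on a single batch: assign points, then recompute centers."""
    # Squared Euclidean distance from every point to every center.
    dists = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    new_centers = centers.copy()
    for j in range(centers.shape[0]):
        members = batch[labels == j]
        if len(members) > 0:  # keep the old center if the cluster is empty
            new_centers[j] = members.mean(axis=0)
    return new_centers, labels


def batchwise_kmeans(data, k, batch_size, n_iter=10, seed=0):
    """Hypothetical batch-wise k-means (NOT the paper's ibk-means).

    Only one batch is examined at a time; each batch starts from the
    centers produced by the previous batch.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct points of the first batch.
    first = data[:batch_size]
    centers = first[rng.choice(len(first), size=k, replace=False)].copy()
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        for _ in range(n_iter):
            centers, _ = lloyd_step(batch, centers)
    return centers
```

A call such as `batchwise_kmeans(X, k=5, batch_size=100_000)` would keep only one batch-sized distance matrix in memory at a time. The obvious weakness of this naive scheme is that the final centers are dominated by the last batch, which is one reason more careful batch algorithms are of interest.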
Electronic version