Deep Multirepresentation Learning for Data Clustering

IEEE Trans Neural Netw Learn Syst. 2023 Jul 7:PP. doi: 10.1109/TNNLS.2023.3289158. Online ahead of print.

Abstract

Deep clustering incorporates embedding into clustering in order to find a lower-dimensional space suitable for clustering tasks. Conventional deep clustering methods aim to obtain a single global embedding subspace (also known as a latent space) for all the data clusters. In contrast, in this article, we propose a deep multirepresentation learning (DML) framework for data clustering in which each difficult-to-cluster data group is associated with its own distinct, optimized latent space, while all easy-to-cluster data groups share a general common latent space. Autoencoders (AEs) are employed to generate the cluster-specific and general latent spaces. To specialize each AE in its associated data cluster(s), we propose a novel and effective loss function consisting of weighted reconstruction and clustering losses of the data points, where higher weights are assigned to the samples that are more likely to belong to the corresponding cluster(s). Experimental results on benchmark datasets demonstrate that the proposed DML framework and loss function outperform state-of-the-art clustering approaches. In addition, the results show that DML significantly outperforms state-of-the-art methods on imbalanced datasets, as a result of assigning an individual latent space to each difficult-to-cluster group.
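
The abstract does not give the exact formulation of the weighted loss, so the following is only a minimal sketch, assuming a PyTorch implementation, of how a per-sample weighted reconstruction-plus-clustering loss for one cluster-specific autoencoder might look. The names ClusterSpecificAE and weighted_dml_loss, the network sizes, and the gamma trade-off coefficient are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class ClusterSpecificAE(nn.Module):
    """Small autoencoder intended to specialize in one (group of) cluster(s).
    Architecture sizes are placeholders, not the paper's configuration."""

    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)


def weighted_dml_loss(x, z, x_hat, centroid, weights, gamma=0.1):
    """Weighted sum of reconstruction and clustering losses for one AE.

    weights : soft membership of each sample to this AE's cluster(s);
              larger weights push the AE to specialize in those samples.
    gamma   : trade-off between reconstruction and clustering terms
              (illustrative value, not from the paper).
    """
    recon = ((x_hat - x) ** 2).sum(dim=1)     # per-sample reconstruction error
    clust = ((z - centroid) ** 2).sum(dim=1)  # per-sample distance to the cluster centroid
    return (weights * (recon + gamma * clust)).mean()
```

Under this reading, each cluster-specific AE is trained on all samples but with membership-dependent weights, while a general AE with (roughly) uniform weights would cover the easy-to-cluster groups; the precise weighting scheme and optimization procedure are described in the full paper.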