Nonparametric multi-assignment clustering

doi:10.3233/IDA-160105

Full metadata record

DC Field	Value	Language
dc.contributor.author	Liu, Chien-Liang	en_US
dc.contributor.author	Hsaio, Wen-Hoar	en_US
dc.contributor.author	Chang, Tao-Hsing	en_US
dc.contributor.author	Jou, Tzai-Min	en_US
dc.date.accessioned	2018-08-21T05:52:45Z	-
dc.date.available	2018-08-21T05:52:45Z	-
dc.date.issued	2017-01-01	en_US
dc.identifier.issn	1088-467X	en_US
dc.identifier.uri	http://dx.doi.org/10.3233/IDA-160105	en_US
dc.identifier.uri	http://hdl.handle.net/11536/143923	-
dc.description.abstract	Multi-label learning has attracted significant attention from the machine learning and data mining communities over the last decade. Although many multi-label classification algorithms have been devised, few studies focus on multi-assignment clustering (MAC), in which a data instance can be assigned to multiple clusters. The MAC problem arises in many application domains, such as document clustering, customer segmentation, and image clustering. Additionally, specifying the number of clusters is a difficult but critical problem for a certain class of clustering algorithms. Hence, this work proposes a non-parametric multi-assignment clustering algorithm called the multi-assignment Chinese restaurant process (MACRP), which allows the model complexity to grow as more data instances are observed. The proposed algorithm determines the number of clusters from the data, so it provides a practical model for processing massive data sets. In the proposed algorithm, we devise a novel prior distribution based on the similarity graph to achieve multi-assignment, and propose a Gibbs sampling algorithm to carry out posterior inference. The implementation in this work uses collapsed Gibbs sampling and is compared with several methods. Additionally, evaluation metrics previously used for multi-label classification are inappropriate for MAC, since label information is unavailable. This work therefore devises an evaluation metric for MAC based on the characteristics of clustering and multi-assignment problems. We conduct experiments on two real data sets, and the experimental results indicate that the proposed method is competitive and outperforms the alternatives on most data sets.	en_US
dc.language.iso	en_US	en_US
dc.subject	Multi-assignment clustering	en_US
dc.subject	Chinese restaurant process (CRP)	en_US
dc.subject	Non-parametric Bayesian	en_US
dc.title	Nonparametric multi-assignment clustering	en_US
dc.type	Article	en_US
dc.identifier.doi	10.3233/IDA-160105	en_US
dc.identifier.journal	INTELLIGENT DATA ANALYSIS	en_US
dc.citation.volume	21	en_US
dc.citation.spage	893	en_US
dc.citation.epage	911	en_US
dc.contributor.department	資訊工程學系	zh_TW
dc.contributor.department	工業工程與管理學系	zh_TW
dc.contributor.department	Department of Computer Science	en_US
dc.contributor.department	Department of Industrial Engineering and Management	en_US
dc.identifier.wosnumber	WOS:000412919200008	en_US
Appears in Collections:	Articles