Cite Article

Soft Set Multivariate Distribution for Categorical Data Clustering

BibTeX

@article{IJASEIT15420,
   author = {Iwan Tri Riyadi Yanto and Rohmat Saedudin and Sely Novita Sari and Mustafa Mat Deris and Norhalina Senan},
   title = {Soft Set Multivariate Distribution for Categorical Data Clustering},
   journal = {International Journal on Advanced Science, Engineering and Information Technology},
   volume = {11},
   number = {5},
   year = {2021},
   pages = {1841--1846},
   keywords = {Clustering; categorical data; soft set; multivariate.},
   abstract = {Clustering is the process of breaking down a huge dataset into smaller groups. It has been used in some field studies including pattern recognition, segmentation, and statistics with remarkable success. Clustering is a technique for dividing multivariate datasets into groups. No inherent distance measure on data category makes clustering data more challenging than numerical data. Data category can be assumed following the data from a multinomial distribution. Thus, the standard model parametric model can be used in latent class clustering based on the independent product of multinomial distributions. Meanwhile, multi-valued attributes on the categorical data can be decomposed into the standard set on a multi soft set. In this paper, a clustering technique based on soft set theory is proposed for categorical data through a multinomial distribution. The data will be represented as a multi soft set which is every soft set has its probability of being a member of the cluster. The data with the highest probability will be assigned as the member of the cluster. The experiment of the proposed technique is evaluated based on the Dunn index with regard to the number of clusters and response time. The experiment results show that the proposed technique has the lowest response time with high stability compared to baseline techniques. This study recommends a maximum number of clusters in implementation on the real data.},
   issn = {2088-5334},
   publisher = {INSIGHT - Indonesian Society for Knowledge and Human Development},
   url = {http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=15420},
   doi = {10.18517/ijaseit.11.5.15420}
}

EndNote

%A Yanto, Iwan Tri Riyadi
%A Saedudin, Rohmat
%A Sari, Sely Novita
%A Deris, Mustafa Mat
%A Senan, Norhalina
%D 2021
%T Soft Set Multivariate Distribution for Categorical Data Clustering
%! Soft Set Multivariate Distribution for Categorical Data Clustering
%K Clustering; categorical data; soft set; multivariate.
%X Clustering is the process of breaking down a huge dataset into smaller groups. It has been used in some field studies including pattern recognition, segmentation, and statistics with remarkable success. Clustering is a technique for dividing multivariate datasets into groups. No inherent distance measure on data category makes clustering data more challenging than numerical data. Data category can be assumed following the data from a multinomial distribution. Thus, the standard model parametric model can be used in latent class clustering based on the independent product of multinomial distributions. Meanwhile, multi-valued attributes on the categorical data can be decomposed into the standard set on a multi soft set. In this paper, a clustering technique based on soft set theory is proposed for categorical data through a multinomial distribution. The data will be represented as a multi soft set which is every soft set has its probability of being a member of the cluster. The data with the highest probability will be assigned as the member of the cluster. The experiment of the proposed technique is evaluated based on the Dunn index with regard to the number of clusters and response time. The experiment results show that the proposed technique has the lowest response time with high stability compared to baseline techniques. This study recommends a maximum number of clusters in implementation on the real data.
%U http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=15420
%R doi:10.18517/ijaseit.11.5.15420
%J International Journal on Advanced Science, Engineering and Information Technology
%V 11
%N 5
%@ 2088-5334

IEEE

Iwan Tri Riyadi Yanto, Rohmat Saedudin, Sely Novita Sari, Mustafa Mat Deris, and Norhalina Senan, "Soft Set Multivariate Distribution for Categorical Data Clustering," International Journal on Advanced Science, Engineering and Information Technology, vol. 11, no. 5, pp. 1841–1846, 2021. [Online]. Available: http://dx.doi.org/10.18517/ijaseit.11.5.15420.

RefMan/ProCite (RIS)

TY  - JOUR
AU  - Yanto, Iwan Tri Riyadi
AU  - Saedudin, Rohmat
AU  - Sari, Sely Novita
AU  - Deris, Mustafa Mat
AU  - Senan, Norhalina
PY  - 2021
TI  - Soft Set Multivariate Distribution for Categorical Data Clustering
JF  - International Journal on Advanced Science, Engineering and Information Technology
VL  - 11
IS  - 5
Y2  - 2021
SP  - 1841
EP  - 1846
SN  - 2088-5334
PB  - INSIGHT - Indonesian Society for Knowledge and Human Development
KW  - Clustering; categorical data; soft set; multivariate.
N2  - Clustering is the process of breaking down a huge dataset into smaller groups. It has been used in some field studies including pattern recognition, segmentation, and statistics with remarkable success. Clustering is a technique for dividing multivariate datasets into groups. No inherent distance measure on data category makes clustering data more challenging than numerical data. Data category can be assumed following the data from a multinomial distribution. Thus, the standard model parametric model can be used in latent class clustering based on the independent product of multinomial distributions. Meanwhile, multi-valued attributes on the categorical data can be decomposed into the standard set on a multi soft set. In this paper, a clustering technique based on soft set theory is proposed for categorical data through a multinomial distribution. The data will be represented as a multi soft set which is every soft set has its probability of being a member of the cluster. The data with the highest probability will be assigned as the member of the cluster. The experiment of the proposed technique is evaluated based on the Dunn index with regard to the number of clusters and response time. The experiment results show that the proposed technique has the lowest response time with high stability compared to baseline techniques. This study recommends a maximum number of clusters in implementation on the real data.
UR  - http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=15420
DO  - 10.18517/ijaseit.11.5.15420
ER  - 

RefWorks

RT Journal Article
ID 15420
A1 Yanto, Iwan Tri Riyadi
A1 Saedudin, Rohmat
A1 Sari, Sely Novita
A1 Deris, Mustafa Mat
A1 Senan, Norhalina
T1 Soft Set Multivariate Distribution for Categorical Data Clustering
JF International Journal on Advanced Science, Engineering and Information Technology
VO 11
IS 5
YR 2021
SP 1841
OP 1846
SN 2088-5334
PB INSIGHT - Indonesian Society for Knowledge and Human Development
K1 Clustering; categorical data; soft set; multivariate.
AB Clustering is the process of breaking down a huge dataset into smaller groups. It has been used in some field studies including pattern recognition, segmentation, and statistics with remarkable success. Clustering is a technique for dividing multivariate datasets into groups. No inherent distance measure on data category makes clustering data more challenging than numerical data. Data category can be assumed following the data from a multinomial distribution. Thus, the standard model parametric model can be used in latent class clustering based on the independent product of multinomial distributions. Meanwhile, multi-valued attributes on the categorical data can be decomposed into the standard set on a multi soft set. In this paper, a clustering technique based on soft set theory is proposed for categorical data through a multinomial distribution. The data will be represented as a multi soft set which is every soft set has its probability of being a member of the cluster. The data with the highest probability will be assigned as the member of the cluster. The experiment of the proposed technique is evaluated based on the Dunn index with regard to the number of clusters and response time. The experiment results show that the proposed technique has the lowest response time with high stability compared to baseline techniques. This study recommends a maximum number of clusters in implementation on the real data.
LK http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=15420
DO 10.18517/ijaseit.11.5.15420