Cite Article

Enhanced Manhattan-based Clustering using Fuzzy C-Means Algorithm for High Dimensional Datasets

BibTeX

@article{IJASEIT6005,
   author = {Joven A. Tolentino and Bobby D. Gerardo},
   title = {Enhanced Manhattan-based Clustering using Fuzzy C-Means Algorithm for High Dimensional Datasets},
   journal = {International Journal on Advanced Science, Engineering and Information Technology},
   volume = {9},
   number = {3},
   year = {2019},
   pages = {766--771},
   keywords = {fuzzy C-Means; high dimensional dataset; Manhattan distance; clustering.},
   abstract = {Mining high-dimensional data incurs a high computational cost because such a dataset is composed of thousands of attributes and/or instances. The efficiency of an algorithm, specifically its speed, is often sacrificed when this kind of dataset is supplied to it. The Fuzzy C-Means algorithm is one that suffers from this problem: this clustering algorithm requires high computational resources whether it processes low- or high-dimensional data. The Netflix rating data, small round blue cell tumors (SRBCTs) and Colon Cancer datasets (52,308, and 2,000 attributes and 1500, 83 and 62 instances, respectively) were identified as high-dimensional datasets. As such, a Manhattan distance measure employing a trigonometric function was used to enhance the Fuzzy C-Means algorithm. Results show an increase in the efficiency of processing large amounts of data. On the Netflix, Colon Cancer and SRBCT datasets, the Euclidean distance measure took 39,296, 38,952 and 85,774 milliseconds, respectively, to complete the different clusters, an average of 54,674 milliseconds, while the Manhattan distance measure took 36,858, 36,501 and 82,86 milliseconds, respectively, an average of 52,703 milliseconds for the entire dataset to cluster. On the other hand, the enhanced Manhattan distance measure took 33,216, 32,368 and 81,125 milliseconds, respectively, an average of 48,903 milliseconds to cluster the datasets. Given these results, the enhanced Manhattan distance measure is 11\% more efficient than the Euclidean distance measure and 7\% more efficient than the Manhattan distance measure.},
   issn = {2088-5334},
   publisher = {INSIGHT - Indonesian Society for Knowledge and Human Development},
   url = {http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=6005},
   doi = {10.18517/ijaseit.9.3.6005}
}

EndNote

%A Tolentino, Joven A.
%A Gerardo, Bobby D.
%D 2019
%T Enhanced Manhattan-based Clustering using Fuzzy C-Means Algorithm for High Dimensional Datasets
%B 2019
%9 fuzzy C-Means; high dimensional dataset; Manhattan distance; clustering.
%! Enhanced Manhattan-based Clustering using Fuzzy C-Means Algorithm for High Dimensional Datasets
%K fuzzy C-Means; high dimensional dataset; Manhattan distance; clustering.
%X Mining high-dimensional data incurs a high computational cost because such a dataset is composed of thousands of attributes and/or instances. The efficiency of an algorithm, specifically its speed, is often sacrificed when this kind of dataset is supplied to it. The Fuzzy C-Means algorithm is one that suffers from this problem: this clustering algorithm requires high computational resources whether it processes low- or high-dimensional data. The Netflix rating data, small round blue cell tumors (SRBCTs) and Colon Cancer datasets (52,308, and 2,000 attributes and 1500, 83 and 62 instances, respectively) were identified as high-dimensional datasets. As such, a Manhattan distance measure employing a trigonometric function was used to enhance the Fuzzy C-Means algorithm. Results show an increase in the efficiency of processing large amounts of data. On the Netflix, Colon Cancer and SRBCT datasets, the Euclidean distance measure took 39,296, 38,952 and 85,774 milliseconds, respectively, to complete the different clusters, an average of 54,674 milliseconds, while the Manhattan distance measure took 36,858, 36,501 and 82,86 milliseconds, respectively, an average of 52,703 milliseconds for the entire dataset to cluster. On the other hand, the enhanced Manhattan distance measure took 33,216, 32,368 and 81,125 milliseconds, respectively, an average of 48,903 milliseconds to cluster the datasets. Given these results, the enhanced Manhattan distance measure is 11% more efficient than the Euclidean distance measure and 7% more efficient than the Manhattan distance measure.
%U http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=6005
%R doi:10.18517/ijaseit.9.3.6005
%J International Journal on Advanced Science, Engineering and Information Technology
%V 9
%N 3
%@ 2088-5334

IEEE

Joven A. Tolentino and Bobby D. Gerardo, "Enhanced Manhattan-based Clustering using Fuzzy C-Means Algorithm for High Dimensional Datasets," International Journal on Advanced Science, Engineering and Information Technology, vol. 9, no. 3, pp. 766-771, 2019. [Online]. Available: http://dx.doi.org/10.18517/ijaseit.9.3.6005.

RefMan/ProCite (RIS)

TY  - JOUR
AU  - Tolentino, Joven A.
AU  - Gerardo, Bobby D.
PY  - 2019
TI  - Enhanced Manhattan-based Clustering using Fuzzy C-Means Algorithm for High Dimensional Datasets
JF  - International Journal on Advanced Science, Engineering and Information Technology; Vol. 9 (2019) No. 3
Y2  - 2019
SP  - 766
EP  - 771
SN  - 2088-5334
PB  - INSIGHT - Indonesian Society for Knowledge and Human Development
KW  - fuzzy C-Means; high dimensional dataset; Manhattan distance; clustering.
N2  - Mining high-dimensional data incurs a high computational cost because such a dataset is composed of thousands of attributes and/or instances. The efficiency of an algorithm, specifically its speed, is often sacrificed when this kind of dataset is supplied to it. The Fuzzy C-Means algorithm is one that suffers from this problem: this clustering algorithm requires high computational resources whether it processes low- or high-dimensional data. The Netflix rating data, small round blue cell tumors (SRBCTs) and Colon Cancer datasets (52,308, and 2,000 attributes and 1500, 83 and 62 instances, respectively) were identified as high-dimensional datasets. As such, a Manhattan distance measure employing a trigonometric function was used to enhance the Fuzzy C-Means algorithm. Results show an increase in the efficiency of processing large amounts of data. On the Netflix, Colon Cancer and SRBCT datasets, the Euclidean distance measure took 39,296, 38,952 and 85,774 milliseconds, respectively, to complete the different clusters, an average of 54,674 milliseconds, while the Manhattan distance measure took 36,858, 36,501 and 82,86 milliseconds, respectively, an average of 52,703 milliseconds for the entire dataset to cluster. On the other hand, the enhanced Manhattan distance measure took 33,216, 32,368 and 81,125 milliseconds, respectively, an average of 48,903 milliseconds to cluster the datasets. Given these results, the enhanced Manhattan distance measure is 11% more efficient than the Euclidean distance measure and 7% more efficient than the Manhattan distance measure.
UR  - http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=6005
DO  - 10.18517/ijaseit.9.3.6005
ER  - 

RefWorks

RT Journal Article
ID 6005
A1 Tolentino, Joven A.
A1 Gerardo, Bobby D.
T1 Enhanced Manhattan-based Clustering using Fuzzy C-Means Algorithm for High Dimensional Datasets
JF International Journal on Advanced Science, Engineering and Information Technology
VO 9
IS 3
YR 2019
SP 766
OP 771
SN 2088-5334
PB INSIGHT - Indonesian Society for Knowledge and Human Development
K1 fuzzy C-Means; high dimensional dataset; Manhattan distance; clustering.
AB Mining high-dimensional data incurs a high computational cost because such a dataset is composed of thousands of attributes and/or instances. The efficiency of an algorithm, specifically its speed, is often sacrificed when this kind of dataset is supplied to it. The Fuzzy C-Means algorithm is one that suffers from this problem: this clustering algorithm requires high computational resources whether it processes low- or high-dimensional data. The Netflix rating data, small round blue cell tumors (SRBCTs) and Colon Cancer datasets (52,308, and 2,000 attributes and 1500, 83 and 62 instances, respectively) were identified as high-dimensional datasets. As such, a Manhattan distance measure employing a trigonometric function was used to enhance the Fuzzy C-Means algorithm. Results show an increase in the efficiency of processing large amounts of data. On the Netflix, Colon Cancer and SRBCT datasets, the Euclidean distance measure took 39,296, 38,952 and 85,774 milliseconds, respectively, to complete the different clusters, an average of 54,674 milliseconds, while the Manhattan distance measure took 36,858, 36,501 and 82,86 milliseconds, respectively, an average of 52,703 milliseconds for the entire dataset to cluster. On the other hand, the enhanced Manhattan distance measure took 33,216, 32,368 and 81,125 milliseconds, respectively, an average of 48,903 milliseconds to cluster the datasets. Given these results, the enhanced Manhattan distance measure is 11% more efficient than the Euclidean distance measure and 7% more efficient than the Manhattan distance measure.
LK http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=6005
DO 10.18517/ijaseit.9.3.6005