International Journal on Advanced Science, Engineering and Information Technology, Vol. 12 (2022) No. 6, pages: 2237-2247, DOI:10.18517/ijaseit.12.6.15582

Analysing Kinship in Severe Acute Respiratory Syndrome Coronavirus 2 DNA Sequences Based on Hierarchical and K-Means Clustering Methods Using Multiple Encoding Vector

Evander Banjarnahor, Alhadi Bustamam, Titin Siswantining, Patuan Tampubolon


Based on World Health Organization data obtained in mid-April 2021, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) had already infected more than 134.9 million people worldwide. The virus attacks the human respiratory system, which can cause lung infections and even death. More than 2.9 million people worldwide have died due to coronavirus infection. In Indonesia, more than 1.5 million people have been infected and 42.5 thousand have died. Given these figures, a kinship analysis of the coronavirus is important for understanding and reducing its spread. The kinship of the virus and its spread can be identified by forming a phylogenetic tree and by clustering. This study uses the Multiple Encoding Vector method to analyse the sequences and Euclidean distance to build the distance matrix. It then uses the Hierarchical clustering method to determine the number of initial centroids, which the K-Means clustering method subsequently uses to analyse kinship in the SARS-CoV-2 DNA sequences. The samples are SARS-CoV-2 DNA sequences taken from several infected countries. The simulation results indicate that the ancestors of SARS-CoV-2 came from China. The analysis also shows that the closest ancestors of the virus found in Indonesia came from India. The SARS-CoV-2 DNA sequences formed nine clusters, and the sixth cluster has the most members.
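The pipeline described in the abstract (encode each sequence as a numeric vector, compute a Euclidean distance matrix, run hierarchical clustering, then seed K-Means from its result) can be illustrated with a minimal sketch. Several parts are assumptions and not taken from the paper: the `encode` function is a simple stand-in (base frequency plus mean normalised position) and is not the Multiple Encoding Vector method; the toy sequences are hypothetical; average linkage is an arbitrary choice; and the number of clusters is passed in directly, with the hierarchical cluster means used as initial centroids.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import pdist, squareform

BASES = "ACGT"


def encode(seq):
    """Stand-in encoding (NOT the paper's Multiple Encoding Vector):
    for each base, its relative frequency and its mean normalised position."""
    seq = seq.upper()
    n = len(seq)
    features = []
    for base in BASES:
        positions = [i for i, ch in enumerate(seq) if ch == base]
        freq = len(positions) / n
        mean_pos = sum(positions) / len(positions) / n if positions else 0.0
        features.extend([freq, mean_pos])
    return np.array(features, dtype=float)


def cluster_sequences(seqs, n_clusters):
    """Return the Euclidean distance matrix and the K-Means labels."""
    X = np.vstack([encode(s) for s in seqs])

    # Pairwise Euclidean distances (condensed form), as in the abstract.
    dist = pdist(X, metric="euclidean")

    # Hierarchical clustering; average linkage is an assumption.
    tree = linkage(dist, method="average")
    hier_labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    # Assumption: the mean of each hierarchical cluster seeds K-Means.
    init = np.vstack(
        [X[hier_labels == c].mean(axis=0) for c in np.unique(hier_labels)]
    )

    # K-Means started from the given centroid matrix.
    _, labels = kmeans2(X, init, minit="matrix")
    return squareform(dist), labels


# Hypothetical toy sequences, for illustration only.
toy_seqs = ["AAAAAAAAAC", "AAAAAAAACA", "GGGGGGGGGT", "GGGGGGGGTG"]
dist_matrix, labels = cluster_sequences(toy_seqs, n_clusters=2)
print(labels)
```

Seeding K-Means from the hierarchical result removes the dependence on random initial centroids, which is the motivation for combining the two methods.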


Sequence alignment; bioinformatics; clustering; DNA kinship; phylogenetic analysis.
