AN APPLICATION OF GENETIC ALGORITHM FOR CLUSTERING OBSERVATIONS WITH INCOMPLETE DATA

Authors

  • Frisca Rizki Ananda, PT. Manulife, Indonesia
  • Asep Saefuddin, Department of Statistics, Bogor Agricultural University (IPB)
  • Bagus Sartono, Department of Statistics, Bogor Agricultural University (IPB)

DOI:

https://doi.org/10.29244/ijsa.v1i1.48

Abstract

Cluster analysis is a method for grouping observations into several clusters. A common strategy uses the distance between observations as a similarity index. However, a distance-based approach cannot be applied directly when the data are incomplete. To address this problem, a genetic algorithm that involves variance (GACV) is applied. This study applied GACV to the Iris data introduced by Sir Ronald Fisher. Incomplete data were produced by deleting some values from the Iris data. The algorithm was developed in R 3.0.2 and gave satisfactory results for clustering the complete data, with 95.99% sensitivity and 98% consistency. GACV can cluster observations with missing values without filling in the missing values or excluding those observations. Performance on incomplete observations is also satisfactory but tends to decrease as the proportion of incomplete values increases. The proportion of incomplete values should be at most 40% to obtain sensitivity and consistency of at least 90%.
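
The idea of clustering without imputation can be illustrated with a minimal sketch. It is not the authors' GACV implementation (which was written in R and is not reproduced on this page). It assumes a fitness equal to the within-cluster sum of squared deviations, computed per variable from observed values only, and uses a simple genetic algorithm with truncation selection, one-point crossover, and random mutation. The function names, parameter values, and toy data are all hypothetical.

```python
import math
import random

def within_cluster_ss(data, labels, k):
    """Within-cluster sum of squared deviations, skipping missing (NaN) values.

    Assumed stand-in for a variance-based fitness; lower is better.
    """
    n_vars = len(data[0])
    total = 0.0
    for c in range(k):
        members = [row for row, lab in zip(data, labels) if lab == c]
        for j in range(n_vars):
            # Only observed values of variable j contribute to cluster c.
            observed = [row[j] for row in members if not math.isnan(row[j])]
            if len(observed) > 1:
                mean = sum(observed) / len(observed)
                total += sum((v - mean) ** 2 for v in observed)
    return total

def ga_cluster(data, k, pop_size=30, generations=300, mutation_rate=0.1, seed=0):
    """Search for cluster labels with a basic genetic algorithm (illustrative)."""
    rng = random.Random(seed)
    n = len(data)

    def fitness(labels):
        return within_cluster_ss(data, labels, k)

    # Each chromosome assigns one cluster label to every observation.
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]  # truncation selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally redraw a label at random.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)

# Hypothetical toy data (not the Iris data): two groups, some values missing.
nan = math.nan
toy = [
    [1.0, 1.0], [1.2, nan], [nan, 0.9],
    [5.0, 5.0], [5.1, nan], [nan, 4.8],
]
labels = ga_cluster(toy, k=2)
print(labels)
```

Because the fitness skips missing entries instead of requiring complete rows, every observation receives a label even when some of its variables are unobserved, which mirrors the property described in the abstract.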

Keywords: Cluster Analysis, Genetic Algorithm, Incomplete Data.

Published

2017-10-31

How to Cite

Ananda, F. R., Saefuddin, A., & Sartono, B. (2017). AN APPLICATION OF GENETIC ALGORITHM FOR CLUSTERING OBSERVATIONS WITH INCOMPLETE DATA. Indonesian Journal of Statistics and Its Applications, 1(1), 13–23. https://doi.org/10.29244/ijsa.v1i1.48

Section

Articles
