K-prototypes Algorithm for Clustering Schools Based on The Student Admission Data in IPB University
DOI:
https://doi.org/10.29244/ijsa.v5i2p228-242
Keywords:
clustering, k-prototypes, student admission
Abstract
New student admissions are held every year at all levels of education, including at IPB University. Since 2013, IPB University has kept a track record of every school that has succeeded in sending its graduates to the university, following them through to the completion of their studies there. The data cover 5,345 schools. These schools needed to be grouped into clusters so that IPB could see, from the characteristics of each cluster, which schools perform well or poorly in sending their graduates to continue their education at IPB. This study uses the k-prototypes algorithm because it can be applied to data consisting of both categorical and numerical variables (mixed-type data). The k-prototypes algorithm retains the efficiency of the k-means algorithm on large data sets while removing its restriction to numerical data. The results showed that the optimal number of clusters was four. The fourth cluster (421 schools) was the best cluster with respect to student admission at IPB University, whereas the third cluster (391 schools) was the worst.
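To illustrate how k-prototypes handles mixed-type data, the following is a minimal sketch rather than the study's actual implementation. It assumes the usual k-prototypes cost: squared Euclidean distance on the numerical attributes plus a weight gamma times the number of mismatches on the categorical attributes. The toy records, the attribute values, the value of gamma, and the initial prototypes are all hypothetical and are not taken from the IPB data.

```python
from collections import Counter

import numpy as np


def dissimilarity(x_num, x_cat, p_num, p_cat, gamma):
    """Squared Euclidean distance on numeric attributes plus
    gamma times the count of categorical mismatches."""
    numeric = float(np.sum((np.asarray(x_num) - np.asarray(p_num)) ** 2))
    mismatches = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric + gamma * mismatches


def k_prototypes(X_num, X_cat, init_idx, gamma=1.0, max_iter=20):
    """Alternate assignment and prototype updates until labels stop changing.
    init_idx lists the rows used as initial prototypes (an assumption made
    here to keep the sketch deterministic)."""
    X_num = np.asarray(X_num, dtype=float)
    k = len(init_idx)
    protos_num = [X_num[i].copy() for i in init_idx]
    protos_cat = [tuple(X_cat[i]) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its closest prototype.
        new_labels = [
            min(range(k),
                key=lambda c: dissimilarity(x, xc, protos_num[c], protos_cat[c], gamma))
            for x, xc in zip(X_num, X_cat)
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: numeric mean and categorical mode per cluster.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                continue  # keep the old prototype for an empty cluster
            protos_num[c] = X_num[members].mean(axis=0)
            protos_cat[c] = tuple(
                Counter(X_cat[i][j] for i in members).most_common(1)[0][0]
                for j in range(len(X_cat[0]))
            )
    return labels, protos_num, protos_cat


# Hypothetical toy data: two numeric and two categorical attributes per school.
X_num = [[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.3, 9.1]]
X_cat = [("public", "A"), ("public", "A"), ("private", "B"), ("private", "B")]
labels, protos_num, protos_cat = k_prototypes(X_num, X_cat, init_idx=[0, 2], gamma=1.0)
```

The weight gamma controls how much the categorical attributes count relative to the numerical ones. In practice the numerical attributes would usually be standardized first so that no single scale dominates the distance.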