Application of Adaptive Synthetic Nominal and Extreme Gradient Boosting Methods in Determining Factors Affecting Obesity: A Case Study of Indonesian Basic Health Research Survey 2013
Aplikasi Metode Adaptive Synthetic Nominal dan Extreme Gradient Boosting dalam Menentukan Faktor yang Memengaruhi Obesitas: Studi Kasus Riset Kesehatan Dasar Indonesia 2013
DOI:
https://doi.org/10.29244/ijsa.v6i2p309-317
Keywords:
ADASYN-N, feature importance, information gain, obesity, XGBoost
Abstract
Obesity is the excessive accumulation of body fat, which can be harmful to health. According to recent studies, factors contributing to the increasing prevalence of obesity in Indonesia include poor diet, low consumption of vegetables and fruits, high consumption of fast food, area of residence, and lack of physical activity. In addition, psychological factors, high consumption of alcohol and cigarettes, cultural differences, and stress also trigger obesity. The rapid development of the medical field is closely tied to increasingly accessible data and growing medical knowledge. This makes machine learning increasingly necessary for recognizing patterns in very large medical datasets, including obesity data. This study determines the factors that influence obesity status in Indonesia using Extreme Gradient Boosting (XGBoost), a classification method that scales better and is more efficient than its predecessors. Because the data are imbalanced, the Adaptive Synthetic Nominal algorithm (ADASYN-N) is used to balance the classes and improve prediction accuracy. Both ADASYN-N and XGBoost are applied to obesity data from the 2013 Indonesian Basic Health Research Survey. The results show that women are at greater risk of obesity in Indonesia, with the female category having the highest gain value (37%). In addition, age 35-54 years, strenuous activity, and eating vegetables for 6 days are also risk factors for obesity.
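To make the "gain value" reported above more concrete, the following is a minimal sketch of the regularized split gain that XGBoost uses to score a candidate split, evaluated for a single root split under logistic loss. It is not the authors' pipeline: the toy labels, the feature names (sex_female, noise), and the parameter defaults (lam, gamma, p0) are hypothetical and chosen only for illustration. In a fitted model, gain-based importance is aggregated over every split in every tree, so a figure such as 37% would be a share of that total rather than the result of one split.

```python
from typing import Dict, Sequence


def split_gain(y: Sequence[int], x: Sequence[int],
               lam: float = 1.0, gamma: float = 0.0, p0: float = 0.5) -> float:
    """Regularized gain of splitting on a binary feature x (0/1).

    Uses first- and second-order gradients of the logistic loss at a
    constant initial prediction p0 (an assumption for this sketch).
    """
    g = [p0 - yi for yi in y]            # gradient of logistic loss: p - y
    h = [p0 * (1.0 - p0)] * len(y)       # hessian of logistic loss: p(1 - p)

    def score(G: float, H: float) -> float:
        # Structure score of one leaf: G^2 / (H + lambda)
        return G * G / (H + lam)

    G_left = sum(gi for gi, xi in zip(g, x) if xi == 0)
    H_left = sum(hi for hi, xi in zip(h, x) if xi == 0)
    G_right = sum(g) - G_left
    H_right = sum(h) - H_left

    # Gain = 1/2 [left score + right score - parent score] - gamma
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right)
                  - score(G_left + G_right, H_left + H_right)) - gamma


# Hypothetical toy data: 1 = obese, 0 = not obese.
y = [1, 1, 1, 0, 0, 0, 0, 0]
features: Dict[str, Sequence[int]] = {
    "sex_female": [1, 1, 1, 0, 0, 0, 0, 0],  # perfectly aligned with y
    "noise":      [1, 0, 1, 0, 1, 0, 1, 0],  # weakly related to y
}

gains = {name: split_gain(y, x) for name, x in features.items()}
total = sum(gains.values())
shares = {name: value / total for name, value in gains.items()}

for name in features:
    print(f"{name}: gain={gains[name]:.4f}, share={shares[name]:.1%}")
```

In this sketch the feature that separates the classes cleanly receives most of the normalized gain, which is the sense in which a category with the highest gain is read as the most influential factor.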