Abstract
Insurance is a policy that eliminates or reduces the cost of losses arising from various risks. Many factors affect the cost of insurance, and these factors feed into how insurance policies are formulated. Machine learning (ML) can make policy formulation in the insurance industry more efficient. This study shows how different regression models can forecast insurance costs and compares the results of seven models: Neural Network, Gradient Boosting, Random Forest, k-Nearest Neighbors, Decision Tree, Multiple Linear Regression, and Support Vector Machine. The best-performing approach is the Gradient Boosting model, with an RMSE of 4527.749, an MAE of 2460.358, an MSE of 20500507.210865, and an R² of 0.858.
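The four scores reported above (MSE, RMSE, MAE, and R²) can be computed from paired actual and predicted costs. The following is a minimal sketch under the standard textbook definitions, not the study's own code. The function name `regression_metrics` and the toy values are hypothetical and were chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R^2 for paired true/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and mean absolute error over all samples.
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Toy values invented for illustration only (not taken from the study's data).
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3200.0, 3800.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because RMSE is the square root of MSE, the two always rank models in the same order. MAE is less sensitive to a few large errors, which is one reason to report it alongside RMSE.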