Data reconstruction of two actuarial metrics by staking machine learning models

Authors

  • Amaury de Souza Amaral, Pontifícia Universidade Católica de São Paulo
  • Jardel Marques Monti, Pontifícia Universidade Católica de São Paulo
  • Segundo Parra Milián, Universidade Estadual Paulista

DOI:

https://doi.org/10.59681/2175-4411.v15.iEspecial.2023.1095

Keywords:

Artificial Intelligence, Process Optimization, Supplementary Health

Abstract

Objective: A large part of Brazil's health care is financed by health insurance plans, whose readjustments have been challenged in the courts. The data from these court cases tend not to be readily available. Therefore, to reconstruct the data, we developed a metric using Deep Learning techniques to obtain data estimates. Method: After analyzing the data obtained from the Regulatory Agency, we trained three different supervised learning algorithms to obtain information through an optimization problem. We used the Augmented Lagrangian method to incorporate the constraints into the cost function and Simulated Annealing to minimize it. Results: As expected, the stacked model outperformed the base learners. Conclusions: The results made it possible to obtain retroactive average cost per claim and frequency information, retrieved from the "health plan's past".
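
The abstract does not state the actual cost function, constraints, or hyperparameters, so the following is only a minimal sketch of the general technique it names: an Augmented Lagrangian that folds an equality constraint into the cost, minimized by Simulated Annealing. Everything specific here is an assumption made for illustration: the toy constraint (frequency times average cost per claim equals a known cost per beneficiary), the starting estimates `f0` and `s0` (standing in for model predictions), the function names, and all numeric settings.

```python
import math
import random


def objective(x):
    """Squared relative deviation from the (hypothetical) model estimates."""
    u, v = x  # u = frequency / f0, v = average cost per claim / s0
    return (u - 1.0) ** 2 + (v - 1.0) ** 2


def constraint(x, ratio):
    """Equality constraint h(x) = 0 in scaled variables (assumed for this toy)."""
    u, v = x
    return u * v - ratio


def augmented_lagrangian(x, lam, mu, ratio):
    """Objective plus multiplier term plus quadratic penalty on the constraint."""
    h = constraint(x, ratio)
    return objective(x) + lam * h + 0.5 * mu * h * h


def simulated_annealing(cost, x0, rng, n_steps=4000, t0=1e-2, t_end=1e-7, step=0.05):
    """Minimize `cost` with Gaussian proposals and geometric cooling."""
    x, fx = list(x0), cost(x0)
    best, fbest = list(x), fx
    cooling = (t_end / t0) ** (1.0 / n_steps)
    t = t0
    for _ in range(n_steps):
        scale = step * math.sqrt(t / t0)  # shrink proposals as the system cools
        cand = [xi + rng.gauss(0.0, scale) for xi in x]
        fc = cost(cand)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        t *= cooling
    return best


def reconstruct(f0, s0, total_cost, n_outer=8, mu=10.0, seed=0):
    """Adjust (f0, s0) as little as possible so that f * s matches total_cost."""
    rng = random.Random(seed)
    ratio = total_cost / (f0 * s0)
    x, lam = [1.0, 1.0], 0.0
    for _ in range(n_outer):
        # Inner step: minimize the augmented Lagrangian for the current multiplier.
        x = simulated_annealing(
            lambda z: augmented_lagrangian(z, lam, mu, ratio), x, rng
        )
        # Outer step: standard multiplier update lam <- lam + mu * h(x).
        lam += mu * constraint(x, ratio)
    return f0 * x[0], s0 * x[1]


# Hypothetical numbers: 2 claims per beneficiary, 100 per claim, known total of 220.
freq, sev = reconstruct(2.0, 100.0, 220.0)
print(freq, sev, freq * sev)
```

In this sketch the multiplier update lets the constraint be met closely without driving the penalty weight `mu` to very large values, which is the usual motivation for an Augmented Lagrangian over a pure penalty method. Simulated Annealing is used for the inner minimization only because the abstract names it; on a smooth toy problem like this one a gradient-based solver would work just as well.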

Author Biography

Segundo Parra Milián, Universidade Estadual Paulista

Instituto de Física Teórica – IFT - Universidade Estadual Paulista – São Paulo

Published

2023-07-20

How to Cite

Amaral, A. de S., Monti, J. M., & Milián, S. P. (2023). Data reconstruction of two actuarial metrics by staking machine learning models. Journal of Health Informatics, 15(Especial). https://doi.org/10.59681/2175-4411.v15.iEspecial.2023.1095
