Data reconstruction of two actuarial metrics by stacking machine learning models

Amaury de Souza Amaral; Jardel Marques Monti; Segundo Parra Milián

doi:10.59681/2175-4411.v15.iEspecial.2023.1095

Authors

Amaury de Souza Amaral, Pontifícia Universidade Católica de São Paulo
Jardel Marques Monti, Pontifícia Universidade Católica de São Paulo
Segundo Parra Milián, Universidade Estadual Paulista

Keywords:

Artificial Intelligence, Process Optimization, Supplementary Health

Abstract

Objective: A large part of Brazil's health care is financed by health insurance plans, whose readjustments have been challenged in the courts. Data from these court cases tend not to be readily available. To reconstruct the missing data, we developed a metric that uses Deep Learning techniques to estimate them. Method: After analyzing data obtained from the regulatory agency, we trained three different supervised learning algorithms and posed the recovery of the missing information as an optimization problem. We used the Augmented Lagrangian method to fold the constraints into the cost function and Simulated Annealing to minimize it. Results: As expected, the stacked model outperformed the base learners. Conclusions: The results made it possible to recover retroactive average cost per claim and frequency information, fetched from the "health plan's past".
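
The optimization step described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the objective, the single equality constraint, the penalty weight `mu`, the cooling schedule, and all function names are assumptions chosen for a toy problem, standing in for the actuarial cost function, which the abstract does not specify. The sketch only shows how an Augmented Lagrangian can absorb a constraint into the cost and how Simulated Annealing can then minimize that cost.

```python
import math
import random


def objective(x):
    """Toy cost (assumption): squared distance to the point (2, 1)."""
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2


def constraint(x):
    """Toy equality constraint (assumption): h(x) = x0 + x1 - 2 = 0."""
    return x[0] + x[1] - 2.0


def augmented_lagrangian(x, lam, mu):
    """f(x) + lam * h(x) + (mu / 2) * h(x)^2."""
    h = constraint(x)
    return objective(x) + lam * h + 0.5 * mu * h * h


def simulated_annealing(cost, x0, rng, steps=2000, t0=1.0, cooling=0.995):
    """Minimize `cost` with Gaussian proposals and Metropolis acceptance."""
    current, current_cost = list(x0), cost(x0)
    best, best_cost = list(current), current_cost
    temperature = t0
    for _ in range(steps):
        # Proposal width shrinks with the temperature.
        scale = 0.5 * math.sqrt(temperature)
        candidate = [xi + rng.gauss(0.0, scale) for xi in current]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling  # geometric cooling schedule
    return best


def solve(outer_iterations=8, mu=10.0, seed=0):
    """Alternate an annealing minimization with a multiplier update."""
    rng = random.Random(seed)
    x, lam = [0.0, 0.0], 0.0
    for _ in range(outer_iterations):
        x = simulated_annealing(
            lambda p: augmented_lagrangian(p, lam, mu), x, rng
        )
        lam += mu * constraint(x)  # first-order multiplier update
    return x, lam


x_opt, lam_opt = solve()
print("x =", x_opt, "h(x) =", constraint(x_opt), "lambda =", lam_opt)
```

For this toy problem the constrained minimizer can be worked out by hand as (1.5, 0.5) with multiplier 1, so the printed values should land close to those, up to the noise inherent in a stochastic search.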

Author Biography

Segundo Parra Milián, Universidade Estadual Paulista

Instituto de Física Teórica (IFT), Universidade Estadual Paulista, São Paulo
