Using the Eligibility Traces Algorithm to Determine the Optimal Drug Dosage for Controlling the Cancer Cell Population in Melanoma Patients, Considering Side Effects

Document Type: Persian Original Article

Authors

1 Faculty of Electrical and Biomedical Engineering, Sadjad University, Mashhad, Iran

2 Faculty of Electrical and Biomedical Engineering, Sadjad University, Mashhad, Iran

3 Department of Dermatology, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran

4 Faculty of Electrical and Robotics Engineering, Shahrood University of Technology, Shahrood, Iran

Abstract

This paper aims to determine the optimal drug dosage for reducing the population of cancer cells in melanoma patients. To this end, reinforcement learning with the eligibility traces algorithm is employed, which offers a compromise between the two basic reinforcement learning approaches, Monte Carlo and temporal difference learning. Because the approach is model-free, no mathematical model is needed in the learning process itself; however, since implementation on a real system was not possible, a delayed nonlinear mathematical model is used to simulate the behavior of the environment and to evaluate the performance of the proposed controller. No control method had previously been applied to this model; this is the first time the population of cancer cells is controlled and tested on it. In determining the optimal dosage, the drug must cause as few side effects on healthy cells as possible. According to the obtained results, the eligibility traces algorithm is able to control and reduce the population of cancer cells by injecting a sub-optimal drug dose, which in turn strengthens the body's immune response. To demonstrate the advantage of the selected method in increasing the rate of cancer cell death, it is compared with the Q-learning algorithm and with optimal control. The performance of the proposed controller in reducing cancer cells is then examined by applying a fault to the sensor, and its adaptability to changes in the environment is verified by applying uncertainty to the system parameters and initial conditions and controlling the cancer cell population in five melanoma patients. Moreover, after adding noise to the system, it is shown that the eligibility traces algorithm can still drive the population of cancer cells to zero. Finally, the convergence speed of the eligibility traces and Q-learning algorithms in reducing the number of cancer cells is investigated for different learning rates.
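To make the approach concrete, the sketch below shows a minimal implementation of Watkins's Q(λ), a standard eligibility-traces variant of Q-learning. It is only illustrative: the toy environment, state grid, dose set, reward shaping, and all parameter values are assumptions for demonstration and stand in for the paper's delayed nonlinear melanoma model, which is not reproduced here.

```python
import numpy as np

# Minimal sketch of Watkins's Q(lambda) for dose selection.
# The toy environment is a stand-in for the paper's delayed
# nonlinear melanoma model; all names and parameters here are
# illustrative assumptions, not the authors' model.

rng = np.random.default_rng(0)

N_STATES = 20             # discretized cancer-cell population levels
DOSES = [0.0, 0.5, 1.0]   # hypothetical normalized drug doses (actions)
ALPHA, GAMMA, LAMBDA_, EPS = 0.1, 0.95, 0.8, 0.1

Q = np.zeros((N_STATES, len(DOSES)))

def step(state, dose):
    """Toy dynamics: the dose shrinks the tumor, but the reward
    penalizes high doses to reflect side effects on healthy cells."""
    regrowth = int(rng.integers(0, 2))                  # stochastic regrowth
    nxt = int(np.clip(state + regrowth - 3 * dose, 0, N_STATES - 1))
    reward = -nxt - 0.5 * dose    # favor a small tumor AND a low dose
    return nxt, reward

for episode in range(500):
    e = np.zeros_like(Q)          # eligibility traces, one per (s, a) pair
    s = N_STATES - 1              # start each episode with a large tumor
    for t in range(100):
        greedy = int(np.argmax(Q[s]))
        a = int(rng.integers(len(DOSES))) if rng.random() < EPS else greedy
        s2, r = step(s, DOSES[a])
        delta = r + GAMMA * np.max(Q[s2]) - Q[s, a]     # one-step TD error
        e[s, a] += 1.0                                  # accumulating trace
        Q += ALPHA * delta * e    # credit every recently visited pair
        # Watkins's rule: decay traces after a greedy action,
        # cut them to zero after an exploratory one
        e *= GAMMA * LAMBDA_ if a == greedy else 0.0
        s = s2
        if s == 0:                # tumor population driven to zero
            break
```

With LAMBDA_ = 0 the update collapses to ordinary one-step Q-learning (temporal difference), while values near 1 spread credit along the whole episode in a Monte Carlo-like fashion; this tunable interpolation is the compromise between the two approaches that the abstract refers to.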

Keywords

