The outcome in patients with brain stroke: A deep learning neural network modeling
Nasrin Someeh1, Mohammad Asghari Jafarabadi1, Seyed Morteza Shamshirgaran2, Farshid Farzipoor1
1 Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
2 Department of Statistics and Epidemiology, Faculty of Health Sciences, Neyshabur University of Medical Sciences, Neyshabur, Iran
|Date of Submission||14-Mar-2020|
|Date of Decision||11-Apr-2020|
|Date of Acceptance||25-Apr-2020|
|Date of Web Publication||24-Aug-2020|
Prof. Mohammad Asghari Jafarabadi
Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz
Source of Support: None, Conflict of Interest: None
Background: Artificial intelligence is attracting ever-increasing interest as a means of improving diagnostic accuracy and the quality of patient care. A deep learning neural network (DLNN) approach was applied to patients with brain stroke (BS) to predict and classify the outcome from the risk factors. Materials and Methods: A total of 332 patients with BS (mean age: 77.4 [standard deviation: 10.4] years, 50.6% male) from Imam Khomeini Hospital, Ardabil, Iran, during 2008–2018 participated in this prospective study. Data were gathered from the available documents of the BS registry. The diagnosis of BS was based on computerized tomography scans and magnetic resonance imaging. The DLNN strategy was applied to predict the effects of the main risk factors on mortality. The quality of the model was measured by diagnostic indices. Results: Across the 81 selected models, accuracy, sensitivity, and specificity ranged over 90.5%–99.7%, 83.8%–100%, and 89.8%–99.5%, respectively. Based on the optimal model (tangent hyperbolic activation function with the minimum–maximum hidden units of 10–20, max epochs of 400, momentum of 0.5, and learning rate of 0.1), the most important predictors for BS mortality were time interval after 10 years (accuracy = 92.2%), age category (75.6%), the history of hyperlipoproteinemia (66.9%), and education level (66.9%). The other independent variables were of moderate importance (66.6%); these include sex, employment status, residential place, smoking habits, history of heart disease, cerebrovascular accident type, blood pressure, diabetes, oral contraceptive pill use, and physical activity. Conclusion: Effective prevention is the best means of reducing the BS burden. The DLNN strategy performed strongly in predicting BS mortality from the main risk factors, with excellent diagnostic accuracy. Moreover, the time interval after 10 years, age, the history of hyperlipoproteinemia, and education level are the most important predictors for BS.
Keywords: Brain stroke, data mining, deep learning, predicting, risk factors
|How to cite this article:|
Someeh N, Asghari Jafarabadi M, Shamshirgaran SM, Farzipoor F. The outcome in patients with brain stroke: A deep learning neural network modeling. J Res Med Sci 2020;25:78
|How to cite this URL:|
Someeh N, Asghari Jafarabadi M, Shamshirgaran SM, Farzipoor F. The outcome in patients with brain stroke: A deep learning neural network modeling. J Res Med Sci [serial online] 2020 [cited 2020 Oct 21];25:78. Available from: https://www.jmsjournal.net/text.asp?2020/25/1/78/293262
Introduction
Brain stroke (BS) is a leading cause of death and permanent disability worldwide; in Iran, it is the second leading cause of death, and more than half of patients with BS die within 8 years. The risk of developing BS appears to double with each decade. According to the WHO, nearly 15 million people suffer a BS worldwide every year, and approximately 13 million BS result from high blood pressure. European countries record an average of 650,000 deaths caused by BS annually. These reports highlight the importance of the prognosis of BS.
Modifiable risk factors among patients with BS are a principal public health concern. The most potent risk factors for BS are known to be hypertension, history of hyperlipoproteinemia, and diabetes. Heart disease also increases the risk of cardiovascular and cerebrovascular diseases (such as ischemic BS). Neurological weakness and death rates are significantly higher in patients who also have diabetes. Furthermore, active and passive smoking are identified as main risk factors for BS; passive smoking increases the risk of overall BS by 30%. Detecting BS risk factors may allow for rapid and conceivably more effective BS prevention.
To speed up identification and prevention and to reduce the costs of the BS burden, this study employed the deep learning neural network (DLNN) method with the potent risk factors as inputs. The DLNN method is a layered approach for processing information and making decisions. By using more hidden layers than classical NN methods, a DLNN may provide more accuracy and precision. The DLNN is a predictive structure that can represent complicated functions as well as complex relationships among data. Flexibility and a nonlinear nature are other main features of this tool. This research aims to develop a DLNN-based prediction model for the main risk factors of patients with BS by finding the optimal DLNN based on sensitivity, specificity, accuracy, and the area under the ROC curve (AUC).
Materials and Methods
Study design and procedure
In this prospective longitudinal study, data were collected from the BS registry of the Imam Khomeini Hospital, Ardabil, Iran. A total of 332 patients entered the 10-year follow-up of the study (2008–2018). All patients with BS were coded according to the ICD-10 international coding system, based on computerized tomography (CT) scans and magnetic resonance imaging. The follow-up time ran from the date of hospitalization for acute BS until death or the end of follow-up, whichever came first.
Main variables and measures
For all patients, and based on hospital documents, the following variables were used in the analysis as input variables: age category at diagnosis (1: ≤58; 2: 59–68; 3: 69–75; 4: ≥76), sex (1: male; 2: female), employment status (1: employed; 2: unemployed), place of residence (1: urban; 2: rural), education level (1: diploma; 2: academic), smoking (1: yes; 2: no), former smoking (1: yes; 2: no), waterpipe smoking (1: yes; 2: no), history of heart disease (1: yes; 2: no), diabetes (1: yes; 2: no), oral contraceptive pill use (1: yes; 2: no), physical activity (1: yes; 2: no), cerebrovascular accident type (1: ischemic; 2: hemorrhagic), history of blood pressure (1: yes; 2: no), history of hyperlipoproteinemia (1: yes; 2: no), and history of myocardial infarction (1: yes; 2: no).
The protocol of the study was approved by the Institutional Review Board of Tabriz University of Medical Sciences (ethics code: IR.TBZMED.REC.1398.667). The privacy of participants was preserved, and all participants completed and signed the informed consent form.
Statistical analysis was performed by STATISTICA (ver. 13) (StatSoft, Statistica, Tulsa, USA). Data were expressed using mean (standard deviation) and median (min–max) for normal and nonnormal numeric variables, respectively, and frequency (percent) for categorical variables. The DLNN model was applied to model the relationship between the event and independent variables. The basic DLNN model includes three parts: an input layer, hidden layers, and an output layer. The input layer consists of independent variables.
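The three-part structure described above (input layer, hidden layers, output layer) can be illustrated with a minimal forward pass. This is only a sketch in plain Python, not the STATISTICA implementation used in the study: the layer sizes, the random weights, the sigmoid output unit, and all function names are assumptions made for illustration, and the patient vector is synthetic.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, weights, biases):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def forward(x, hidden_layers, output_layer):
    """Propagate one input vector through tanh hidden layers and a sigmoid output unit."""
    h = x
    for weights, biases in hidden_layers:
        h = [math.tanh(z) for z in dense(h, weights, biases)]
    (z,) = dense(h, *output_layer)
    return 1.0 / (1.0 + math.exp(-z))  # predicted probability of the event

rng = random.Random(0)
n_inputs = 16            # illustrative number of coded risk factors (assumption)
hidden_sizes = [10, 10]  # hypothetical; not the architecture reported in the study
sizes = [n_inputs] + hidden_sizes
hidden_layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output_layer = init_layer(sizes[-1], 1, rng)

patient = [rng.choice([1.0, 2.0]) for _ in range(n_inputs)]  # synthetic 1/2-coded inputs
p_death = forward(patient, hidden_layers, output_layer)
```

With untrained random weights the output carries no clinical meaning; the point is only the flow of information from the independent variables through the hidden layers to a single predicted probability.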
First, several settings for epoch, momentum, learning rate, and the size of hidden layers have been assessed (a total of 1533 different scenarios were implemented). Second, 81 models were selected, evaluated, checked, and compared precisely by diagnosis indices. Finally, one optimal model was chosen, and the input variables were entered into the model. The optimal DLNN model was presented by a radar plot utilizing Microsoft Excel (Microsoft Corporation Rosa, California, USA).
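The scenario search described above can be pictured as enumerating combinations of settings. The following is a rough sketch under stated assumptions: the candidate values are hypothetical (the paper does not list its full grid, and this toy grid does not reproduce the 1533 scenarios), and only the final lookup uses values actually reported for the optimal model.

```python
from itertools import product

# Hypothetical candidate values, chosen only for illustration.
activations = ["tanh", "sigmoid", "rectilinear"]
min_hidden_units = [5, 10, 15]
max_epochs = [100, 200, 400]
momenta = [0.1, 0.5, 0.9]
learning_rates = [0.01, 0.1, 0.5]

# One dictionary per scenario: every combination of the candidate values.
grid = [
    {
        "activation": a,
        "min_hidden": h,
        "max_epochs": e,
        "momentum": m,
        "learning_rate": lr,
    }
    for a, h, e, m, lr in product(
        activations, min_hidden_units, max_epochs, momenta, learning_rates
    )
]

# Settings reported in the paper for the optimal model.
reported_optimum = {
    "activation": "tanh",
    "min_hidden": 10,
    "max_epochs": 400,
    "momentum": 0.5,
    "learning_rate": 0.1,
}
n_scenarios = len(grid)
optimum_in_grid = reported_optimum in grid
```

In practice each scenario would be trained and scored on held-out data, and the best-scoring configurations kept for closer comparison.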
In the hidden layers, the activation functions were hyperbolic tangent (tanh), sigmoid, and rectified linear (rectilinear). The sample was split by random sampling into three parts: 70% for training, 15% for testing, and 15% for validation. Diagnostic indices, including sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and the area under the receiver operating characteristic (ROC) curve, along with their 95% confidence intervals (CIs), were used to measure the quality and fit of every model.
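As an illustration of the 70/15/15 split and of the diagnostic indices named above, the sketch below uses plain Python. The function names, the random seed, and the toy labels are assumptions; the confidence intervals and the ROC curve are left out, since the paper does not state how they were computed.

```python
import random

def split_indices(n, train=0.70, test=0.15, seed=0):
    """Randomly split range(n) into training, testing, and validation index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_test = round(n * test)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def ratio(numerator, denominator):
    """Safe division: NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def diagnostic_indices(y_true, y_pred):
    """Compute the indices from a 2x2 confusion matrix (1 = death, 0 = survival)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }

train_idx, test_idx, valid_idx = split_indices(332)

# Toy labels, not study data: 3 true positives, 1 false negative,
# 5 true negatives, 1 false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
indices = diagnostic_indices(y_true, y_pred)
```

For 332 patients this rounding yields 232 training, 50 testing, and 50 validation cases; other rounding conventions would shift a case or two between the parts.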
Results
Of 480 enrolled patients, only 332 were eligible to participate in this study, and about 32 (13%) persons were censored within the 10 years of follow-up. The median follow-up time was 81.3 (min = 0.0, max = 163.3) months. About 26.7%, 23.3%, and 50% of the participants were aged 58 or under, 59–68, and 69 or above, respectively. Furthermore, about 56% of the participants were female, 70% were unemployed, and 61% were urban inhabitants [Table 1]. In addition, 81% of the cases were not active smokers, and 59% had a history of blood pressure, whereas 93% had no history of myocardial infarction [Table 2].
|Table 1: Demographic characteristics of the study participants and the results of log-rank test|
|Table 2: Clinical profile of study participants and the results of log-rank test|
Moreover, the results of the log-rank test showed that patients with a history of blood pressure and hyperlipoproteinemia had a higher risk of mortality. In addition, being male (P = 0.016), older age (P < 0.001), oral contraceptive pill use (P < 0.001), ischemic cerebrovascular accident (P < 0.018), and no physical activity (P < 0.041) were associated with a higher risk of mortality [Table 1] and [Table 2].
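For readers unfamiliar with the log-rank test used in these comparisons, a minimal two-group version can be sketched as follows. This is a textbook formulation in plain Python, not the STATISTICA routine used in the study; the function name and the example data are made up for illustration.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times:  follow-up time of each patient
    events: 1 = death observed, 0 = censored
    groups: 0 or 1 (for example, without or with a given risk factor)
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    observed_1 = expected_1 = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(
            1 for ti, e, g in zip(times, events, groups)
            if ti == t and e == 1 and g == 1
        )
        observed_1 += d1
        expected_1 += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 death count at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = (observed_1 - expected_1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail probability, 1 df
    return chi2, p_value

# Synthetic follow-up times in months; not study data.
times = [5, 8, 12, 20, 30, 34, 40, 60, 72, 90]
events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0]
groups = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
chi2, p_value = logrank_test(times, events, groups)
```

The test compares, at every death time, the observed number of deaths in one group with the number expected if both groups shared the same survival experience.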
Results of deep learning neural network-based modeling
The diagnostic indices of the 81 DLNNs were encouraging: sensitivity ranged from 83.8% to 100%, and specificity from 89.8% to 99.5%. The positive predictive value ranged from 82.4% to 99%, the negative predictive value from 92.1% to 100%, and the accuracy from 90.5% to 99.7%. [Table 3] lists the 81 settings of DLNN models with their diagnostic indices. The optimal model with the highest accuracy was “81: Tanh. 10.400.5.1”, with the following properties: minimum–maximum hidden units of 10–20, max epochs of 400, momentum of 0.5, and learning rate of 0.1. The accuracy of the tanh, rectified linear, and sigmoid activation functions was estimated at 99.5%, 99.3%, and 94.3%, respectively. The DLNN with a tanh activation function was therefore considered the optimal model.
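To give a feel for what the momentum (0.5) and learning rate (0.1) settings control, the sketch below applies a classical momentum update to a toy one-parameter problem. The exact update rule used by the software is not described in the paper, so the classical form here is an assumption, as are the function name and the toy objective.

```python
def momentum_step(weights, grads, velocity, learning_rate=0.1, momentum=0.5):
    """One gradient-descent step with classical momentum (assumed update rule)."""
    new_velocity = [momentum * v - learning_rate * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w, v = [0.0], [0.0]
for _ in range(400):  # 400 iterations, echoing the reported max epochs
    grad = [2.0 * (w[0] - 3.0)]
    w, v = momentum_step(w, grad, v)
```

The learning rate scales each step along the gradient, while the momentum term carries over a fraction of the previous step, which tends to smooth the trajectory and speed up convergence.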
|Table 3: Results of comparing 81 selected deep learning neural network models|
Based on the results from the optimal model, the effect of prediction of risk factors on mortality was divided into two categories. The radar plot [Figure 1] shows the accuracy by the most important predictors of BS. Accordingly, the most important predictors for BS mortality were time interval after 10 years with 92.2% accuracy, age category with 75.6% accuracy, the history of hyperlipoproteinemia with 66.9% accuracy, and education level with 66.9% accuracy. The other independent variables, as mentioned beforehand, were at a moderate importance level with 66.6% accuracy.
|Figure 1: Radar plot for comparing the variable importance based on the optimal model. Normalized importance of the independent variables to predict BS mortality. AGE = Age category; MI.HIS = History of myocardial infarction; JOB = Employment status; PLACE = Place of residence; EDU = Education level; CVA.HIS = History of cerebrovascular accident; W.P = Waterpipe smoking; F.SM = Former smoking; P.SM = Passive smoking; SM = Smoking; HEA.HIS = History of heart disease; OCPUSE = Oral contraceptive pill use; PA = Physical activity; BP.HIS = History of blood pressure; HLP.HIS = History of hyperlipoproteinemia; DI.HIS = Diabetes; CVA.T = Cerebrovascular accident type|
Discussion
To investigate the main predictors of survival in patients with BS, we used the DLNN technique, which performed strongly in predicting BS mortality from the main risk factors, with excellent diagnostic accuracy (up to 99.7%, 100%, and 99.5% for accuracy, sensitivity, and specificity, respectively). DLNN appears to offer the capability to analyze data more quickly and possibly with higher precision, in addition to its transformative potential for health care. The multilayered setting of deep learning makes it possible to perform classification tasks such as identifying subtle abnormalities in medical imaging, clustering cases with similar characteristics, or highlighting associations between symptoms and results within data. In addition, the DLNN strategy does not require data preprocessing, and the system takes care of many self-filtering and self-normalization tasks. To investigate the behavior of the accuracy and error rate of the models, the implementations were carried out in different settings, and increasing the number of hidden layers, epochs, or learning rate improved the accuracy indices.
In line with our study, researchers are using DLNN and various data mining techniques for the diagnosis of many illnesses such as heart diseases, diabetes, BS, and cancers. One study that used a DLNN model for acute ischemic BS treatment showed the superiority of the proposed DLNN over a regression model. In another study, which predicted final lesion volume, the DLNN performed significantly better than the generalized linear model. In further research, three classification algorithms, including DLNN, were applied to predict BS outcomes from patients' demographic information, with accuracy and the AUC as the evaluation indicators. DLNN therefore plays a vital role in the prediction of diseases in the medical and health sciences.
The main objective of the study was to model the most relevant risk factors for predicting BS by applying the DLNN approach. The results from the optimal model revealed that time interval, age category, history of hyperlipoproteinemia, and educational level were the main predictors of death. Other studies have reported that these risk factors increase the rate of BS incidents. One study found that among people aged over 75 years, the hemorrhagic incidence rose by nearly 80%. In another study, men had a higher incidence of death than women (P < 0.001). Further research showed that moderate- to heavy-intensity physical activity was associated with a lower risk of ischemic BS (adjusted hazard ratio: 0.65, 95% confidence interval: 0.44–0.98), whereas the findings of the current study did not support this. These discrepancies may result from differences in methodologies, target populations, and interfering factors.
Strengths and limitations
As a strength of this study, we used DLNN, aiming to derive rules and to detect complex relationships with higher accuracy, which is assumption independent as compared to classical statistical methods. In addition, compared to machine learning, DLNN can measure the accuracy of its answers on its own due to the nature of its multilayered structure.
However, the current study has some limitations. First, some responses may be inaccurate because they were given not by the patients themselves but by those who accompanied them. Second, the history of comorbidities was self-reported. Third, overfitting is a concern: the model may perform strongly on the training set but less well on other datasets. Fourth, the “black box” character of the DLNN strategy is another restriction: while it can approximate any function, inspecting its internal mechanism may not offer meaningful insight into the structure of the function being approximated. It may therefore be necessary to develop methods for practical application comprehensively and precisely. In view of these points, we recommend applying the DLNN with a greater number of variables and larger sample sizes.
Conclusion
The DLNN strategy performed strongly in predicting BS mortality from the main risk factors, with excellent diagnostic accuracy. Based on the results of the optimal model, the most important predictors for BS mortality were time interval after 10 years, age category, history of hyperlipoproteinemia, and education level. The other independent variables were at a moderate importance level. A BS can be destructive to individuals and their society, so effective BS prevention remains the best means of reducing the BS burden, despite considerable improvements in the treatment of patients. By determining BS risk factors, earlier and conceivably more effective prevention will be possible.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
References
Cheon S, Kim J, Lim J. The use of deep learning to predict stroke patient mortality. Int J Environ Res Public Health 2019;16:1-12.
Kim HC, Choi DP, Ahn SV, Nam CM, Suh I. Six-year survival and causes of death among stroke patients in Korea. Neuroepidemiology 2009;32:94-100.
Centers for Disease Control and Prevention. Available from: https://www.cdc.gov/. [Last accessed on 2020 Apr 11].
Pastore D, Pacifici F, Capuani B, Palmirotta R, Dong C, Coppola A, et al. Sex-genetic interaction in the risk for cerebrovascular disease. Curr Med Chem 2017;24:2687-99.
Bailey RR. Promoting physical activity and nutrition in people with stroke. Am J Occup Ther 2017;71:7105360010p1-5.
Assarzadegan F, Tabesh H, Shoghli A, Ghafoori Yazdi M, Tabesh H, Daneshpajooh P, et al. Relation of stroke risk factors with specific stroke subtypes and territories. Iran J Public Health 2015;44:1387-94.
Hatleberg CI, Ryom L, Kamara D, de Wit S, Law M, Phillips A, et al. Predictors of ischemic and hemorrhagic strokes among people living with HIV: The D: A: D international prospective multicohort study. EClinicalMedicine 2019;13:91-100.
Xu Z, Li Y, Tang S, Huang X, Chen T. Current use of oral contraceptives and the risk of first-ever ischemic stroke: A meta-analysis of observational studies. Thromb Res 2015;136:52-60.
Lee PN, Forey BA. Environmental tobacco smoke exposure and risk of stroke in nonsmokers: A review with meta-analysis. J Stroke Cerebrovasc Dis 2006;15:190-201.
Ghatak A. Deep learning with R. Singapore: Springer Singapore; 2019. p. 1-245.
Bengio Y, Courville A, Vincent P. Representation learning: A review and new perspectives. IEEE Trans Pattern Anal Mach Intell 2013;35:1798-828.
Patel AB, Nguyen T, Baraniuk RG. A probabilistic framework for deep learning. In: Advances in Neural Information Processing Systems. NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems; December 2016. p. 2558–66.
Schmidhuber J. Deep learning in neural networks: An overview. Neural Netw 2015;61:85-117.
Das R, Turkoglu I, Sengur A. Effective diagnosis of heart disease through neural networks ensembles. Expert Syst Appl 2009;36:7675-80.
Iqbal K, Sohail Asghar DA. Hiding sensitive XML association rules with supervised learning technique. Intelligent Information Management 2011;3:219-29. [DOI: 10.4236/iim.2011.36027].
Panzarasa S, Quaglini S, Sacchi L, Cavallini A, Micieli G, Stefanelli M. Data mining techniques for analyzing stroke care processes. Stud Health Technol Inform 2010;160:939-43.
Li L, Tang H, Wu Z, Gong J, Gruidl M, Zou J, et al. Data mining techniques for cancer detection using serum proteomic profiling. Artif Intell Med 2004;32:71-83.
Stier N, Vincent N, Liebeskind D, Scalzo F. Deep learning of tissue fate features in acute ischemic stroke. Proceedings (IEEE Int Conf Bioinformatics Biomed) 2015;2015:1316-21.
Nielsen A, Hansen MB, Tietze A, Mouridsen K. Prediction of tissue outcome and assessment of treatment effect in acute ischemic stroke using deep learning. Stroke 2018;49:1394-401.
Kansadub T, Thammaboosadee T, Supaporn Kiattisin CJ. Stroke risk prediction model based on demographic data. In 2015 8th Biomedical Engineering International Conference (BMEiCON). p. 1-3. IEEE. [DOI: 10.1109/BMEiCON.2015.7399556].
Fekadu G, Chelkeba L, Kebede A. Retraction Note: Risk factors, clinical presentations and predictors of stroke among adult patients admitted to stroke unit of Jimma university medical center, South west Ethiopia: Prospective observational study. BMC Neurol 2019;19:327.
EBSCOhost 137139786 Predictors for stroke mortality. A Comparison of the Oslo-Study 1972/73 and the Oslo II-Study in; 2000.
Nouh AM, McCormick L, Modak J, Fortunato G, Staff I. High mortality among 30-day readmission after stroke: Predictors and etiologies of readmission. Front Neurol 2017;8:632.
Béjot Y, Cordonnier C, Durier J, Aboa-Eboulé C, Rouaud O, Giroud M. Intracerebral haemorrhage profiles are changing: Results from the Dijon population-based study. Brain 2013;136:658-64.
Giroud M, Delpont B, Daubail B, Blanc C, Durier J, Giroud M, et al. Temporal trends in sex differences with regard to stroke incidence: The Dijon stroke registry (1987-2012). Stroke 2017;48:846-9.
Willey JZ, Moon YP, Paik MC, Boden-Albala B, Sacco RL, Elkind MS. Physical activity and risk of ischemic stroke in the Northern Manhattan Study. Neurology 2009;73:1774-9.