ABSTRACT
Objective
This study compares the effectiveness of machine learning and deep learning models in predicting the length of stay in the intensive care unit (ICU) of poisoned patients undergoing hemodialysis.
Methods
This retrospective cohort study used data from 980 poisoned patients undergoing hemodialysis. Eight well-known models were employed: four machine learning models [support vector machine, extreme gradient boosting, random forest (RF), and decision tree] and four deep learning models (deep neural network, feedforward neural network, long short-term memory, and convolutional neural network).
Results
Feature importance analyses using Shapley Additive exPlanations and local interpretable model-agnostic explanations identified Glasgow Coma Scale (GCS) score <8, intubation, acute kidney injury, PO2, blood urea nitrogen, metabolic acidosis, and the number of hemodialysis sessions as key predictors of ICU stay duration in poisoned hemodialysis patients; intubation score, GCS score, and ICU admission type were the most significant predictors. Overall, the RF model achieved the best performance across most evaluation metrics.
Conclusion
Our findings emphasize the importance of neurological status, respiratory function, and renal injury in predicting ICU duration, offering valuable insights for clinical decision-making and resource allocation in this high-risk population.
INTRODUCTION
Poisoning, whether intentional or accidental, is a significant health issue worldwide, imposing substantial financial, physical, and mental burdens on patients, families, and society (1, 2). According to the World Health Organization (WHO), over 3 million people are poisoned annually, resulting in 220,000 deaths, mostly in developing countries due to easy access to toxic substances, lack of awareness, and limited hospital resources (3, 4). Poisoning accounts for over 2.4% of emergency department visits and 3-6% of intensive care unit (ICU) admissions (5, 6). While most poisoned patients recover with supportive treatment, critically ill patients with severe symptoms are admitted to ICUs for intensive care and monitoring. Studies show that 4.1-30.8% of life-threatening poisoning cases require ICU admission. In developing countries, including Iran, poisoning incidents, especially from pesticides and rice tablets, have doubled in recent decades despite improved ICU facilities (7, 8).
Hemodialysis patients are frequently admitted to the ICU due to poisoning, with about 2% of chronic dialysis patients requiring ICU care annually. They are at increased risk of infections due to factors like immune deficiency from uremia, impaired phagocytic function, older age, and comorbidities such as diabetes. Frequent use of vascular access for hemodialysis heightens the risk of bloodstream infections (9). Managing poisoned hemodialysis patients in the ICU is challenging due to their altered pharmacokinetics and pharmacodynamics, which affect toxin clearance. Renal failure complicates toxin elimination, often necessitating hemodialysis. Additionally, these patients are prone to fluid and electrolyte imbalances, requiring close monitoring. Infections, especially bloodstream infections from vascular access, are a major concern for immunocompromised patients (10). Comprehensive ICU management for these patients involves toxin removal, careful renal replacement therapy, fluid balance, electrolyte management, and infection control.
Moreover, healthcare systems face constant pressure to improve patient outcomes and reduce costs. ICUs, which provide critical care, are expensive and resource-intensive (11, 12). The increasing number of poisoned patients has heightened the demand for ICU beds. The WHO highlights the importance of monitoring the length of stay (LOS) as a measure of care quality and resource use (13, 14). Patients with prolonged LOS consume a significant portion of resources, so reducing LOS can enhance bed turnover, optimize resource allocation, improve patient safety, and lower costs. Identifying patients with long LOS, particularly in overwhelmed hospitals, can alleviate pressure and boost ICU efficiency. Policymakers are adopting evidence-based solutions to optimize ICU resources like beds, staff, and mechanical ventilation (15-17).
Recently, artificial intelligence-based approaches such as machine learning (ML) and deep learning (DL) have gained attention for their ability to predict outcomes of interest from large amounts of data, especially when the relationships between variables are complex and non-linear (18-20). DL and ML models can be used to predict patients’ ICU LOS (21, 22). These models draw on large datasets comprising patient demographics, clinical variables, and possibly real-time monitoring data such as vital signs and laboratory results (21). By analyzing patterns and correlations within these data, DL and ML algorithms can generate accurate predictions of how long a patient is likely to stay in the ICU (22, 23). This predictive capability not only helps optimize resource allocation and bed management but also helps healthcare providers identify patients at risk of prolonged ICU stays early, enabling proactive interventions and potentially improving patient outcomes (21, 24).
To the best of our knowledge, no study has compared ML models with DL models for predicting ICU stay duration for hemodialysis patients with poisoning. Therefore, the purpose of this study is to develop and evaluate the effectiveness of both ML and DL models in predicting the ICU LOS for hemodialysis patients suffering from poisoning. By comparing the performance of these models, we aim to identify the most accurate and reliable approach, ultimately improving resource allocation and patient outcomes in critical care settings.
METHODS
Study Design and Setting
This retrospective cohort study was performed among all poisoned patients admitted to the ICU at Loghman Hakim Hospital (LHH) between January 1, 2016, and December 31, 2020. The hospital, known as Iran’s largest poison center, plays a crucial role in managing patients who require specialized care for various poisonings and toxic exposures. The ICU of LHH is central to the management of critical cases, providing advanced medical techniques and specialized care to optimize patient outcomes. In this study, the LOS of patients admitted to the ICU was analyzed. Patients were categorized into two groups based on LOS: short (4 days or less) and long (more than 4 days). We used several data-driven ML and DL models to develop an accurate model for predicting ICU LOS in poisoned patients undergoing hemodialysis. The key steps were as follows (Figure 1):
- establishing the study roadmap and experiment environment,
- preprocessing the data,
- using feature selection algorithms,
- selecting appropriate classification algorithms and their hyperparameters,
- splitting the data into training and testing sets, and
- evaluating model performance.
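The short/long labeling rule defined above can be illustrated with a minimal Python sketch. It assumes LOS is recorded in days; the function name `label_los` and the choice of encoding long stays as the positive class (1) are illustrative assumptions, not details reported in the study.

```python
def label_los(los_days: float, threshold_days: float = 4.0) -> int:
    """Return 0 for a short ICU stay (<= threshold) and 1 for a long stay (> threshold).

    Encoding long stays as 1 treats prolonged LOS as the positive class
    (an assumption; the study does not state its encoding).
    """
    if los_days < 0:
        raise ValueError("length of stay cannot be negative")
    return 1 if los_days > threshold_days else 0


# A stay of exactly 4 days is still "short" under the study's definition.
labels = [label_los(d) for d in (1.5, 4.0, 4.5, 12.0)]
print(labels)  # [0, 0, 1, 1]
```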
Data Collection and Preprocessing
The dataset was collected by reviewing the electronic medical records (EMRs) of poisoned patients undergoing hemodialysis who were admitted to the ICU at LHH between January 2016 and December 2020. Relevant variables were extracted from these records and entered into a database. The variables investigated included age, sex, type of poisoning, history of underlying conditions, medication use and habits, laboratory test results, vital signs, number of extracorporeal treatments, type of extracorporeal method, and patient outcome (Table 1).
Statistical Analysis
In this study, before further analysis and before the data were fed into the ML and DL models, the records collected from the EMRs, including laboratory test results, underwent a series of preprocessing steps. First, rows with more than 70% missing values were removed. Patient records were also excluded if they lacked critical demographic or clinical information (e.g., comorbidities) necessary for the analysis. Additionally, duplicate records and entries with inconsistent or implausible values (e.g., negative age or erroneous laboratory results) were excluded to ensure data quality. Missing values for the remaining variables were imputed where feasible, and rows with irreparable missingness in key variables were removed. Next, min-max scaling was applied to normalize all values to the range 0 to 1, and standard scaling was then used to standardize the data distribution. Data validation checks confirmed the integrity and accuracy of the dataset. Finally, class imbalance was addressed by under-sampling the majority class while keeping all samples from the minority class.
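The preprocessing steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: missing values are represented as `None`, imputation and data validation are omitted, the function names are hypothetical, and under-sampling is assumed to be random, since the text does not specify how majority-class samples were chosen.

```python
import random
from statistics import mean, pstdev


def drop_sparse_rows(rows, max_missing=0.7):
    """Remove rows in which more than `max_missing` of the values are missing (None)."""
    return [r for r in rows if sum(v is None for v in r) / len(r) <= max_missing]


def min_max_scale(column):
    """Rescale a numeric column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: nothing to scale
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def standardize(column):
    """Shift and scale a numeric column to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0] * len(column)
    return [(v - mu) / sigma for v in column]


def undersample(samples, labels, seed=0):
    """Randomly discard samples from larger classes so every class matches the smallest one.

    All samples of the smallest (minority) class are kept.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        kept = xs if len(xs) == n_min else rng.sample(xs, n_min)
        out_x.extend(kept)
        out_y.extend([y] * n_min)
    return out_x, out_y


print(drop_sparse_rows([[1, None, None, None], [1, 2, None, None]]))  # [[1, 2, None, None]]
print(min_max_scale([2, 4, 6]))  # [0.0, 0.5, 1.0]
```

In a leakage-free pipeline, the scaling parameters (minimum, maximum, mean, and standard deviation) would usually be estimated on the training split only and then applied unchanged to the test split.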
After preprocessing, the final dataset contained 980 patients. This dataset was randomly split, with 70% of the data (686 patients) assigned to the training set and the remaining 30% (294 patients) assigned to the test set. The training set was used for feature selection and model development, while the test set was held out for model evaluation. Descriptive statistics of the variables, including variable names, frequencies, and values, are shown in Table 1.
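A random 70/30 split of the 980 patients can be reproduced with the sketch below. The seed value and function name are hypothetical; the text does not state whether the split was stratified by LOS class, so this version is a simple unstratified shuffle.

```python
import random


def train_test_split_indices(n, test_fraction=0.3, seed=42):
    """Shuffle indices 0..n-1 and split them into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


train_idx, test_idx = train_test_split_indices(980)
print(len(train_idx), len(test_idx))  # 686 294
```

A stratified variant would shuffle and split each LOS class separately so that both sets keep the same short/long proportions.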
Model Development and Evaluations
The LOS in the ICU for hemodialysis patients with poisoning was predicted using eight well-known models from the domains of DL and ML. Convolutional neural networks (CNN), feedforward neural networks, long short-term memory (LSTM), and deep neural networks (DNN) were among the DL models. The ML models included random forest (RF), decision tree (DT), support vector machine, and extreme gradient boosting (XGB).
Cross-validation and Tweaking of Hyperparameters
We trained all models using 10-fold cross-validation to reduce overfitting. In this method, the dataset is divided into ten equal segments; in each iteration, the model is trained on nine segments and validated on the remaining one. The procedure is repeated ten times so that each segment serves exactly once as the validation set. The final performance metric, obtained by averaging the results across all iterations, provides a robust assessment of the model’s overall performance (25).
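The 10-fold procedure can be sketched as follows. This is an illustrative implementation, not the study’s code; it assumes the indices were already shuffled (as in a prior random split) and that `score_fn` is a hypothetical callable that trains a model on the training indices and returns a validation score.

```python
from statistics import mean


def k_fold_indices(n, k=10):
    """Yield (train_idx, val_idx) pairs so that each index is validated exactly once.

    Folds are contiguous blocks; indices are assumed to be shuffled beforehand.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def cross_validate(score_fn, n, k=10):
    """Average score_fn(train_idx, val_idx) over the k folds."""
    return mean(score_fn(tr, va) for tr, va in k_fold_indices(n, k))


# With 686 training patients, six folds hold 69 patients and four hold 68.
print([len(va) for _, va in k_fold_indices(686)])  # [69, 69, 69, 69, 69, 69, 68, 68, 68, 68]
```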
In addition to 10-fold cross-validation, we implemented regularization techniques such as L2 regularization and dropout to further reduce the risk of overfitting. L2 regularization adds a penalty to the loss function proportional to the squared magnitude of the model’s coefficients, discouraging overly complex models, while dropout randomly deactivates a fraction of neurons during each training step. We also monitored training and validation loss during training to detect early signs of overfitting. Early stopping was employed to halt training if the validation loss did not improve for a predefined number of epochs, preventing the model from overfitting to the training data.
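The L2 penalty and early-stopping rule described above can be sketched as follows. The penalty strength `lam`, the `patience` value, and the function names are illustrative assumptions, since the study reports only a predefined number of epochs; some libraries scale the L2 term by 1/2, which changes only the effective value of `lam`.

```python
def l2_penalized_loss(data_loss, weights, lam):
    """Add an L2 penalty lam * sum(w^2) to the data loss."""
    return data_loss + lam * sum(w * w for w in weights)


def early_stopping(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than min_delta
    for `patience` consecutive epochs; the weights from best_epoch could then be restored.
    """
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Training halts at epoch 5, so the later improvement at epoch 6 is never reached.
print(early_stopping([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.69], patience=3))  # (2, 5)
```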
We also performed hyperparameter tuning to improve the performance of each method. Using a grid search approach, we systematically evaluated a large number of hyperparameter combinations, with the goal of identifying the configuration that maximized each model’s accuracy and efficiency. This exhaustive search of the hyperparameter space allowed each model to be calibrated to the provided dataset, improving its ability to predict outcomes (26).
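An exhaustive grid search can be sketched with `itertools.product`. The parameter names and values below are hypothetical and do not reproduce Table 2; in practice, `evaluate` would typically return the mean 10-fold cross-validated score for a given configuration.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Evaluate every combination in param_grid and return (best_params, best_score).

    `evaluate` maps a parameter dict to a score where higher is better,
    e.g. the mean 10-fold cross-validated accuracy.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid and toy scoring function, for illustration only.
grid = {"max_depth": [3, 5, 8], "n_estimators": [100, 300]}


def toy_score(params):
    return -abs(params["max_depth"] - 5) + params["n_estimators"] / 1000


print(grid_search(grid, toy_score))  # ({'max_depth': 5, 'n_estimators': 300}, 0.3)
```

The cost grows multiplicatively with each added hyperparameter (here 3 × 2 = 6 evaluations), which is why grid search is usually paired with a modest number of candidate values per parameter.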
Justification and Explanation of the Machine Learning and Deep Learning Models’ Output
Because of their intricate and opaque internal workings, ML and DL techniques are commonly referred to as “black box” models (27, 28). This intricacy frequently leads to a lack of interpretability, which can be especially troublesome in crucial domains, like healthcare, where comprehending the reasoning behind forecasts is essential. Researchers have been creating methods to improve the interpretability of these models in order to address this problem. Shapley Additive exPlanations (SHAP), first presented by Lundberg and Lee (29), is a well-known technique that has gained popularity recently. By utilizing the idea of Shapley values from cooperative game theory, SHAP seeks to clarify the predictions of ML models. Because of its ability to yield insightful information on model predictions, this method has become widely accepted and used in a variety of fields, including clinical research (30, 31).
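To make the Shapley-value idea concrete, the sketch below computes exact Shapley values for a small model by enumerating every coalition of features. It is an illustration, not the SHAP library: features outside a coalition are replaced by a fixed baseline vector (one common, but not the only, choice of value function, assumed here), and exact enumeration requires on the order of 2^n model evaluations, which is why SHAP relies on approximations such as KernelSHAP or efficient model-specific algorithms such as TreeSHAP.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature outside the coalition takes its baseline value. Cost grows as 2**n,
    so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy model with an interaction term (hypothetical, for illustration only).
def toy_model(z):
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
print([round(v, 6) for v in phi])  # [2.0, 0.5, 0.5]
```

The interaction term is split equally between features 1 and 2, and the values sum to model(x) minus model(baseline), the efficiency property that lets SHAP decompose a prediction into per-feature contributions.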
In our study, we incorporated SHAP to interpret the outputs of our ML and DL models. By applying SHAP, we were able to break down the predictions into contributions from each feature, offering a clear and detailed understanding of how each variable influenced the model’s decisions. This transparency is particularly valuable in healthcare applications, as it allows clinicians to trust and verify the predictions made by the models. Furthermore, we complemented SHAP with other interpretability techniques, such as local interpretable model-agnostic explanations (LIME) and feature importance analysis, to provide a multifaceted view of model behavior. These methods together helped us ensure that our models were not only accurate but also interpretable and trustworthy.
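The core idea of LIME, fitting a simple weighted linear model to a black-box model’s outputs on perturbed copies of one patient’s record, can be sketched as below. This is a simplified, assumption-laden version rather than the LIME package: each perturbation switches a random subset of features to a baseline value, proximity is weighted with an exponential kernel on the fraction of features switched off, and the surrogate is solved by weighted least squares. The kernel width, sample count, and function names are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ w = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def lime_explain(model, x, baseline, n_samples=500, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around x; return one coefficient per feature."""
    rng = random.Random(seed)
    d = len(x)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        keep = [rng.random() < 0.5 for _ in range(d)]  # True: feature keeps its value in x
        z = [x[i] if keep[i] else baseline[i] for i in range(d)]
        distance = 1 - sum(keep) / d  # share of features switched to baseline
        rows.append([1.0] + [1.0 if k else 0.0 for k in keep])  # intercept + indicators
        targets.append(model(z))
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
    # Weighted least squares via the normal equations (X^T W X) beta = X^T W y.
    p = d + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(p)] for i in range(p)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets)) for i in range(p)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[1:]  # drop the intercept


def linear_model(z):
    return 2 * z[0] + 3 * z[1] - z[2]


print([round(c, 6) for c in lime_explain(linear_model, [1, 1, 1], [0, 0, 0])])  # [2.0, 3.0, -1.0]
```

Because the toy model is exactly linear in the on/off indicators, the surrogate recovers its coefficients; for a non-linear model, the coefficients are only local approximations around the explained record.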
By integrating these interpretability methods, we aimed to bridge the gap between model complexity and usability, ensuring that our ML and DL models can be effectively deployed in real-world healthcare settings, where understanding and trust are paramount. This comprehensive approach to interpretability underscores our commitment to developing reliable and transparent predictive models that can aid in critical decision-making processes.
Model Performance Evaluation
The efficacy of both the ML and DL models was assessed using performance measures derived from the confusion matrix, as shown below. The predictive models were evaluated using key performance indicators, including accuracy, sensitivity, specificity, F1-score, and the receiver operating characteristic (ROC) curve.
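1) $\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN} \times 100$
2) $\text{Sensitivity} = \dfrac{TP}{TP + FN} \times 100$
3) $\text{Specificity} = \dfrac{TN}{TN + FP} \times 100$
4) $\text{Precision} = \dfrac{TP}{TP + FP} \times 100$
5) $\text{F-measure} = \dfrac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.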
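The confusion-matrix formulas above, together with the area under the ROC curve, can be computed with the following sketch. The function names are hypothetical; F1 is computed from fractional precision and sensitivity and then expressed in percent, matching how F1-scores are reported in the Results. The AUC uses the rank-based (Mann-Whitney) formulation, which is equivalent to the area under the empirical ROC curve and requires both classes to be present.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def _ratio(num, den):
    return num / den if den else 0.0  # guard against empty denominators


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision and F1-score, in percent."""
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": 100 * _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": 100 * sensitivity,
        "specificity": 100 * _ratio(tn, tn + fp),
        "precision": 100 * precision,
        "f1": 100 * _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


counts = confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(counts)  # (2, 2, 1, 1)
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```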
Ethical Considerations
For this study, ethical approval was obtained from the ethics committee of Shahid Beheshti University of Medical Sciences (approval no: IR.SBMU.RETECH.REC.1401.767, date: 12.02.2023). Owing to the study’s non-invasive methodology and strict adherence to patient anonymity and data confidentiality, the ethics committee waived the requirement for written informed consent. Data were collected without any identifying information, and access was limited to the research team, thereby minimizing any potential risk to patients in accordance with the study protocol.
RESULTS
Patients’ Characteristics
This retrospective observational study examined data from 68,181 patients hospitalized for poisoning during the study period. After the exclusion criteria were applied, 980 patients (1.4%) who underwent hemodialysis were included. The cohort comprised 793 (80.9%) males and 187 (19.1%) females, with a mean age of 36.5±14 years. The age distribution differed significantly (p<0.001), with most patients (604, 61.6%) aged 21-40 years. Intentional poisoning accounted for 117 (11.9%) cases. Methanol was the most common poisoning agent (858, 87.6%), followed by multidrug ingestions. The majority of patients (830, 84.7%) had no prior history of kidney disease, and 903 (92.1%) had no prior drug use; however, 627 (64%) reported alcohol use. Hemodialysis was the most widely used extracorporeal method (980 cases, 99.1%), and hemoperfusion was additionally used to treat 9 patients poisoned by methanol, multidrug ingestions, or methadone. Together, these data characterize the demographics, toxins, and extracorporeal treatment approaches in a large cohort of poisoned patients requiring hemodialysis. Figure 2 shows the causes of intoxication in the studied patients.
Hyperparameter Tuning
Table 2 presents the tuned hyperparameters of four ML algorithms.
Performance Evaluation of Selected Models
Deep Learning Models
Table 3 presents the performance assessment of the selected models. According to these findings, the best DL model was the CNN, with a sensitivity of 82%, specificity of 83%, accuracy of 95%, and F1-score of 82%. Among the DL models, the DNN achieved the highest ROC value.
Machine Learning Models
Overall, the RF model outperformed all other models, with a sensitivity of 92%, an accuracy of 98%, an F1-score of 98%, and an ROC score of 98%. The XGB model also achieved a sensitivity of 92%, an accuracy of 98%, and an ROC score of 98%, and the DT model achieved a sensitivity of 92% and an accuracy of 98%.
Deep Learning vs. Machine Learning Models
In terms of sensitivity, DL models achieved 80.0% to 82.0%, whereas ML models achieved 80.0% to 92.0%. The specificity of DL models ranged from 78.0% to 83.0%, compared with 84.0% to 94.0% for ML models. Accuracy ranged from 94.0% to 95.0% for DL models and from 95.0% to 98.0% for ML models. F1-scores ranged from 79.0% to 82.0% for DL models and from 82.0% to 95.0% for ML models. Finally, DL models achieved ROC scores between 97.0% and 98.0%, whereas ML models had ROC scores between 95.0% and 98.0%.
Overall, the RF model demonstrated the strongest performance across most metrics, and the ML models generally outperformed the DL models on most measures.
Figure 3 shows the performance of various ML and DL models in predicting ICU stay duration for hemodialysis patients with poisoning, while Figure 4 compares the ROC curves of these models.
Explanation and Justification of the Output of Machine Learning and Deep Learning Models
Shapley Additive exPlanations (SHAP)
The SHAP summary plot (Figure 5) shows the impact of each feature on the model’s output: positive SHAP values push the prediction upward and negative values push it downward, with larger magnitudes indicating a stronger effect. For example, the feature “intubation” has a positive impact on the model’s output, whereas “blood glucose” has a negative impact. The most important features for predicting the duration of ICU stay among hemodialysis patients with poisoning were intubation score, GCS score, and ICU admission type.
Local Interpretable Model-agnostic Explanations (LIME)
Figure 6 illustrates LIME plots depicting the cumulative influence of essential features on the model’s prediction of ICU stay duration for hemodialysis patients with poisoning. The most influential features were GCS <8, intubation, acute kidney injury (AKI), PO2, blood urea nitrogen (BUN), metabolic acidosis, and number of hemodialysis sessions.
DISCUSSION
In this study, we aimed to predict the ICU LOS of hemodialysis patients with poisoning by comparing the performance of various ML and DL models. The importance of timely prediction and intervention in poisoning cases cannot be overstated, as delayed treatment can lead to severe complications or even death. Our findings demonstrated that the RF model outperformed all other models, achieving a sensitivity of 92%, an accuracy of 98%, an F1-score of 98%, and an ROC score of 98%. In comparison, the best-performing DL model, the CNN, achieved a sensitivity of 82%, a specificity of 83%, an accuracy of 95%, and an F1-score of 82%. Overall, ML models exhibited higher sensitivities, specificities, accuracies, and F1-scores than DL models, indicating that ML models, especially RF, are more effective for predicting ICU stay duration in this patient population.
As mentioned above, the RF model surpassed all other ML and DL models, achieving superior sensitivity, accuracy, F1-score, and ROC score, whereas the best-performing DL model, the CNN, attained notable sensitivity, specificity, accuracy, and F1-score. These two models are pivotal in our study because of their strong performance and complementary strengths, which highlight the potential of both ML and DL in medical prediction. The RF model’s superior sensitivity, accuracy, F1-score, and ROC score underscore its robustness and reliability in predicting ICU stay duration for hemodialysis patients with poisoning. This performance can be attributed to the RF model’s ability to handle diverse data inputs and to reduce overfitting through its ensemble learning approach, making it a powerful tool for clinical decision-making. On the other hand, the CNN’s sensitivity, specificity, accuracy, and F1-score demonstrate the capacity of DL to capture complex patterns within data, suggesting its potential for applications involving intricate data structures and high-dimensional features. Together, these models illustrate the value of leveraging both traditional ML techniques and advanced DL methods to achieve high predictive accuracy and clinical relevance, ultimately improving patient outcomes through precise and timely interventions. Huang et al. (32) found that, of all the algorithms examined, RF and ensemble methods exhibited superior predictive performance, indicating that RF is particularly effective for predictive modeling of blood pressure during hemodialysis. Other studies (33, 34) have also shown that CNNs can be used effectively for prediction tasks related to hemodialysis.
Our findings indicate that ML models exhibited higher sensitivities, specificities, accuracies, and F1-scores than DL models, suggesting that ML models, particularly RF, are more effective for predicting ICU stay duration in this patient population. These results underscore the potential of ML techniques in clinical decision-making for poisoned hemodialysis patients. Comparing ML and DL models is important because of their distinct strengths in predictive modeling of medical outcomes. The superior performance of the ML models highlights their effectiveness in handling structured data and generating precise predictions, which is crucial in clinical settings where accurate predictions can significantly affect patient care and outcomes. ML models such as RF excel at modeling relationships between input variables and outcomes, making them valuable tools for predicting ICU stay duration and guiding timely interventions for poisoned hemodialysis patients (32). Conversely, DL models such as CNN and LSTM offer advantages in capturing intricate patterns in complex, unstructured data, although their performance metrics were comparatively lower in our study. By understanding and leveraging the strengths of both ML and DL approaches, healthcare practitioners can enhance their predictive capabilities and ultimately improve patient care and treatment outcomes in critical medical scenarios.
Jordan and Mitchell (35) describe ML as improving a computer program’s performance on specific tasks, as measured by defined performance metrics, through experience. Essentially, ML aims to automate analytical model building for cognitive tasks such as object detection or natural language translation. This is accomplished by algorithms that iteratively learn from training data, enabling computers to uncover hidden insights and complex patterns without being explicitly programmed (36). By learning from past computations and identifying regularities in large datasets, ML can support reliable and repeatable decision-making. Consequently, ML algorithms have been applied successfully in numerous domains, including fraud detection, credit scoring, next-best-offer analysis, speech and image recognition, and natural language processing (36).
In our study, we identified several crucial prognostic factors that warrant further discussion. Our findings represent a significant advancement in the field, as previous research has predominantly focused on overall hospital LOS rather than specifically addressing ICU duration in poisoned patients requiring hemodialysis. The most influential features identified in our study were GCS <8, intubation, AKI, PO2, BUN, metabolic acidosis, and number of hemodialysis sessions. Among these, intubation score, GCS score, and ICU admission type emerged as the paramount predictors of ICU stay duration. These results align with and expand upon the findings of Rahimi et al. (37), who reported that intubation, GCS, and ICU admission were significant prognostic factors in poisoned patients undergoing hemodialysis. Furthermore, the study by Brenner et al. (38) on arteriovenous access failure highlights the impact of complications on hospitalization duration and costs, emphasizing the need for meticulous vascular access management in poisoned hemodialysis patients to potentially reduce the duration of ICU stays. Additionally, the study by Yan et al. (39) on racial and ethnic disparities in hospitalization rates among hemodialysis patients introduces an important dimension worthy of exploration in future research on ICU stay duration. Although our study did not specifically address these demographic factors, their potential influence on ICU outcomes in poisoned hemodialysis patients merits further investigation. Overall, this part of our study advances the understanding of the determinants of ICU stay duration for poisoned patients requiring hemodialysis. By focusing on specific ICU-related outcomes, we provide valuable insights that can inform clinical practice and improve patient management strategies in this critical patient population.
Future research should aim to incorporate demographic factors and explore their potential impact on ICU outcomes to build a more comprehensive understanding of the determinants influencing ICU stay duration in poisoned hemodialysis patients.
Study Limitations
The study has several limitations, including the single-center data from Loghman Hakim Hospital, which may affect generalizability, and the sample size of 980 patients, which may not capture all variability in ICU stay duration for hemodialysis patients with poisoning. Future research should aim to include multicenter data to enhance generalizability and increase sample size to capture a broader range of variability.
Additionally, this study acknowledges that the models were developed based on static data, which may limit their applicability to real-time clinical decision-making. Future research should focus on testing and validating these models in dynamic, real-time clinical environments to assess their practical utility and performance in such settings.
Moreover, this study acknowledges potential biases introduced by class imbalance in the dataset, particularly the higher prevalence of short-LOS cases compared with long-LOS cases. Although under-sampling was applied to address this issue, such methods might not fully eliminate the inherent bias, potentially affecting the generalizability and robustness of the model’s predictions for underrepresented classes. To further address class imbalance, methods such as SMOTE (synthetic minority over-sampling technique), class weighting, hybrid sampling, or ensemble approaches could be employed, alongside an emphasis on evaluation metrics such as the F1-score and the area under the precision-recall curve (PR-AUC).
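Two of the remedies mentioned above can be sketched briefly. `balanced_class_weights` uses the common “balanced” heuristic (weight = number of samples / (number of classes × class count)), and `smote` is a simplified version of SMOTE that creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors. Both are illustrative sketches with hypothetical names and parameters, not methods used in this study.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_class_samples)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating toward near neighbors.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x (excluding x itself), by squared Euclidean distance
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


print(balanced_class_weights([0] * 8 + [1] * 2))  # {0: 0.625, 1: 2.5}
```

Class weighting leaves the data unchanged and instead makes errors on the rarer class cost more during training, whereas SMOTE enlarges the minority class; unlike under-sampling, neither discards majority-class records.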
Lastly, this study acknowledges that temporal trends, such as changes in hospital practices or treatments between 2016 and 2020, were not explicitly accounted for in the analysis. Such trends could impact the generalizability of the findings, as variations in clinical protocols, resource availability, or treatment methods over time might influence the outcomes. Future studies should consider incorporating temporal stratification or modeling to assess the potential impact of these trends and minimize their confounding effects on the results.
CONCLUSION
This study aimed to predict the ICU LOS of hemodialysis patients with poisoning by comparing ML models with DL models. Among the DL models, the CNN performed best, with a sensitivity of 82%, specificity of 83%, accuracy of 95%, and F1-score of 82%, while the DNN achieved the best ROC value. However, the RF model, an ML model, outperformed all other DL and ML models, achieving higher sensitivity, accuracy, F1-score, and ROC values.
In general, ML models demonstrated superior performance compared to DL models, with higher sensitivities, specificities, accuracies, and F1-scores. These findings suggest that ML models, particularly the RF model, are more effective for predicting ICU stay duration for hemodialysis patients with poisoning. Therefore, incorporating ML models into clinical practice could enhance the prediction and management of ICU stay durations in this patient population, potentially improving outcomes and resource allocation.