Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI

Joachim Behar
1 day ago

Eran Zvuloni, Ronit Almog, Michael Glikson, Shany Brimer Biton, Ilan Green, Izhar Laufer, Offer Amir, and Joachim A. Behar

The collaborating institutions are Technion-Israel Institute of Technology, Rambam Health Care Campus, Shaare Zedek Medical Center, Leumit Health Services, Hadassah Medical Center, and the Hebrew University Faculty of Medicine.

URL: XXX

New publication from the Technion AIMLab introduces DeepHHF, an explainable deep learning model that uses 24-hour single-lead Holter ECG recordings to predict five-year heart failure risk.

Heart failure is a major and growing public health challenge. It affects millions of people worldwide, reduces quality of life, and is one of the leading causes of hospitalization among older adults. Yet heart failure often develops gradually, creating an important window of opportunity: identifying high-risk individuals earlier could support closer follow-up, preventive care, and better allocation of clinical resources.

In our new publication, “Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI,” we ask a simple but powerful question: Can the electrical activity of the heart, recorded continuously over 24 hours, reveal who is at increased risk of developing heart failure years later?

To answer this, we developed Deep Holter Heart Failure, or DeepHHF, a deep learning model designed to analyze long-duration Holter ECG recordings. Unlike standard short ECG snapshots, Holter monitoring captures the heart over an entire day, making it possible to observe intermittent and paroxysmal events that may be missed in shorter recordings.

Why Holter ECG?

Most AI-ECG research has focused on short 12-lead ECGs, typically lasting only a few seconds. These recordings are clinically valuable, but they provide only a brief snapshot of cardiac physiology. Holter ECGs, by contrast, are already used in routine care to investigate symptoms such as palpitations, syncope, and suspected arrhythmias.

This makes them an attractive source of additional information: the same examination could potentially be used not only for rhythm assessment, but also for opportunistic risk prediction.

DeepHHF was developed to work with single-lead 24-hour ECG data, aligning with the growing availability of wearable and patch-based ECG devices.

A large real-world dataset

The study used the Technion-Leumit Holter ECG dataset, which includes 69,663 Holter recordings from 47,729 patients, collected across 20 primary care facilities in Israel and linked to approximately 20 years of longitudinal clinical data.

After applying exclusion criteria, the final study cohort included 57,575 Holter recordings from 40,174 unique patients. Each Holter recording was labeled according to whether the patient received a first documented heart failure diagnosis within five years after the examination. Recordings from patients with heart failure already documented before the Holter examination were excluded, ensuring that the task focused on future risk prediction rather than detection of known disease.
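
The labeling rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the study's actual pipeline: the function name, the 365-day year approximation, and the treatment of same-day diagnoses are all assumptions, and the sketch ignores how patients with limited follow-up were handled.

```python
from datetime import date, timedelta
from typing import Optional

# Approximate five-year horizon (assumption: 365-day years).
HORIZON = timedelta(days=5 * 365)


def label_recording(holter_date: date, first_hf_date: Optional[date]) -> Optional[int]:
    """Return 1 or 0 for a recording, or None if it should be excluded.

    Hypothetical helper: first_hf_date is the date of the patient's first
    documented heart failure diagnosis, or None if there is none.
    """
    if first_hf_date is None:
        return 0  # no documented diagnosis: negative label
    if first_hf_date <= holter_date:
        # Heart failure already documented at the time of the examination.
        # Treating a same-day diagnosis as "already documented" is an assumption.
        return None
    # Positive only if the first diagnosis falls inside the horizon.
    return 1 if first_hf_date - holter_date <= HORIZON else 0
```

For example, a recording from 1 January 2010 followed by a first diagnosis on 1 January 2013 would be labeled 1, while a first diagnosis on 1 January 2016 would give a label of 0.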

How DeepHHF works

DeepHHF was trained in two stages. First, an encoder learned representations from 30-second ECG windows. Then, a sequential transformer-based head modeled information across the full 24-hour recording, producing a patient-level score for five-year heart failure risk. This design allowed the model to combine local ECG morphology with long-range temporal information from the full Holter recording.
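
To make the data flow concrete, here is a minimal Python sketch of the two-stage idea: cut the recording into 30-second windows, encode each window, then pass the sequence of encodings to a head that returns one score. It is not the DeepHHF implementation. The sampling rate, the use of non-overlapping windows, and the names `split_into_windows`, `predict_risk`, `encode_window`, and `sequence_head` are assumptions made for illustration, and the encoder and head are passed in as plain callables.

```python
from typing import Callable, List, Sequence

WINDOW_SECONDS = 30   # window length stated in the text
RECORDING_HOURS = 24  # recording length stated in the text
FS_HZ = 200           # hypothetical sampling rate, not taken from the study


def split_into_windows(signal: Sequence[float], fs_hz: int,
                       window_seconds: int = WINDOW_SECONDS) -> List[Sequence[float]]:
    """Cut a single-lead signal into non-overlapping fixed-length windows.

    Any incomplete window at the end of the signal is dropped.
    Non-overlapping windows are an assumption of this sketch.
    """
    n = fs_hz * window_seconds
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def predict_risk(signal: Sequence[float], fs_hz: int,
                 encode_window: Callable, sequence_head: Callable):
    """Stage 1: encode every window. Stage 2: map the sequence to one score."""
    embeddings = [encode_window(w) for w in split_into_windows(signal, fs_hz)]
    return sequence_head(embeddings)


# A full day contains 24 * 3600 / 30 = 2880 thirty-second windows.
windows_per_day = RECORDING_HOURS * 3600 // WINDOW_SECONDS
```

Under these assumptions, the sequence head sees up to 2,880 window embeddings per recording, which is why a sequence model is a natural fit for the second stage.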

Key findings

DeepHHF achieved an AUROC of 0.80 for predicting five-year heart failure risk from 24-hour Holter ECG recordings.

This outperformed both a model based on a single 30-second ECG window, which achieved an AUROC of 0.77, and the PCP-HF clinical risk score, which achieved an AUROC of 0.74 on the available test-set subset.
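
For readers less familiar with the metric, AUROC is the probability that a randomly chosen patient who later develops heart failure receives a higher score than a randomly chosen patient who does not, with ties counted as one half. The short Python sketch below computes it directly from that definition. The labels and scores in the usage example are toy values, not study data, and the pairwise loop is written for clarity rather than speed.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 3 of the 4 positive/negative pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On this scale, 0.5 corresponds to random ranking and 1.0 to perfect separation.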

The gain from modeling the full 24-hour signal is notable: continuous recordings can capture rhythm abnormalities and temporal patterns that short ECG segments may miss.

The model also showed clinically meaningful risk stratification. Patients identified as high risk by DeepHHF had a fourfold increased likelihood of all-cause mortality and a twofold increased likelihood of cardiac/internal department hospitalization or death compared with patients in the low-to-moderate risk groups.

Importantly, the model’s performance remained stable even when evaluated across different time intervals before heart failure diagnosis, supporting its potential as a long-term prognostic tool rather than simply a detector of imminent disease.

Making AI more interpretable

Using gradient attention rollout, we investigated which parts of the ECG contributed most to the model’s predictions. DeepHHF focused on ECG patterns associated with arrhythmias and abnormal cardiac activity. Clustering of highly attended beats revealed morphologies resembling premature ventricular contractions, supraventricular ectopy, and beats with absent P-waves. This is important because clinical AI models should not only perform well; they should also provide insight into the physiological patterns they use.
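
As a rough illustration of the idea behind attention rollout, the Python sketch below combines per-layer attention matrices into a single token-to-token relevance map. It follows one common formulation and may differ from the exact variant used in the study: each layer's attention is assumed to be already averaged over heads, the identity matrix is added to account for residual connections, rows are renormalized, and layers are multiplied in order. When gradients are supplied, attention is weighted by them and negative values are clipped to zero. The function names and toy matrices are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rollout(attentions: List[Matrix], grads: Optional[List[Matrix]] = None) -> Matrix:
    """Roll attention through the layers, first layer first.

    attentions: one head-averaged, non-negative n x n matrix per layer.
    grads: optional gradients with the same shapes (assumed formulation).
    """
    n = len(attentions[0])
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for idx, attn in enumerate(attentions):
        layer = []
        for i in range(n):
            row = []
            for j in range(n):
                a = attn[i][j]
                if grads is not None:
                    # Gradient weighting; negative contributions are dropped.
                    a = max(a * grads[idx][i][j], 0.0)
                # Add the identity to model the residual connection.
                row.append(a + (1.0 if i == j else 0.0))
            total = sum(row)  # at least 1.0, so the division is safe
            layer.append([v / total for v in row])
        result = matmul(layer, result)
    return result


# Toy example with two tokens and two layers (made-up attention values).
a1 = [[0.5, 0.5], [0.5, 0.5]]
a2 = [[1.0, 0.0], [0.0, 1.0]]
print(rollout([a1, a2]))  # [[0.75, 0.25], [0.25, 0.75]]
```

In a model like the one described here, each token would correspond to a window of the recording, so rows of the resulting map would indicate which windows contributed most to the final score.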

External validation

DeepHHF was further evaluated on an independent external cohort from Rambam Health Care Campus in Haifa, Israel. In this zero-shot setting, without fine-tuning, the model achieved an AUROC of 0.81 for predicting future heart failure, supporting its potential generalizability.

Clinical vision

The long-term vision is not to replace clinicians, but to augment routine care. A patient may undergo Holter monitoring for standard clinical reasons, such as suspected arrhythmia. DeepHHF could then run in the background and provide an additional heart failure risk score. Patients identified as moderate or high risk could be prioritized for preventive evaluation, such as BNP testing, echocardiography, closer monitoring, or targeted intervention.

Because the model uses only a single-lead ECG and does not require additional clinical variables, it could potentially be integrated into Holter analysis software or future wearable ECG platforms.

Looking ahead

The study also has limitations. Further validation is needed across additional centers, populations, devices, and acquisition protocols. The model predicts overall incident heart failure risk and does not distinguish between heart failure phenotypes such as HFrEF, HFmrEF, and HFpEF. Future work will focus on broader validation, improved calibration, device generalization, and deeper investigation of temporal ECG patterns.

Still, this work demonstrates the promise of modeling day-long physiological signals with explainable AI. By listening to the heart over time, rather than through a brief snapshot, we may uncover new opportunities for earlier risk detection and more proactive cardiovascular care.