My first PhD project built on my previous work analyzing electroencephalography (EEG) data and introduced me to machine learning. During this project, I shifted focus from traditional EEG data analysis to data science and machine learning. Working on it gave me a firsthand understanding of how complex real-world data can be and showed me the importance of the context in which data is collected and used.

The results of our research were published in a paper titled “Predicting epileptic seizures using nonnegative matrix factorization”. The accompanying code can be found here.

EEG for seizure prediction

EEG is a powerful tool for diagnosing and monitoring epileptic seizures. Seizure prediction algorithms aim to distinguish “normal” EEG signals from the “abnormal” signals that appear in the minutes before a seizure.

Patients need an accurate algorithm that doesn’t miss seizures (a low rate of false negatives). Medical professionals and researchers are interested in learning about disease dynamics, understanding the model’s behavior, and using the prediction model to help patients during or after medical procedures. Further, seizure predictions can inform high-stakes decisions. In this setting, interpretability is important because a wrong prediction can lead to the wrong treatment and erode patients’ trust. When seizure prediction methods are used in a clinical setting, there is also the question of who is liable for decisions based on their predictions.
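The "low rate of false negatives" requirement can be made concrete with a small calculation. The following is a minimal sketch, not taken from the paper: the function name `sensitivity_and_fnr` and the example labels are hypothetical, and the label 1 is assumed to mark a segment that precedes a seizure.

```python
def sensitivity_and_fnr(y_true, y_pred):
    """Return (sensitivity, false-negative rate) for binary labels.

    Assumption for this sketch: 1 marks a pre-seizure segment, 0 a normal one.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    positives = tp + fn
    if positives == 0:
        raise ValueError("y_true contains no pre-seizure segments")
    sens = tp / positives
    return sens, 1.0 - sens


# Hypothetical labels for eight EEG segments (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

sens, fnr = sensitivity_and_fnr(y_true, y_pred)
print(f"sensitivity={sens:.2f}, false-negative rate={fnr:.2f}")
# prints: sensitivity=0.75, false-negative rate=0.25
```

In this toy case, one of the four pre-seizure segments is missed, so a quarter of the warnings a patient would need are never raised, regardless of how well the normal segments are classified.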

In an EEG signal, seizures are distinguished by sudden hypersynchronization. EEG signals in epilepsy are divided into four main segments, which correspond to the four states a patient can be in:

interictal - a state between seizures (Figure 1),
preictal - a state before a seizure (Figure 1),
ictal - a state during a seizure,
postictal - a state after a seizure.

Figure 1: Example spectrograms of preictal and interictal states. Baseline-corrected spectrograms from a single EEG channel of one patient, showing a preictal state (A) and an interictal state (B).

Challenges in model development

Designing seizure prediction models poses several data- and model-related challenges. First, collecting quality data on epileptic seizures in a clinical setting is difficult, expensive, and time-consuming. Not every hospital has the equipment or resources to do it, and there are ethical considerations. Seizures are rare events, and the patients being recorded are often awaiting surgery, yet we need a large amount of data containing both interictal and preictal states. The EEG recordings have to be long enough to capture sufficient data, but not so long that patients’ conditions worsen before the surgery.

This leads to heterogeneous and imbalanced data: each patient has more interictal recordings than preictal or ictal ones, and patients often differ in their diagnoses and EEG settings.

Because of the limitations of the data collection process, seizure prediction models must be patient-specific. This further limits the amount of available data and makes it hard to generalize the dynamics of epilepsy. Seizure prediction models also have to deal with imbalanced datasets while avoiding overfitting. Because of these challenges, only a few large datasets of EEG recordings of epileptic seizures are available for broader use. The two largest and most widely used are the EPILEPSIAE dataset and the Epilepsyecosystem dataset.

Seizure prediction methods usually combine features derived from EEG signals (e.g., spectral features, EEG patterns, etc.) with machine learning algorithms such as SVM, decision trees, logistic regression, time-series analysis methods, or neural networks. For a comprehensive review of this topic, please refer to “EEG datasets for seizure detection and prediction— A review” and “The present and future of seizure detection, prediction, and forecasting with machine learning, including the future impact on clinical trials”.

Our model

We wanted to design an interpretable model, so we started by designing interpretable features of preictal and interictal states, which we then used to distinguish between the two states. To do so, we extracted the time and frequency components of intracranial EEG signals for each channel, state, and patient using nonnegative matrix factorization (NMF; see Figure 2), an interpretable and transparent decomposition method. The components capture the dominant information in the power spectra and detect structure in preictal states, which we use for classification. The learned time and frequency components are also informative for domain experts because they can be related to well-understood physical phenomena. To make our model more robust, we combined two major datasets, EPILEPSIAE and Epilepsyecosystem.
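To illustrate the decomposition step, here is a minimal NumPy sketch of NMF applied to a spectrogram-like matrix. It is not the code from the paper: the multiplicative update rule, the rank of 1, the function name `nmf`, and the synthetic data are all assumptions chosen for illustration. The columns of `W` play the role of frequency components and the rows of `H` the role of time components.

```python
import numpy as np


def nmf(V, rank=1, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (frequency x time) as W @ H.

    Uses multiplicative updates for the Frobenius loss (an assumption of
    this sketch; other NMF solvers would work as well).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps  # frequency components (columns)
    H = rng.random((rank, n_time)) + eps  # time components (rows)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic "spectrogram": one spectral peak at 10 Hz whose power grows in time.
freqs = np.linspace(1, 40, 40)   # 1, 2, ..., 40 Hz
times = np.linspace(0, 1, 60)
freq_profile = np.exp(-0.5 * ((freqs - 10) / 2.0) ** 2)
time_profile = 0.2 + times
V = np.outer(freq_profile, time_profile)

W, H = nmf(V, rank=1)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.2e}")
print("peak frequency of the component:", freqs[np.argmax(W[:, 0])])
```

Because both factors stay nonnegative, each component can be read directly as "how much power at which frequency" and "how strongly at which time", which is what makes the decomposition easy to inspect.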

Figure 2: Creating a time-frequency model using NMF components. The red and blue solid lines represent the frequency and time components, while the dashed lines show their respective models. The time-frequency model in the center is created by combining the modeled time and frequency components.

For classification, we used a linear support vector machine (SVM) with L1 regularization. Since an SVM is not an algorithmically transparent classifier, we compensated by using L1 regularization to select informative EEG channels and weight their contribution to the prediction. Combining these methods makes it possible to inspect individual measurements, their NMF components, and the weights learned under L1 regularization. This way, we can see why the classifier assigns one of the two classes to a particular measurement, which gives us local model interpretability. As the last step, we applied the synthetic minority over-sampling technique (SMOTE) to mitigate the class imbalance between the more numerous interictal states and the rarer preictal states. Our method produces good results and is computationally inexpensive, which could make it suitable for a closed-loop setting.
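The two ingredients of this paragraph, oversampling the minority class and an L1-regularized linear SVM, can be sketched in plain NumPy. This is a simplified illustration under several assumptions, not the pipeline from the paper: `smote_like` is a bare-bones variant of SMOTE, `fit_l1_linear_svm` uses a squared hinge loss fitted by proximal gradient descent, the data are synthetic with one hypothetical feature per channel, and the oversampling is applied to the training data before fitting. All names and hyperparameters are made up for the example.

```python
import numpy as np


def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)            # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))               # position along the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def fit_l1_linear_svm(X, y, lam=0.1, lr=0.05, n_iter=1000):
    """Linear SVM with squared hinge loss and an L1 penalty on the weights,
    fitted by proximal gradient descent. Labels must be -1 or +1."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
        grad_w = -2.0 * (X * (slack * y)[:, None]).mean(axis=0)
        grad_b = -2.0 * (slack * y).mean()
        w -= lr * grad_w
        b -= lr * grad_b
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w, b


# Synthetic data: 4 hypothetical channels, only channel 0 is informative.
rng = np.random.default_rng(1)
X_inter = rng.normal(0.0, 1.0, size=(200, 4))   # majority class (-1)
X_pre = rng.normal(0.0, 1.0, size=(20, 4))      # minority class (+1)
X_pre[:, 0] += 3.0

X_new = smote_like(X_pre, n_new=180)            # balance the two classes
X = np.vstack([X_inter, X_pre, X_new])
y = np.concatenate([-np.ones(200), np.ones(20 + 180)])

w, b = fit_l1_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
print("channel weights:", np.round(w, 3))
print(f"training accuracy: {accuracy:.2f}")
```

The learned weight vector `w` has one entry per channel, so the channels the classifier relies on can be read off directly, and the L1 penalty pushes the weights of uninformative channels toward zero.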
