Data Science & Soft Computing Lab

Full Members

Dr Daniel Stamate, Lab lead, Data Scientist, Goldsmiths, and University of Manchester

Prof Fionn Murtagh, Data Scientist, Goldsmiths

Dr Ida Pu, Computer Scientist, Goldsmiths

Alexandra Stepanenko, Accelerated Knowledge Transfer (AKT) Research Associate, Goldsmiths

Mihai Ermaliuc, PT PhD Candidate in Data Science at Goldsmiths, working on Neural Networks, Generative Adversarial Networks, and Large Language Models

Henry Musto, PT PhD Candidate in Data Science, working on predicting Dementia with Survival and Classification Statistical and Machine Learning models on the ADNI and ELSA cohorts

Mohamed Saber, PT PhD Candidate in Data Science at Goldsmiths, working on Financial Fraud Detection

Jiri Marek, PT PhD Candidate in Data Science at Goldsmiths, working on Behavioural Finance

Associated Members

Prof Daniel Stahl, Professor of Medical Statistics and Statistical Learning, Lead of the Precision Medicine and Statistical Learning Group, Institute of Psychiatry, Psychology and Neuroscience, King's College London

Prof Doina Logofatu, Computer Scientist and Mathematician, Frankfurt University of Applied Sciences

Prof Mihaela Breaban, Computer Scientist, University of Iasi

Dr Olesya Ajnakina, Senior Data Scientist and Statistician, King's College London

Mr Frederic Marechal, Data Scientist in industry

Dr Charlotte Wu, Strategic Physician Leader in health systems & technology innovation, Harness Health Partners

Data Science MSc Student Interns

Eoin Houstoun

Former Members

Asei Akanuma (KTP Research Associate), John Langham (Research Associate), Alexandra Stepanenko (AKT Research Associate), Wajdi Alghamdi (PG Researcher), Raph Olaniyan (PG Researcher), Karolina Rutcowska (Research intern), Andrea Katrinecz (Research intern), Jeremy Ogg (Research intern), Pedro Lopez (Research intern), Gabriel Burcea (Research intern), Rostislav Vorobev (Visiting researcher), Ruslan Tsygankov (Visiting researcher), Rubaida Easmin (Research intern), Mazy Carneiro (Research intern), Gozde Orhan (Research intern), Esperanza Ballesteros (Research intern), Markela Zeneli (Research intern), Prad Sree Davuloori (Research intern), Riya Haran (Research intern)

Main Research Directions

1. Machine Learning Prediction Modelling in Mental Health

2. Identifying and Measuring Playful Parenting Using Machine Learning

3. Predicting Spectral Reflectance Curves and Applications in Coatings Industry

4. Machine Learning and NLP Sentiment Analysis in Finance

5. Soft Computing, Evolutionary Algorithms and Applications

1. Machine Learning Prediction Modelling in Mental Health

(1.a) Predicting Risk of Dementia with Machine Learning using Routine Primary Care Records – CPRD.
Participants: Daniel Stamate, Fionn Murtagh, Mihai Ermaliuc, John Langham, Charlotte Wu, in collaboration with Prof David Reeves and team at the Centre for Primary Care in the Institute for Population Health, University of Manchester

Our Lab leads on the Machine Learning aspects of this study, which is based on our project on predicting the risk of dementia using routine primary care records, developed in collaboration with the University of Manchester and other academic partners. The project received media coverage from the BBC. The research concerns the development of novel synergistic approaches to predicting dementia based on Machine Learning (AI) and Statistical methods, and the development of a prediction tool. Almost 1 million people in the UK currently live with dementia. There is no cure, and the condition has higher health and social care costs than cancer, stroke and chronic heart disease combined (dementia costs the UK £26 billion per year). Current thinking suggests that 35% of cases of dementia could be prevented. Our research project aims to contribute to prevention, and to help improve diagnosis rates (currently at least one third of expected patients do not receive a dementia diagnosis), by predicting the risk of dementia with new machine learning and statistical approaches. The main source of data to be analysed in this project is the Clinical Practice Research Datalink (CPRD).

(1.b) Predicting Alzheimer's and Dementia with Machine Learning and Statistical Approaches on ADNI, EMIF-AD and ELSA cohorts.
Participants: Daniel Stamate, Daniel Stahl, David Reeves, Henry Musto, Rostislav Vorobev, Ruslan Tsygankov, Olesya Ajnakina, in collaboration with Institute of Psychiatry London - King's College London, UCL, Oxford University, EMIF-AD Consortium partners, and University of Manchester

This topic involves predicting Alzheimer's Disease (AD) and Dementia with innovative Machine Learning and Statistical Learning methodologies on:

i) Alzheimer's Disease Neuroimaging Initiative (ADNI), with methodologies based on Neural Networks and Deep Learning, Gradient Boosting, Gaussian Processes, SVM, and Survival Machine Learning
ii) European Medical Information Framework - Alzheimer's Disease (EMIF-AD, AD Biomarker Discovery), with methodologies based on Gradient Boosting Machines, Random Forests and Deep Learning
iii) English Longitudinal Study of Ageing (ELSA), with methodologies based on Survival Random Forests, Survival Elastic Net and Cox models, and Gradient Boosting classification.

(1.c) AI for Predicting Psychosis
Participants: Daniel Stamate, Daniel Stahl, Wajdi Alghamdi, Andrea Katrinecz, in collaboration with Institute of Psychiatry, Psychology & Neuroscience, King's College London, Department of Psychiatry and Neuropsychology Maastricht University Medical Centre, and Department of Psychiatry, Yale University School of Medicine

Prediction Modelling and Pattern Detection Approaches for First-Episode Psychosis Associated with Cannabis Use
Recent studies show that cannabis is one of the most popular drugs in the world, and many countries have started to legalise it. However, recent research demonstrates that the consumption of cannabis is a significant risk factor for various types of psychosis. Research efforts are therefore being made to improve the estimation of the contribution of cannabis to the development of psychosis. In this ongoing research we apply data science methodologies based on scalable machine learning and statistical learning to devise novel approaches to the prediction of first-episode psychosis attributable to the use of high potency cannabis, and to the quantification of risk factors, based on phenotype data. Genotype data is to be added to the analysis in the next phase of the research. The work is performed in collaboration with the teams of Dr Marta Di Forti, Prof Sir Robin Murray, and Prof Daniel Stahl at the Institute of Psychiatry, Psychology & Neuroscience, King's College London.

Predicting Psychosis from Experience Sampling Data using Machine Learning
Modern psychiatric classification systems categorize psychiatric disorders (partly on the basis of evidence, largely pragmatically) according to different combinations of the required number of symptom domains that exceed the operational threshold of severity. This taxonomy endorses unique phenotypes with precise boundaries. A prevailing trend in psychiatry has been to reify these categorical diagnoses. However, efforts to discriminate between these psychiatric disorders using modern genetic and neuroimaging data have thus far failed to deliver a promising outcome. Evidence indicates commonality rather than distinction. The Experience Sampling Method (ESM), a personal diary method to assess mental states in real time, provides a unique opportunity to observe subtle fluctuations of mental states. It has various advantages over the conventional method of cross-sectional assessment of psychopathology based on self-report questionnaires: high ecological validity, high reliability, no recall bias, high temporal resolution, and contextual information. However, this intense assessment strategy produces a massive amount of information at the individual level. As such, even modern statistical approaches sometimes fail to provide optimal solutions for dealing with the complexity of data at this scale. Machine learning offers enhanced solutions for this kind of research challenge when synergistically combined with more traditional statistical methods. The aim of this study is to predict pattern formation and perform agnostic clustering of the general population using generic ESM data collected with mobile apps. The work is developed in collaboration with Andrea Katrinecz (Data Science MSc graduate), Dr Sinan Guloksuz of the Department of Psychiatry and Neuropsychology at Maastricht University Medical Centre and the Department of Psychiatry, Yale University School of Medicine, and Prof Daniel Stahl of the Department of Biostatistics and Health Informatics, King's College London.

2. Identifying and Measuring Playful Parenting Using Machine Learning
Participants: Daniel Stamate, Caspar Addyman, Mark Tomlinson, Irene Uwerikowe, Jeremiah Ayock Ishaya, in collaboration with Stellenbosch University (South Africa), and Global Parenting Initiative led by Oxford

Positive mother-infant interactions are crucial for child development, and nonverbal synchrony (how their movements align) offers a measurable indicator. The research conducted in our lab develops approaches that use video data and machine learning to automatically estimate interaction quality. Models such as BiLSTMens and BiGRUens, trained on movement patterns, successfully distinguished high and low synchrony, with strong performance in identifying dyads that may benefit from further support. This method highlights the potential of AI to support early detection and intervention in developmental contexts. Further related AI prediction modelling work on parent-child interactions is conducted at Stellenbosch University.

3. Predicting Spectral Reflectance Curves and Applications in Coatings Industry
Participants: Daniel Stamate, Asei Akanuma, Alexandra Stepanenko, in collaboration with Sherwin-Williams

This research is developed in collaboration with Sherwin-Williams in Knowledge Transfer Partnership (KTP) and Accelerated Knowledge Transfer (AKT) projects co-funded by Innovate UK and by the business partner. The work concerns the development of innovative, state-of-the-art Artificial Neural Network / Deep Learning approaches to colour reflectance curve prediction for optimising the design of new coatings.

4. Machine Learning and NLP Sentiment Analysis in Finance
Participants: Daniel Stamate, Raph Olaniyan, and Frederic Marechal

There has been an increasing interest recently in examining the possible relationships between emotions expressed online and stock markets. Most of the previous studies claiming that emotions have predictive influence on the stock market do so by developing various machine learning predictive models, but do not validate their claims rigorously by analysing the statistical significance of their findings. In turn, the few works that attempt to statistically validate such claims suffer from important limitations of their approaches.

A growing body of research analyses the relationship between sentiment-filled online information and the stock market, and shows a tendency for the former to predict the latter. But little is known about whether this information's predictive power resolves uncertainty. Rather, it is believed to induce volatility, because investors over-react or under-react to new information as a result of sentimental contagion.

In particular, stock market data exhibit erratic volatility, and this time-varying volatility makes any possible relationship between these variables non-linear. Our work investigates and proposes novel frameworks based on approaches that account for non-linearity and heteroscedasticity. We also study the asymmetric nature of the influences of positive and negative sentiments on stock market volatility.
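To make the idea of time-varying volatility with asymmetric sentiment effects concrete, the following is a minimal sketch only. The text above does not name the models used in the lab, so a GARCH(1,1)-style variance recursion with two separate, hypothetical sentiment coefficients (`delta_pos`, `delta_neg`) is assumed here purely for illustration; all parameter values are made up.

```python
def conditional_variance(returns, sentiment, omega=1e-5, alpha=0.05, beta=0.85,
                         delta_pos=1e-5, delta_neg=4e-5):
    """GARCH(1,1)-style variance recursion with asymmetric sentiment terms.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
                + delta_pos * max(s[t-1], 0) + delta_neg * max(-s[t-1], 0)

    All coefficients are illustrative assumptions, not estimated values.
    """
    if len(returns) != len(sentiment):
        raise ValueError("returns and sentiment must have the same length")
    n = len(returns)
    mean = sum(returns) / n
    # Initialise the recursion with the sample variance of the returns.
    sigma2 = [sum((r - mean) ** 2 for r in returns) / n]
    for r, s in zip(returns[:-1], sentiment[:-1]):
        positive_part = max(s, 0.0)   # strength of positive sentiment
        negative_part = max(-s, 0.0)  # strength of negative sentiment
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1]
                      + delta_pos * positive_part + delta_neg * negative_part)
    return sigma2


# Same return path, opposite sentiment on the first day.
returns = [0.01, 0.0]
after_positive = conditional_variance(returns, [+1.0, 0.0])[1]
after_negative = conditional_variance(returns, [-1.0, 0.0])[1]
print(after_negative > after_positive)
```

With `delta_neg` larger than `delta_pos`, negative sentiment of a given strength raises next-period variance more than positive sentiment of the same strength, which is the kind of asymmetry such a framework would test for.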

Current research also extends to financial fraud detection with NLP and ML approaches.

5. Soft Computing, Evolutionary Algorithms and Applications
Participants: Doina Logofatu, Mihaela Breaban, Daniel Stamate, Ida Pu, in collaboration with Frankfurt University of Applied Sciences (Germany), and University of Iasi (Romania)

Soft Computing involves various advances in AI Algorithmics which are specific to the nature of this computing paradigm. This theme addresses the need for efficiency in solving optimisation problems or the need for offering tractable solutions for specific NP-hard problems by employing Evolutionary Computing approaches, in particular Genetic Algorithms and Particle Swarm Optimisation algorithms.
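As a minimal illustration of the Genetic Algorithm approach mentioned above, the sketch below evolves bit strings on a toy objective. The objective (`one_max`), the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions chosen for brevity, not the lab's own algorithms.

```python
import random


def one_max(bits):
    """Toy fitness: number of 1-bits (a stand-in for a real objective)."""
    return sum(bits)


def tournament(pop, fitness, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover of two equal-length bit lists."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def genetic_algorithm(fitness, n_bits=30, pop_size=40, generations=60,
                      mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]  # best fitness per generation
    for _ in range(generations):
        elite = max(pop, key=fitness)   # elitism: carry the best over unchanged
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(pop, fitness, rng),
                              tournament(pop, fitness, rng), rng)
            children.append(mutate(child, rng, mutation_rate))
        pop = children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history


best, history = genetic_algorithm(one_max)
print(one_max(best), history[0], history[-1])
```

Because the best individual is always copied into the next generation, the best fitness recorded in `history` never decreases; for an NP-hard problem the toy fitness would be replaced by the problem's own objective and encoding.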

On the other hand, devising efficient algorithms for integrating, querying and performing inferences with imperfect information benefits from Soft Computing approaches, such as those based on multi-valued logics, and this is another direction we follow in our research. We develop algorithms for computing the semantics of the integration, querying or inference rules that describe the result of these processes, and for deciding the query equivalence problem, which is useful in query optimisation.

Moreover, statistical simulations are a useful Soft Computing tool that we employ for assessing new algorithms we propose for improving the time-efficiency in blocking expanding ring search for mobile ad hoc networks, or for various concurrency problems.
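For readers unfamiliar with expanding ring search, the sketch below simulates the plain (non-blocking) scheme on a small static graph: the source floods a request with TTL 1, 2, 3, ... until the target is within range, and the simulation counts how many nodes transmit in total. The graph, the cost model (one broadcast per node within `ttl - 1` hops per round) and the function names are simplifying assumptions; the blocking variant studied in the lab is not reproduced here.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def expanding_ring_search(graph, source, target, max_ttl=10):
    """Return (ttl at which target is reached or None, total broadcasts)."""
    dist = hop_distances(graph, source)
    broadcasts = 0
    for ttl in range(1, max_ttl + 1):
        # In a round with this TTL, every node fewer than `ttl` hops away
        # (including the source) transmits the request once.
        broadcasts += sum(1 for d in dist.values() if d < ttl)
        if dist.get(target, float("inf")) <= ttl:
            return ttl, broadcasts
    return None, broadcasts


# A four-node path 0 - 1 - 2 - 3, searching from node 0 for node 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
found_ttl, cost = expanding_ring_search(path, source=0, target=3)
print(found_ttl, cost)  # prints: 3 6
```

The repeated flooding of inner rings (nodes 0 and 1 transmit in several rounds here) is the overhead that time-efficiency and energy-efficiency improvements to expanding ring search aim to reduce.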