Data Science & Soft Computing Lab
Full Members
Dr Daniel Stamate, Lab lead, Data Scientist, Goldsmiths, and University of Manchester
Prof Fionn Murtagh, Data Scientist, Goldsmiths
Dr Ida Pu, Computer Scientist, Goldsmiths
Alexandra Stepanenko, Accelerated Knowledge Transfer - AKT Research Associate, Goldsmiths
Mihai Ermaliuc, PT PhD Candidate Data Science at Goldsmiths, working in Neural Networks, Generative Adversarial Networks, and Large Language Models
Henry Musto, PT PhD Candidate in Data Science, working in predicting Dementia with Survival and Classification Statistical and Machine Learning models on ADNI and ELSA cohorts
Mohamed Saber, PT PhD Candidate Data Science at Goldsmiths, working in Financial Fraud Detection
Jiri Marek, PT PhD Candidate Data Science at Goldsmiths, working in Behavioural Finance
Associated Members
Prof Daniel Stahl, Professor of Medical Statistics and Statistical Learning, Lead of Precision Medicine and Statistical Learning Group, Institute of Psychiatry, Psychology and Neuroscience, King's College London
Prof Doina Logofatu, Computer Scientist and Mathematician, Frankfurt University of Applied Sciences
Prof Mihaela Breaban, Computer Scientist, University of Iasi
Dr Olesya Ajnakina, Senior Data Scientist and Statistician, King's College London
Mr Frederic Marechal, Data Scientist in industry
Dr Charlotte Wu, Strategic Physician Leader in health systems & technology innovation, Harness Health Partners
Data Science MSc Student Interns
Eoin Houstoun
Former Members
Asei Akanuma (KTP Research Associate), John Langham (Research Associate), Alexandra Stepanenko (AKT Research Associate), Wajdi Alghamdi (PG Researcher), Raph Olaniyan (PG Researcher), Karolina Rutcowska (Research intern), Andrea Katrinecz (Research intern), Jeremy Ogg (Research intern), Pedro Lopez (Research intern), Gabriel Burcea (Research intern), Rostislav Vorobev (Visiting researcher), Ruslan Tsygankov (Visiting researcher), Rubaida Easmin (Research intern), Mazy Carneiro (Research intern), Gozde Orhan (Research intern), Esperanza Ballesteros (Research intern), Markela Zeneli (Research intern), Prad Sree Davuloori (Research intern), Riya Haran (Research intern)
Main Research Directions
1. Machine Learning Prediction Modelling in Mental Health
2. Identifying and Measuring Playful Parenting Using Machine Learning
3. Predicting Spectral Reflectance Curves and Applications in Coatings Industry
4. Machine Learning and NLP Sentiment Analysis in Finance
5. Soft Computing, Evolutionary Algorithms and Applications
1. Machine Learning Prediction Modelling in Mental Health
(1.a) Predicting Risk of Dementia with Machine Learning using Routine Primary Care Records – CPRD.
Participants: Daniel Stamate, Fionn Murtagh, Mihai Ermaliuc, John Langham, Charlotte Wu, in collaboration with Prof David Reeves and team at the Centre for Primary Care in the Institute for Population Health, University of Manchester
Our Lab leads on the Machine Learning aspects of the study, based on our project on Predicting the risk of dementia using routine primary care records, which is developed in collaboration with the University of Manchester and other academic partners. The project received media coverage from the BBC. The research work concerns the development of novel synergistic approaches to predicting dementia based on Machine Learning (AI) and Statistical methods, and the development of a prediction tool. There are currently almost 1 million people in the UK living with dementia. There is currently no cure, and the condition has higher health and social care costs than cancer, stroke and chronic heart disease taken together (dementia costs the UK £26 billion per year). Current thinking suggests that 35% of cases of dementia could be prevented. Our research project aims to contribute to prevention, and to help improve diagnosis rates (currently at least one third of expected patients do not receive a dementia diagnosis), by predicting the risk of dementia with new machine learning and statistical approaches. The main source of data to be analysed in this project is the Clinical Practice Research Datalink (CPRD).
(1.b) Predicting Alzheimer's and Dementia with Machine Learning and Statistical Approaches on ADNI, EMIF-AD and ELSA cohorts.
Participants: Daniel Stamate, Daniel Stahl, David Reeves, Henry Musto, Rostislav Vorobev, Ruslan Tsygankov, Olesya Ajnakina, in collaboration with Institute of Psychiatry London - King's College London, UCL, Oxford University, EMIF-AD Consortium partners, and University of Manchester
This topic involves predicting Alzheimer's Disease (AD) and Dementia with innovative Machine Learning and Statistical Learning methodologies on:
i) Alzheimer's Disease Neuroimaging Initiative: ADNI with methodologies based on Neural Networks and Deep Learning, Gradient Boosting, Gaussian Processes, SVM, and Survival Machine Learning
ii) European Medical Information Framework - Alzheimer's Disease: EMIF-AD (AD Biomarker Discovery), with methodologies based on Gradient Boosting Machines, Random Forests and Deep Learning
iii) English Longitudinal Study of Ageing – ELSA, with methodologies based on Survival Random Forests, Survival Elastic Net and Cox models, and Gradient Boosting classification.
(1.c) AI for Predicting Psychosis
Participants: Daniel Stamate, Daniel Stahl, Wajdi Alghamdi, Andrea Katrinecz, in collaboration with Institute of Psychiatry, Psychology & Neuroscience, King's College London, Department of Psychiatry and Neuropsychology Maastricht University Medical Centre, and Department of Psychiatry, Yale University School of Medicine
Prediction Modelling and Pattern Detection Approaches for the First-Episode Psychosis Associated to Cannabis Use
Recent studies show that cannabis is one of the most popular drugs in the world, and many countries have started to legalise it. However, recent research demonstrates that the consumption of cannabis is a significant risk factor for various types of psychosis. As such, research efforts are currently being made to improve the estimation of the contribution of cannabis to the development of psychosis. In this ongoing research we apply data science methodologies based on scalable machine learning and statistical learning to devise novel approaches to the prediction of first-episode psychosis attributable to the use of high potency cannabis, and to the quantification of risk factors, based on phenotype data. Genotype data is to be added to the analysis in the next phase of the research. The work is performed in collaboration with the teams of Dr Marta Di Forti, Prof Sir Robin Murray, and Prof Daniel Stahl at the Institute of Psychiatry, Psychology & Neuroscience, King's College London.
Predicting Psychosis from Experience Sampling Data using Machine Learning
Modern psychiatric classification systems categorize psychiatric disorders (partly on evidence, largely pragmatically) based on different combinations of the required number of symptom domains that exceed the operational threshold of severity. This taxonomy endorses unique phenotypes with precise boundaries. A prevailing trend in psychiatry has been to reify these categorical diagnoses. However, efforts to discriminate between these psychiatric disorders using modern genetic and neuroimaging data have thus far failed to deliver a promising outcome. Evidence indicates commonality rather than distinction. The Experience Sampling Method (ESM), a personal diary method to assess mental states in real time, provides a unique opportunity to observe the subtle fluctuations of mental states. It has various advantages over the conventional method of cross-sectional assessment of psychopathology based on self-report questionnaires: high ecological validity, high reliability, no recall bias, high temporal resolution, and contextual information. However, this intense assessment strategy produces a massive amount of information at an individual level. As such, even modern statistical approaches sometimes fail to provide optimal solutions to deal with the complexity of data at this scale. Machine learning offers enhanced solutions for this kind of research challenge when synergistically combined with more traditional statistical methods. The aim of this study is to predict pattern formation and agnostic clustering of the general population using generic ESM data collected with mobile apps. The work is developed in collaboration with Andrea Katrinecz (Data Science MSc graduate), Dr Sinan Guloksuz of the Department of Psychiatry and Neuropsychology at Maastricht University Medical Centre and the Department of Psychiatry, Yale University School of Medicine, and Prof Daniel Stahl of the Department of Biostatistics and Health Informatics, King's College London.
2. Identifying and Measuring Playful Parenting Using Machine Learning
Participants: Daniel Stamate, Caspar Addyman, Mark Tomlinson, Irene Uwerikowe, Jeremiah Ayock Ishaya, in collaboration with Stellenbosch University (South Africa), and Global Parenting Initiative led by Oxford
Positive mother-infant interactions are crucial for child development, and nonverbal synchrony (how their movements align) offers a measurable indicator. The research conducted in our lab develops approaches that use video data and machine learning to automatically estimate interaction quality. Models such as BiLSTMens and BiGRUens, trained on movement patterns, successfully distinguished high and low synchrony, with strong performance in identifying dyads that may benefit from further support. This method highlights the potential of AI to support early detection and intervention in developmental contexts. Further related AI prediction modelling work on parent-child interactions is conducted at Stellenbosch University.
3. Predicting Spectral Reflectance Curves and Applications in Coatings Industry
Participants: Daniel Stamate, Asei Akanuma, Alexandra Stepanenko, in collaboration with Sherwin-Williams
This research is developed in collaboration with Sherwin-Williams in Knowledge Transfer Partnership (KTP) and Accelerated Knowledge Transfer (AKT) projects co-funded by Innovate UK and by the business partner. The work concerns the development of innovative, state-of-the-art Artificial Neural Network / Deep Learning approaches to colour reflectance curve prediction for optimising the design of new coatings.
4. Machine Learning and NLP Sentiment Analysis in Finance
Participants: Daniel Stamate, Raph Olaniyan, and Frederic Marechal
There has been an increasing interest recently in examining the possible relationships between emotions expressed online and stock markets. Most of the previous studies claiming that emotions have predictive influence on the stock market do so by developing various machine learning predictive models, but do not validate their claims rigorously by analysing the statistical significance of their findings. In turn, the few works that attempt to statistically validate such claims suffer from important limitations of their approaches.
A growing body of research analyses the relationship between sentiment-filled online information and the stock market, and shows a tendency for the former to predict the latter. But little is known about whether this information's predictive power resolves uncertainty. Rather, it is believed that it induces volatility because investors over-react or under-react to new information as a result of sentimental contagion.
In particular, stock market data exhibit erratic volatility, and this time-varying volatility makes any possible relationship between these variables non-linear. Our work investigates and proposes novel frameworks based on approaches that account for non-linearity and heteroscedasticity. We also study the asymmetric nature of the influences of positive and negative sentiments on stock market volatility.
Current research is also being extended towards financial fraud detection with NLP and ML approaches.
5. Soft Computing, Evolutionary Algorithms and Applications
Participants: Doina Logofatu, Mihaela Breaban, Daniel Stamate, Ida Pu, in collaboration with Frankfurt University of Applied Sciences (Germany), and University of Iasi (Romania)
Soft Computing involves various advances in AI Algorithmics which are specific to the nature of this computing paradigm. This theme addresses the need for efficiency in solving optimisation problems or the need for offering tractable solutions for specific NP-hard problems by employing Evolutionary Computing approaches, in particular Genetic Algorithms and Particle Swarm Optimisation algorithms.
On the other hand, devising efficient algorithms for integrating, querying and performing inferences with imperfect information benefits from Soft Computing approaches, such as those based on multi-valued logics, and this is another direction we follow in our research. We develop algorithms for computing the semantics of the integration, querying or inference rules that describe the result of these processes, and for deciding the query equivalence problem, which is useful in query optimisation.
Moreover, statistical simulations are a useful Soft Computing tool that we employ for assessing new algorithms we propose for improving the time-efficiency in blocking expanding ring search for mobile ad hoc networks, or for various concurrency problems.
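For readers unfamiliar with Particle Swarm Optimisation, one of the Evolutionary Computing approaches named in this theme, the following is a minimal textbook-style sketch in Python. It is not the Lab's implementation: the toy objective `sphere`, the function name `pso`, and all parameter values (swarm size, inertia, acceleration coefficients, bounds) are assumptions chosen purely for illustration.

```python
import random


def sphere(x):
    """Toy objective (an assumption for this sketch): sum of squares, minimised at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim, bounds, n_particles=30, n_iters=200,
        inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers the best position it has visited so far.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    # The swarm shares the best position found by any particle.
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


best_x, best_f = pso(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_x, best_f)
```

The same loop applies to any objective that maps a list of floats to a number; for combinatorial NP-hard problems a discrete encoding or a Genetic Algorithm is the more usual choice.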