Predictive analytics solution using Machine Learning in the Cloud to analyse repeat victimisation in domestic abuse victims in London

Research Institution / Organisation

Northumbria University

In Collaboration With

Metropolitan Police Service

Principal Researcher

Sheriff Oyelakin Oyedeji

Level of Research


Project Start Date

April 2021

Research Context

Repeat victimisation among domestic abuse victims is usually identified reactively, through criminal investigations, reports made to safeguarding partners, and crime surveys.

This research proposes that repeat victimisation could instead be identified proactively, for example through a predictive individual risk assessment programme. Predictive analytics models built with Machine Learning in the Cloud will be deployed to analyse repeat victimisation in domestic abuse victims.

This research will examine the potential use of historical crime data to analyse patterns and estimate the likelihood of repeat victimisation, so that police and safeguarding partners can take preventive steps to mitigate future abuse.

Research Scope and Objectives

The scope of this research is limited to data relating to domestic abuse (DA) victims in London between March 2019 and March 2021.

The objectives of the research are to:

  1. Examine the barriers towards adoption of Big Data analytics in policing

  2. Evaluate possible methods for analysing repeat victimisation in DA victims

  3. Implement a predictive analytics solution using Machine Learning in the Cloud

  4. Assess the benefits and limitations of Big Data analytics in predictive policing.

Background to the research area
The Crime Survey for England and Wales showed that "an estimated 2.3 million adults aged 16 to 74 years experienced domestic abuse" in the 12 months to the year ending March 2020 (ONS, 2020).

There were 357 victims of domestic homicides in England and Wales between the year ending March 2017 and year ending March 2019. This represents 28% of all homicide victims during this period (ONS, 2020). Domestic homicides usually start off as incidents of low-level crime before escalating, and most of the victims/perpetrators of these homicides would have passed through the criminal justice system before.

According to a report published by the domestic abuse charity SafeLives in 2015, "85% of victims sought help five times on average from professionals in the year before they got effective help to stop the abuse" (SafeLives, 2015).

Research Contribution
Advances in Artificial Intelligence (AI) and workforce automation are among the transformative technologies expected to have a major impact on policing in the next 20 years (College of Policing, 2020). This research will contribute to this body of work by assessing the benefits and limitations of Machine Learning and Big Data analytics in predictive policing.

This research will also benefit the public by contributing to the discussion on preventing the commission of offences and to the key areas of research interest proposed by the Metropolitan Police Service for 2019/20.

Research Methodology

Data Collection
The research plans to use anonymised secondary data from the Metropolitan Police Service (MPS). Of particular interest is the dataset documenting specific characteristics of domestic abuse victims between March 2019 and March 2021.

Data variables

  • Age (aggregated into age groups)
  • Sex
  • Crime type
  • Employment status (employed or unemployed)
  • Disability (not specific but categorised as physical, mental health, or unknown)
  • Same Household as Suspect (yes or no)
  • Repeat victim of domestic abuse in the last 12 months (yes or no).
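As an illustration only, the variables listed above could be represented as a typed record and turned into numeric indicator columns before modelling. The sketch below is not the MPS schema: the class name `VictimRecord`, the field names, the `encode` helper, and the age bands, sex labels, and crime type labels are all hypothetical. Only the employment, disability, and yes/no levels follow the list above.

```python
from dataclasses import dataclass

# Category levels. Employment and disability follow the variable list;
# the age bands, sex labels and crime types are hypothetical placeholders.
LEVELS = {
    "age_group": ["16-24", "25-44", "45-74"],           # hypothetical bands
    "sex": ["female", "male"],                          # hypothetical labels
    "crime_type": ["violence", "harassment", "other"],  # hypothetical labels
    "employment": ["employed", "unemployed"],
    "disability": ["physical", "mental health", "unknown"],
}


@dataclass(frozen=True)
class VictimRecord:
    age_group: str
    sex: str
    crime_type: str
    employment: str
    disability: str
    same_household: bool  # same household as suspect
    repeat_victim: bool   # y variable: repeat victim in the last 12 months


def encode(record: VictimRecord) -> tuple[list[float], float]:
    """Return (features, target) with one indicator column per category level."""
    features: list[float] = []
    for name, levels in LEVELS.items():
        value = getattr(record, name)
        if value not in levels:
            raise ValueError(f"unexpected {name}: {value!r}")
        features.extend(1.0 if value == level else 0.0 for level in levels)
    features.append(1.0 if record.same_household else 0.0)
    return features, (1.0 if record.repeat_victim else 0.0)


# A made-up record, not real data.
example = VictimRecord("25-44", "female", "harassment", "employed", "unknown", True, False)
x, y = encode(example)
```

Indicator (one-hot) columns are one common way to feed categorical variables into a regression model. Other encodings are possible, and the source does not state which one the research uses.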

Solution Design
The research will evaluate the relationships among seven characteristics of domestic abuse victims: six predictor variables (Age, Sex, Employment status, Disability, Crime type, and Same Household as Suspect) and the key y variable, whether the victim has been a repeat victim of domestic abuse in the last 12 months.

A predictive Machine Learning solution using linear regression will be employed to establish the relationships among these variables by estimating how much impact each predictor variable has on the y variable. Azure Machine Learning will be used to train the models on a large proportion of the dataset. The experiments' metrics will be reviewed, and the best-performing models will be deployed accordingly.
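The idea of fitting a linear regression on part of the data can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Azure Machine Learning pipeline itself: the data are synthetic, the generating rule is invented and is not an empirical finding, only two of the predictors are used, the 80/20 split is an assumed proportion, and the names `make_synthetic` and `fit_linear` are hypothetical.

```python
import random


def make_synthetic(n: int, seed: int = 0) -> list[tuple[list[float], float]]:
    """Synthetic rows: [intercept, same_household, unemployed] -> repeat_victim."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        same_household = float(rng.random() < 0.5)
        unemployed = float(rng.random() < 0.5)
        # Invented generating rule for demonstration, NOT an empirical finding.
        p_repeat = 0.2 + 0.4 * same_household + 0.1 * unemployed
        repeat_victim = float(rng.random() < p_repeat)
        rows.append(([1.0, same_household, unemployed], repeat_victim))
    return rows


def fit_linear(rows, lr: float = 0.2, steps: int = 200) -> list[float]:
    """Least-squares linear regression fitted by batch gradient descent."""
    n, k = len(rows), len(rows[0][0])
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for x, y in rows:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(k):
                grad[j] += 2.0 * err * x[j] / n  # gradient of mean squared error
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


data = make_synthetic(4000)
split = int(0.8 * len(data))  # 80/20 split is an assumption
train, held_out = data[:split], data[split:]
weights = fit_linear(train)   # [intercept, same_household, unemployed]
```

Each fitted weight estimates how much the predicted value of the y variable changes when the corresponding predictor changes from 0 to 1, holding the others fixed. Because the y variable is binary, a linear model can produce predictions outside the range 0 to 1, which is one limitation to keep in mind when interpreting its output.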

Data Analysis
The predictive models will be scored using the remaining portion of the dataset that was not used for training. The test results and the deployment will be analysed using descriptive and inferential statistical metrics from Azure and the Power BI data visualisation tool.
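Scoring a fitted model on held-out records might look like the sketch below. It is illustrative only: the weights and the four held-out rows are made up, the 0.5 decision threshold is an assumption, and the metrics reported by Azure may differ from the two computed here (root mean squared error and accuracy).

```python
import math


def predict(weights: list[float], x: list[float]) -> float:
    """Linear prediction: weighted sum of the feature values."""
    return sum(w * xi for w, xi in zip(weights, x))


def score(weights, rows, threshold: float = 0.5) -> dict[str, float]:
    """RMSE of the raw predictions plus accuracy after thresholding them."""
    sq_err, correct = 0.0, 0
    for x, y in rows:
        y_hat = predict(weights, x)
        sq_err += (y_hat - y) ** 2
        correct += int((y_hat >= threshold) == (y == 1.0))
    n = len(rows)
    return {"rmse": math.sqrt(sq_err / n), "accuracy": correct / n}


# Hypothetical fitted weights: [intercept, same_household, unemployed]
weights = [0.2, 0.4, 0.1]

# Made-up held-out rows: ([intercept, same_household, unemployed], repeat_victim)
held_out = [
    ([1.0, 1.0, 1.0], 1.0),
    ([1.0, 1.0, 0.0], 1.0),
    ([1.0, 0.0, 1.0], 0.0),
    ([1.0, 0.0, 0.0], 1.0),
]

metrics = score(weights, held_out)
```

Evaluating only on records the model has not seen during training gives a more honest picture of how it would perform on new cases than metrics computed on the training data.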

Date due for completion

October 2021