Attrition, also known as dropout, occurs when participants fail to comply with the study requirements or leave a study after they have been assigned to an experimental group. It can lead to a biased estimate of the effect size because those who drop out are likely to differ from those who stay. For example, less motivated officers or teams might be more likely to drop out of a treatment. A technique used to counter this potential for bias is "intention to treat" analysis, where even those who drop out of the treatment are included in the final analysis.
A study is biased if its impact estimate differs from the real impact. This difference can be linked to weaknesses in the design or implementation of the evaluation.
For example, bias can be introduced if participants themselves decide whether to join the treatment or control groups. This ability to "self-select" could mean that divisions with particularly proactive police commanders or plenty of resources make their way into the treatment group, while divisions with less motivated commanders or fewer resources end up in the control group. When this happens, differences in the outcomes of the two groups may be due to these pre-existing features (e.g. more resources or more proactive senior management), not the intervention, and the estimate of the effect size will suffer from bias.
There are many other potential sources of bias, including measurement bias, which is avoided by 'blinding' test delivery and analysis of data (see below), and attrition, which is discussed above.
Blinding is where information about the assignment of participants to their experimental group (e.g. control or treatment) is concealed from the evaluator, the participants, or other people involved in the study until it is complete.
Blinding can be introduced at various points in an evaluation:
Failure to blind can introduce bias. For example, a researcher analysing participant interview transcripts may interpret data differently if they know that the interviewee is receiving an intervention. If they really want the intervention to be successful, they may subconsciously exaggerate the interviewee's positive feedback on the intervention. Even if a researcher does their best to remain fair and objective, their own preconceptions of an intervention can still affect their analysis and introduce bias, without them realising.
All of the effect sizes produced by impact evaluations are estimates. Typically, confidence intervals provide the range of values that has a 95% probability of including the real effect size. The width of the confidence interval indicates the confidence we can place in a finding: the wider the interval, the less confidence we can have.
For example, a trial may estimate an effect size of 0.4 with a confidence interval of 0.35 to 0.46. This means there is a 95% probability that the real effect size lies between 0.35 and 0.46. If the confidence interval were wider, say 0.05 to 0.6, we would place less confidence in this estimate.
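As a rough illustration of how an interval like this arises, a 95% confidence interval for a mean can be computed with the normal approximation (roughly 1.96 standard errors either side of the mean). A minimal Python sketch, using hypothetical effect-size estimates:

```python
import math
import statistics

def confidence_interval(values, z=1.96):
    """Approximate 95% confidence interval for the mean, using the
    normal approximation: mean +/- 1.96 standard errors."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical effect-size estimates from repeated samples of a trial
estimates = [0.38, 0.42, 0.40, 0.44, 0.36, 0.41, 0.39, 0.43]
low, high = confidence_interval(estimates)
```

The small spread of the estimates produces a narrow interval; more variable data would widen it and reduce the confidence we could place in the result.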
Sometimes called a "comparison group", this group does not receive the intervention being evaluated and allows the evaluator to estimate what would have happened if the treatment group had not received the intervention. The control group should be as similar to the treatment group as possible before the intervention is applied. This can be achieved through random assignment or, if randomisation is not possible, matching. There are several types of control group:
Counterfactual
The outcome for the treatment group if it had not received the intervention is called the counterfactual. If a control group is constructed correctly, it can be used to estimate the counterfactual.
Effectiveness trials aim to test the intervention when implemented 'at scale' under realistic conditions in a large number of forces, boroughs or divisions. A quantitative impact evaluation is used to assess the impact on crime and a process evaluation is used to identify the challenges for delivery at scale. The cost of the intervention at scale is also calculated.
The effect size is an estimate of the size and direction of a change caused by an intervention. The value of an effect size is that it quantifies the effectiveness of a particular intervention relative to a comparison group; it is essentially a measure of the extent of the differences between two groups. Effect size is a more reliable way of measuring impact because the emphasis is on the size of the effect rather than its statistical significance (which can be affected by sample size).
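One common standardised effect size is Cohen's d: the difference between the two group means divided by their pooled standard deviation. A minimal sketch, using hypothetical monthly crime counts for two groups of divisions:

```python
import math
import statistics

def cohens_d(treatment, control):
    """Cohen's d: standardised difference between two group means,
    using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

# Hypothetical monthly crime counts
treated = [40, 38, 42, 37, 39, 41]
control = [45, 47, 44, 48, 46, 43]
d = cohens_d(treated, control)  # negative: fewer crimes in the treated group
```

Because d is expressed in standard-deviation units, effect sizes from different studies and outcome measures can be compared on a common scale.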
Efficacy trials test whether an intervention can work when implemented under controlled conditions and on a small organisational scale.
Experimental design
A research design where the treatment and control groups are identical before the intervention is applied. This is usually achieved through random assignment and allows the evaluator to assume that any change in outcomes is due to the intervention, not any pre-existing characteristics.
Describes the extent to which the results of an evaluation apply to another context. For example, a study which finds that an intervention is effective in an inner city division may have poor external validity in a rural division, because the areas may have different demographic profiles and crime rates.
Refers to whether an intervention is being implemented as intended by the developer. If there is low fidelity (officers or staff do not follow the programme closely) it is difficult to know whether an intervention is effective or not.
Sometimes called "observer effects", the Hawthorne effect is the phenomenon where participants change their behaviour due to the knowledge that they are being studied. For example, officers might follow a procedure or policy more closely if an evaluator is observing the patrol. The presence of Hawthorne effects can lead to a biased estimate of the effect size. One way of avoiding the Hawthorne effect is to have an active control group.
An intervention's impact is the difference between the outcomes observed for those who received the intervention and the outcomes for those who did not. For example, the change in offending behaviour of individuals who received drug dependence treatment compared to individuals who did not receive the treatment. Impact evaluation is concerned with identifying the magnitude of this difference (also known as the effect size) and therefore requires quantitative research.
Intention to treat (ITT) analysis
ITT analysis can prevent non-compliance and attrition from biasing a study. Analysis is carried out on the groups as they were formed immediately after randomisation was completed. For example, if one of the participants in the intervention group does not comply with the programme of the intervention, they are included in the final analysis as if they had received the intervention. ITT avoids bias from creeping into the analysis and gives a credible estimate of how effective the intervention is in a real-world setting.
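The principle can be sketched in a few lines: outcomes are averaged by the group each participant was assigned to at randomisation, regardless of whether they complied. The data structure below is hypothetical:

```python
def itt_means(participants):
    """Intention-to-treat: average the outcome by the group each
    participant was *assigned* to, ignoring whether they complied."""
    by_group = {}
    for p in participants:
        by_group.setdefault(p["assigned"], []).append(p["outcome"])
    return {group: sum(vals) / len(vals) for group, vals in by_group.items()}

# Hypothetical records: one treatment participant dropped out but is
# still analysed as part of the treatment group
data = [
    {"assigned": "treatment", "complied": True,  "outcome": 3},
    {"assigned": "treatment", "complied": False, "outcome": 7},
    {"assigned": "control",   "complied": True,  "outcome": 8},
    {"assigned": "control",   "complied": True,  "outcome": 6},
]
means = itt_means(data)
```

Excluding the non-complier would flatter the treatment group; keeping them in preserves the comparability created by randomisation.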
A study has internal validity if the estimate it produces is unbiased.
Any programme, policy or practice being evaluated.
A non-systematic review of the academic literature on a particular topic.
A method used to construct a comparison group, matching allows evaluators to control for characteristics such as educational attainment, age, or family income.
Matching is often used to create a control group when randomisation is impossible. Participants in the treatment group are matched to others who are not receiving the treatment according to characteristics thought to be relevant to the outcome measured by the evaluation. For example, a division receiving an intervention can be matched with a similar division that has not received the intervention, based upon crime records, population demographic data and geographic information (e.g. size of division or the proportion that is residential or commercial).
Matching allows the evaluator to assume that any differences in the post-test are not due to pre-existing differences in the matched characteristics. For example, if you match police officers on their number of years employed as an officer, it is safe to assume that length of service will not account for differences between the primary outcomes of the experimental groups. However, matching can only be done on observable characteristics. Some characteristics are unobservable (e.g. genetic predisposition, officers' relationships with colleagues) and are difficult to take into account when matching.
Matching can also be used in an RCT to ensure that the groups are balanced. For example, participating forces can be paired on the basis of crime rate and then one from each pair randomly assigned to the treatment group and one to the control group.
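A minimal sketch of 1:1 nearest-neighbour matching on observed characteristics (the field names and the Euclidean distance are illustrative assumptions; real evaluations often use more sophisticated methods such as propensity-score matching):

```python
def match_controls(treated, pool, keys):
    """Pair each treated unit with the nearest unused unit in `pool`,
    by squared Euclidean distance over the characteristics in `keys`.
    A greedy 1:1 matching sketch on hypothetical data."""
    matches = {}
    available = list(pool)
    for unit in treated:
        best = min(available,
                   key=lambda c: sum((unit[k] - c[k]) ** 2 for k in keys))
        matches[unit["id"]] = best["id"]
        available.remove(best)  # each control is matched at most once
    return matches

# Hypothetical divisions: match on crime rate and population
treated_divs = [{"id": "A", "crime_rate": 52, "population": 110}]
control_pool = [
    {"id": "B", "crime_rate": 80, "population": 200},
    {"id": "C", "crime_rate": 50, "population": 105},
]
pairs = match_controls(treated_divs, control_pool, ["crime_rate", "population"])
```

Division A is paired with C, the closest available division on the matched characteristics.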
A meta-analysis is the systematic analysis of several pre-existing studies of one intervention in order to produce a quantitative estimate of effect size. Meta-analyses also use the techniques of systematic review to decide which studies are included in the analysis. By combining several studies, the evaluator can gain a more accurate estimate of an intervention's impact.
A study where the assignment of participants to the treatment and control groups is not controlled by the evaluator.
The officers or staff, divisions or forces, offenders or victims, or members of the general public taking part in the trial.
Pilot studies are conducted to refine an intervention that is at an early or exploratory stage of development. Pilots usually run in a small number of settings (e.g. a reoffending intervention may be run in three prisons, or a policing intervention may be run in four operational divisions) and are used to establish an intervention's feasibility. Qualitative research is used to develop and refine the approach and test its feasibility, and initial indicative data is collected to assess its potential to reduce crime.
The power of a study refers to how likely it is to detect a statistically significant effect size. Before starting a study, evaluators estimate the effect size they expect to find. They use this figure to undertake power calculations and estimate the sample size required for an adequately powered study.
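A standard normal-approximation formula gives the sample size per group for a two-arm trial at 5% significance (two-sided) and 80% power: n = 2((z_alpha + z_beta) / d)^2, where d is the expected standardised effect size. A minimal sketch:

```python
def sample_size_per_group(effect_size, z_alpha=1.96, z_beta=0.84):
    """Approximate sample size per group for a two-arm trial at 5%
    significance (two-sided, z=1.96) and 80% power (z=0.84), via the
    standard normal-approximation formula. Round up in practice."""
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

n = sample_size_per_group(0.4)  # roughly 98 participants per group
```

Note that halving the expected effect size (0.4 to 0.2) quadruples the required sample, which is why trials of interventions with modest effects need large samples.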
Pre-test or baseline measure
A measure that is carried out before the intervention is introduced. This is then compared to the same measure carried out after the intervention has been implemented, to assess the extent to which the intervention has affected the measure. For example, a survey of victims' satisfaction with how they were treated by police, that would then be repeated after a procedural justice training programme.
The primary outcome is the outcome that determines whether or not an intervention is considered effective. It should be decided before the trial starts and needs to be stated in the trial registration document. The primary outcome in research used in the Crime Reduction Toolkit is usually crime rate.
Process evaluation seeks to understand how an intervention was implemented and to understand the views of key stakeholders (e.g. police officers, criminal justice staff, social workers). It often involves both quantitative and qualitative research.
Qualitative research is concerned with description. It attempts to explore, describe or explain the social world using language.
Quantitative research attempts to establish quantities and magnitude. It attempts to explore, describe or explain the social world using numbers.
An impact evaluation design used when an experimental design is not feasible because the evaluators are not able to control assignment to experimental groups. Quasi-experimental designs use statistical techniques to create treatment and control groups that are as close as possible to identical in all respects before the application of the intervention to the treatment group.
Examples of quasi-experimental designs include matched designs and regression discontinuity designs.
Random assignment is an important feature of randomised controlled trials. It means that the allocation of a participant to the treatment or control groups is solely due to chance, and not a function of any of their characteristics (either observed or unobserved). If a large enough sample of participants is randomised, the two groups will be balanced on every characteristic.
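The procedure itself is simple, which is part of its appeal. A minimal sketch that shuffles a participant list and splits it in half (the participant names are hypothetical; real trials use pre-registered randomisation procedures):

```python
import random

def randomise(participants, seed=None):
    """Randomly split participants into treatment and control groups
    of equal size, so allocation depends on chance alone."""
    rng = random.Random(seed)  # seed only to make the example repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

officers = [f"officer_{i}" for i in range(100)]
treatment, control = randomise(officers, seed=42)
```

With a sample this size, chance alone decides each officer's group, so both observed and unobserved characteristics should be balanced across the two groups on average.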
Randomised Controlled Trial (RCT)
An RCT is a type of experimental design where participants are randomly allocated to the treatment and control groups. Random assignment allows the evaluator to assume that there are no prior differences between the two groups that could affect the primary outcome, and any effect size is therefore due to the intervention received by the treatment group.
Random assignment is used to deal with the problem of selection bias, which occurs when the way in which participants are assigned to experimental groups biases the findings of the study. For example, if an evaluator allows offenders to volunteer for the treatment group and fills the control group from the pool of offenders who did not volunteer, any difference in the primary outcome could be due to the pre-existing characteristics and motivation of the offenders who volunteered. Offenders who volunteered for the treatment group may be more motivated to address their offending behaviour or have more proactive probation officers, and these features could mean the treatment participants curb their offending at a faster rate than control participants who are less motivated or have a less proactive probation officer.
Regression is a statistical method which determines the nature and strength of the relationship between the evaluation's primary outcome (the "dependent variable") and one or more factors you think might affect your primary outcome (known as the independent, predictor or explanatory variables). For example, we could use regression analysis to estimate the average number of crimes that people commit on release from prison, based on their age.
There are many different types of regression analysis. Simple linear regression is the most basic and is used to determine the relationship between an outcome and one explanatory variable, as in the example above.
Multiple regression is a more complex method used to determine the relationship of many variables with your outcome variable. This method can also help you to understand how your explanatory factors might interact with each other. Using the example above, age might impact upon reoffending rate after release from prison, but multiple regression allows you to explore the effect of age on reoffending alongside factors such as family income, educational attainment, and severity of previous offences or length of sentence. This is often called modelling.
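The simple linear case can be computed directly with ordinary least squares. A minimal sketch, using hypothetical data on age at release and offences in the following year:

```python
def simple_linear_regression(xs, ys):
    """Ordinary least squares for one explanatory variable:
    returns (slope, intercept) of the best-fit line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical: age at release vs. offences in the following year
ages = [20, 25, 30, 35, 40]
offences = [8, 6, 5, 3, 2]
slope, intercept = simple_linear_regression(ages, offences)
```

Here the negative slope indicates that, in this illustrative data, predicted offending falls as age at release rises. Multiple regression extends the same idea to several explanatory variables at once.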
Regression Discontinuity Design (RDD)
The RDD is a type of quasi-experimental research design. Participants are assigned to the treatment and control groups on the basis of whether they meet a certain threshold: some fall just below the threshold and others just above it. It is assumed that which side of the threshold they fall on does not have a causal relationship with the primary outcome.
The RDD is best explained with an example. Consider an RDD used to evaluate the impact of a cognitive behavioural therapy (CBT) treatment that only accepts offenders who are classed as 'high risk' on a reoffending risk assessment; for the purpose of this example, the cut-off for high risk is a score of 20 points. The treatment group is constructed from the offenders who score just above 20 points and are classed as 'high risk'. The control group is the offenders who score just under 20 points and are therefore classed as 'medium risk'. The assumption is that offenders who were marginally either side of the threshold are similar, and whether they fall just above or just below it does not have a causal relationship with their later offending behaviour. You therefore have a treatment group and a control group that are very similar in all respects and can be used to estimate the impact of the CBT treatment.
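The group construction in this example can be sketched directly. The `bandwidth` parameter (how close to the cut-off a score must be to count as "marginal") is an illustrative assumption, as is the data structure:

```python
def rdd_groups(offenders, score_key, cutoff, bandwidth):
    """Split offenders near a risk-score cut-off into treatment
    (at or above the cut-off) and control (just below), keeping only
    those within `bandwidth` points of the threshold."""
    near = [o for o in offenders if abs(o[score_key] - cutoff) <= bandwidth]
    treatment = [o for o in near if o[score_key] >= cutoff]
    control = [o for o in near if o[score_key] < cutoff]
    return treatment, control

# Hypothetical risk-assessment scores; cut-off for 'high risk' is 20
offenders = [{"id": i, "risk": r} for i, r in enumerate([12, 18, 19, 20, 21, 28])]
treated, controls = rdd_groups(offenders, "risk", cutoff=20, bandwidth=2)
```

Offenders scoring 12 or 28 are excluded as clearly dissimilar; only those marginally either side of the threshold (18-21) enter the comparison.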
The CRT uses a framework called EMMIE to summarise the evidence underpinning crime reduction interventions, identified through systematic reviews and meta-analyses. This reporting structure provides a summary of the best available evidence, and the quality of that evidence, in relation to the five criteria below.
E - The reliability of crime reduction effects of an intervention
M - Mechanisms identified (how an intervention works)
M - Moderators identified (where, under what conditions, and for whom or what the intervention works)
I - Implementation (what was needed to put the intervention in place)
E - Economic (the costs of the intervention and the return on those costs)
The number of participants in the study.
A statistically significant finding means that the result has a relatively small probability of occurring due to chance alone (typically 0.1% to 5%) and is therefore likely to have an underlying cause.
Statistical significance is determined by conducting 'tests of significance' using quantitative data. Most social scientists consider that if there is a 5% or lower probability that the change in the outcome measure occurred by chance alone, it is a statistically significant change (i.e. something besides chance has probably affected the outcome measure). As an example, consider changes in crime. Crime volume changes day to day and week to week; it is very rarely the same every week. These changes can usually be explained by 'natural variation'. Suppose that one week there is a notable drop in volume which is sustained in subsequent weeks (a 'step change'). This is not due to the natural variation or randomness of crime; it is likely to have an underlying cause (e.g. a prolific criminal gang has been sentenced or there was a successful police operation). If an evaluation has been designed to control for other influencing factors (i.e. a randomised controlled trial), it can be inferred that a statistically significant change in the outcome measure is most likely due to the intervention being evaluated.
However, there are problems with reporting 'significance' alone. The main one is that significance depends on two things: the size of the effect and the size of the sample. Results might be 'significant' either as the result of a large effect (despite a small sample) or as the result of a large sample (even if the actual effect size is tiny). It is important to know the statistical significance of a result; without it there is a risk of drawing firm conclusions from small samples. However, statistical significance does not tell you the most important thing: the size of the effect. One way to overcome this confusion is to report the effect size together with an estimate of its confidence interval.
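The dependence of significance on sample size can be demonstrated directly. Below, Welch's t statistic is computed for two small groups with a fixed difference in means; repeating the same hypothetical data tenfold leaves the effect unchanged but pushes the statistic past the conventional significance threshold (|t| around 2):

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

# Hypothetical outcomes: the difference in means is 1 in both cases
group_a = [10, 12, 11, 13]
group_b = [11, 13, 12, 14]

t_small = welch_t(group_a, group_b)            # small sample: not significant
t_large = welch_t(group_a * 10, group_b * 10)  # same effect, larger sample: significant
```

The effect is identical in both comparisons; only the sample size changed. This is why the effect size and its confidence interval should be reported alongside significance.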
A synthesis of the research evidence on a particular topic, which uses strict criteria to exclude studies that do not fit certain methodological requirements. Systematic reviews that provide a quantitative estimate of an effect size are called meta-analyses.
The group of police officers or staff, offenders or victims, or members of the general public that receive the intervention.