Highlights

Effect analysis: Effect analysis focuses on identifying the results of the intervention, determining whether the intervention makes a difference, in what way, for whom, and whether the observed effects can be attributed to the intervention.
Research designs: Experiments were traditionally associated with effect analysis. Many different designs exist to answer effect analysis questions, each with its own strengths and threats to validity.
Choosing the right design for an effect analysis: The purpose of the evaluation, the characteristics of the intervention and its implementation context are likely the main factors to consider when choosing an approach for effect analysis.
Contribution analysis: Contribution analysis is a theory-based approach for effect analysis better suited for evaluating complex interventions.
Impact analysis: Impact analysis is an umbrella term that encompasses a wide range of research designs, including experimental and quasi-experimental designs and contribution analysis.
Impacts on planetary health: While some impact or effects analysis designs can readily integrate planetary health considerations, experimental or quasi-experimental designs may require a mixed-methods approach.
Outcome harvesting: Outcome harvesting is an approach designed to identify effects in changing contexts. This approach is useful to understand social change processes.

Introduction

Interventions are created and implemented for specific reasons and to achieve specific objectives. Confirming whether implemented interventions are effective is important for decision-makers. Evaluators have a key responsibility in determining whether the intervention’s objectives were achieved. However, this question is often not easy to answer. Evaluators must ask themselves: which research strategy can be implemented to respond to this question with confidence?

The production of results is often a complex process that has implications beyond achieving the intervention objectives. It’s particularly complex because interventions often produce effects other than those they were originally designed to achieve; some of these additional effects are beneficial, others not; some will affect people beyond the target population; and some effects may be unexpected.

Analyzing the results of an intervention can address different questions such as:

• What are the results of the intervention?

• Did the intervention make a difference? If yes, in what way, for whom?

• Are the effects observed attributable to the intervention?

In the evaluation literature, effects are often distinguished from impacts, which are longer-term. In this chapter, effects represent the whole chain of results, that is, a continuum from outputs to outcomes to impacts (Mayne, 2015).

Some evaluators specialize in effect analysis, which is a niche that requires specialized expertise (Shadish et al., 2002). Historically, effect analysis has been closely associated with experiments (experimental and quasi-experimental research designs). In the last 25 years, the field of effect evaluation has expanded. Contribution analysis was created specifically to analyze the effects of complex interventions. There is now a wide range of designs to investigate effects, including designs for interventions that are less suited to experimental methods. Effect analysis is an evaluation question that now covers a rich and diverse set of research designs.

This chapter provides an overview of the foundational principles of effect analysis and a description of the diversity of approaches used to evaluate effects. The structure of the chapter was designed to help the reader understand the logic of inquiry behind effect analysis. It starts by defining effects and causal relationships. It then reviews experimental and quasi-experimental research. This review will help the reader understand the strengths and limitations of such approaches. As we will see, some interventions are not good candidates for these research designs. Contribution analysis, specifically designed for complex interventions, will be presented. These sections should help the reader understand how some designs can be a good fit or not, according to the evaluation contexts. Impact analysis will then be presented. Impact analysis is an umbrella term that includes a large range of approaches aiming at evaluating effects. Impact analysis, as presented by Stern and colleagues (Stern, 2015; Stern et al., 2012), includes experimental and quasi-experimental research and contribution analysis. Presenting impact analysis after having discussed more traditional designs should help the reader understand how these different approaches differ and what they bring to evaluating the effects of interventions. In this chapter, the integration of planetary health dimensions into effect and impact analysis is also explored. Finally, this chapter will also touch on outcome harvesting. Outcome harvesting is not an approach focusing on assessing causal relations, but rather an approach for evaluating interventions’ effects and social change in unpredictable and changing contexts. Given that we will likely experience more unstable and fast-changing contexts, outcome harvesting may become more prominent in the upcoming years.

Defining Effects and Causal Relationship

Donald Campbell (1916-1996) was a social scientist with expertise in psychology. He made landmark contributions to the study of causality in evaluation. He and his colleagues, William Shadish and Thomas Cook, published renowned works on causality and experimentation which are still foundational references in the field (Shadish et al., 2002). In their 2002 book, they stated:

We can better understand what an effect is through a counterfactual model that goes back at least to the 18th-century philosopher David Hume (Lewis, 1973, p. 556). A counterfactual is something that is contrary to fact. In an experiment, we observe what did happen when people received a treatment. The counterfactual is knowledge of what would have happened to those same people if they simultaneously had not received the treatment. An effect is the difference between what did happen and what would have happened. (as cited in Shadish et al., 2002, p. 5)

A counterfactual is something that cannot be observed directly (it is logically impossible to be in the treatment and non-treatment groups simultaneously), hence the need to create a reasonable approximation of the counterfactual (Shadish et al., 2002). For analyzing effects, evaluations are meant to create conditions of experimentation that allow for the identification of effects and ensure that the effects observed are attributable to the intervention.
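The counterfactual logic above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_trial`, the outcome scale (mean 50, standard deviation 10), and the constant effect of 5 points are all hypothetical choices made for illustration, not values from the sources cited. In the simulation, each person has two potential outcomes, but only one is ever "observed"; random assignment lets the control group stand in for the unobservable counterfactual.

```python
import random
from statistics import mean


def simulate_trial(n=10_000, true_effect=5.0, seed=0):
    """Compare the (normally unknowable) true effect with an experimental estimate."""
    rng = random.Random(seed)

    # Two potential outcomes per person (hypothetical values).
    # In real life, only one of the two can ever be observed.
    y_without = [rng.gauss(50, 10) for _ in range(n)]
    y_with = [y + true_effect for y in y_without]

    # Random assignment to treatment (True) or control (False).
    treated = [rng.random() < 0.5 for _ in range(n)]

    # What an evaluator actually sees: one outcome per person.
    observed_treated = [y for y, t in zip(y_with, treated) if t]
    observed_control = [y for y, t in zip(y_without, treated) if not t]

    # The true effect uses both potential outcomes (impossible in practice).
    true_average_effect = mean([a - b for a, b in zip(y_with, y_without)])

    # The experimental estimate uses the control group as the counterfactual.
    estimated_effect = mean(observed_treated) - mean(observed_control)
    return true_average_effect, estimated_effect


if __name__ == "__main__":
    true_avg, estimate = simulate_trial()
    print(f"True average effect: {true_avg:.2f}")
    print(f"Estimate from randomized groups: {estimate:.2f}")
```

With a large enough sample, the estimate from the randomized groups should land close to the true average effect, even though no individual counterfactual was observed.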

How do we know if cause and effect are related? In a classic analysis formalized by the 19th-century philosopher John Stuart Mill, a causal relationship exists if (1) the cause preceded the effect, (2) the cause was related to the effect, and (3) we can find no plausible alternative explanation for the effect other than the cause. These three characteristics mirror what happens in experiments in which (1) we manipulate the presumed cause and observe an outcome afterwards; (2) we see whether variation in the cause is related to variation in the effect; (3) we use various methods during the experiment to reduce the plausibility of other explanations for the effect, along with ancillary methods to explore the plausibility of those we cannot rule out. […] Experiments explore the effects of things that can be manipulated, such as a dose of a medicine, the amount of a welfare check, the kind or amount of psychotherapy, or the number of children in a classroom. (Shadish et al., 2002, pp. 6-7)

However, assessing the causal role of nonmanipulable events or attributes requires something other than an experimental research design, and such causal relationships are harder to study. A range of research designs exists as alternatives to experimental approaches.

It is important to note that experimental research covers a wide range of designs and diverse contexts. Various terms are used to represent these distinct evaluative contexts:

Efficacy refers to the measurement of effects under controlled conditions (as opposed to real-life conditions). In these controlled settings, all groups receive the same predetermined doses of the intervention.
Effectiveness, on the other hand, measures the intervention's impact under normal usage conditions, accounting for potential challenges such as differences in access or dosage due to typical usage (Champagne et al., 2011a).

Experimental research is often presented as the gold standard for studying attribution of effects. However, this type of research also has limits. In particular, the conditions of experimentation may reduce the ability to generalize results (Shadish et al., 2002).

For example, efficacy, which is the difference in effects between the treatment group and the control group under controlled conditions, is generally different from (and superior to) effectiveness, which is the difference in effects between users and non-users measured under normal conditions of use (Champagne et al., 2011a). This difference is primarily explained by issues of accessibility and variations in treatment adherence. Similarly, some have criticized experimental research for its frequent focus on white male participants, which can reduce the consideration of the characteristics of other population groups and hide potential effects on groups not included in the experiment.

“The strength of experimentation is its ability to illuminate causal inference” (Shadish et al., 2002, p. 18). In particular, experimental designs eliminate internal validity threats such as rival hypotheses (McDavid & Huse, 2019). “The weakness of experimentation is doubt about the extent to which that causal relationship generalizes” (Shadish et al., 2002, p. 18).

Experimental and Quasi-Experimental Research Designs

Experimentation generally refers to the manipulation and control of the research conditions to allow for the testing of some hypothesis. However, different types exist and refer to specific research processes. Shadish et al. define an experiment as a “study in which an intervention is deliberately introduced to observe its effects” (Shadish et al., 2002, p. 12).

Experimental designs require that units or participants be randomly allocated to the experimental group (the one that receives the intervention), also called a ‘treatment group’, or to the control group (McDavid & Huse, 2019).
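Random allocation can be sketched in a few lines. The function name `randomize` and the equal split between groups are assumptions made for illustration; real trials may use other allocation ratios or stratified procedures.

```python
import random


def randomize(units, seed=None):
    """Randomly split units into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


if __name__ == "__main__":
    participants = [f"participant_{i}" for i in range(1, 11)]
    groups = randomize(participants, seed=42)
    print("Treatment:", groups["treatment"])
    print("Control:  ", groups["control"])
```

Because chance alone decides who ends up in each group, neither the participants nor the researcher can influence group membership, which is what distinguishes this from the quasi-experimental assignment described next.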

Quasi-experimental research, on the other hand, does not use randomization for assigning the intervention to participants. Assignment to conditions can be done by the participant or any other person. The researcher still has control of the conditions of the study (i.e., the conditions of the experiment, dosage, etc.), and efforts are made to make the experimental and control groups as similar as possible using methods such as matching participants in the groups.

In most experimental or quasi-experimental designs, key outcome variables are measured before the program is implemented, and then after the program has been running long enough to be considered fully implemented. The ‘before’ measure is called a pre-test, and the ‘after’ measure is a post-test. Comparisons of the pre-test to the post-test averages will show whether there was an average change in the outcome variable for the treatment group. (McDavid & Huse, 2019, p. 43).
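The pre-test/post-test comparison described above amounts to simple arithmetic. The sketch below uses made-up scores for five participants per group (hypothetical numbers, not data from any study), and the helper name `average_change` is an assumption. It also compares the treatment group's change with the control group's change, which is one common way of separating the intervention's effect from changes that would have happened anyway.

```python
from statistics import mean


def average_change(pre, post):
    """Average post-test score minus average pre-test score for one group."""
    return mean(post) - mean(pre)


# Hypothetical scores, invented purely for illustration.
treatment_pre = [52, 48, 50, 55, 45]
treatment_post = [60, 55, 58, 61, 51]
control_pre = [51, 49, 50, 54, 46]
control_post = [53, 50, 52, 55, 50]

treatment_change = average_change(treatment_pre, treatment_post)
control_change = average_change(control_pre, control_post)

# Change in the treatment group beyond the change seen in the control group.
difference = treatment_change - control_change

if __name__ == "__main__":
    print("Average change, treatment group:", treatment_change)
    print("Average change, control group:  ", control_change)
    print("Difference between the changes: ", difference)
```

In this invented example, the treatment group improves by 7 points on average and the control group by 2, so 5 points of change remain once the control group's change is subtracted.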

Natural experiment research is “not really an experiment because the cause usually cannot be manipulated; [an example is] a study that contrasts a naturally occurring event such as an earthquake with a comparison condition” (Shadish et al., 2002, p. 12).

Various experimental designs exist: Pre-test/Post-test, Post-test only, and Post-test with a longitudinal perspective, in which outcomes are measured at different moments in time (Champagne et al., 2011a; McDavid et al., 2019; Weiss, 1998). A diversity of quasi-experimental and non-experimental research designs also exists, such as static group comparison, before-after design, time series, and case study, each presenting distinct potential threats to internal validity and external validity. Summary tables of experimental and quasi-experimental designs and their relative strengths and potential biases are presented in McDavid et al. (2019, p. 133) and Champagne et al. (2011a, p. 185).

Experimental and quasi-experimental research use the following symbols and acronyms for describing designs:

• R: randomization

• X: treatment/intervention

• O: observation

For example, a Post-test research design would be represented as follows:

X O1

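A Comparative Pre-test/Post-test design with randomization would be represented with one line per group, the second line being the control group, which does not receive the intervention:

R O1 X O2

R O1   O2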

The quality of the study determines the confidence in the results and in the causal relationship between a specific intervention and the effects. The study quality depends on different elements and potential threats (Champagne et al., 2011a):

(1) Validity of instruments and measurements

(2) Internal validity

(3) External validity

(4) Validity of statistical analysis

The following sections provide a brief overview of the first three elements.

Validity of Instruments and Measurements

The validity of an instrument is its actual ability to correctly measure what it is supposed to measure (Champagne et al., 2011a, p. 168). This includes:

• Content validity: Refers to the extent to which an instrument accurately represents relevant dimensions of the concept it is intended to measure. For example, mortality rate alone would be a poor measure of overall population health.

• Criterion validity: Refers to the extent to which an instrument correlates with and accurately predicts the outcome it is intended to measure. An example includes examining whether success in undergraduate studies correlates with admissions exam scores, especially when such exams are used as a tool to predict academic success.

• Construct validity: Refers to whether the test is truly measuring the intended construct and not something else. For example, IQ tests are not necessarily a good measure of intelligence.

Content validity assesses how well the test covers all the relevant aspects of the concept it is supposed to measure, while construct validity evaluates whether the test is measuring the intended construct and not something else.

Internal Validity

Internal validity corresponds to the confidence that the variations observed in the outcomes can be attributed to the intervention (Champagne et al., 2011a). Several elements can threaten internal validity (Champagne et al., 2011a, p. 182; McDavid et al., 2019; Ohlund & Yu, n.d.):

• History: Something influences the outcome during the study other than the treatment.

• Selection: Participants differ in some relevant ways between the control and the experimental groups.

• Maturation: Changes such as fatigue or experience acquired during the experiment.

• Regression to the mean/Statistical regression: Extreme scores on the pretest tend to be less extreme on the post-test.

• Attrition/experimental mortality: Participants drop out—this is a threat if attrition differs between the control and experimental groups.

• Testing: Participants become familiar with the testing conditions when the same tests are applied multiple times.

• Instrumentation: Changes in measuring instruments, observers, etc., between the pretest and post-test can affect outcomes.

• Selection-based interactions: Selection can interact with other validity threats. For example, a program aiming to increase validity that is tested on two groups with different socio-economic status could introduce bias.
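Among the threats listed above, regression to the mean is often the least intuitive, and a short simulation may help. This is a sketch under assumed values: the function name `regression_to_mean_demo`, the score scale, and the amount of measurement noise are all hypothetical. No intervention is applied at all; participants are simply selected because they scored in the lowest 10% on a noisy pre-test.

```python
import random
from statistics import mean


def regression_to_mean_demo(n=20_000, seed=1):
    """Return (pre-test mean, post-test mean) for the lowest pre-test scorers."""
    rng = random.Random(seed)

    # Stable underlying level for each person (hypothetical scale).
    level = [rng.gauss(100, 10) for _ in range(n)]

    # Two noisy measurements of the same level; nothing happens in between.
    pre = [x + rng.gauss(0, 10) for x in level]
    post = [x + rng.gauss(0, 10) for x in level]

    # Select roughly the lowest-scoring 10% on the pre-test.
    cutoff = sorted(pre)[n // 10]
    selected = [i for i in range(n) if pre[i] <= cutoff]

    pre_mean = mean([pre[i] for i in selected])
    post_mean = mean([post[i] for i in selected])
    return pre_mean, post_mean


if __name__ == "__main__":
    pre_mean, post_mean = regression_to_mean_demo()
    print(f"Selected group, pre-test mean:  {pre_mean:.1f}")
    print(f"Selected group, post-test mean: {post_mean:.1f}")
```

The selected group's post-test average moves back toward the overall average even though nothing was done to them. In a one-group before-after design, this apparent improvement could be mistaken for an effect of the intervention.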

Table 8.1 presents internal validity threats to some quasi-experimental and experimental research designs. For each of the following research designs, "yes" indicates possible threats to internal validity.

Contribution Analysis
