Table 8-1 Internal Validity Threats to Some Quasi-Experimental and Experimental Research Designs
Model | History | Selection | Maturation | Statistical regression | Attrition/Mortality | Testing | Instrumentation | Selection-based interactions | |
Quasi-Experimental Research Designs | |||||||||
Pre-post test design | O X O | yes | yes | yes | yes | yes | yes | ||
Static group comparison design | X O O | yes | yes | yes | yes | ||||
Pre-post comparison group design | O X O O O | yes | yes | ||||||
Case study design | X O | yes | yes | yes | yes | yes | |||
Single time series design | OOO X OOO | yes | yes | yes | |||||
Comparative time series design | OOO X OOO OOO OOO | yes | yes | ||||||
Experimental Research Designs | |||||||||
Pre-post test design with control group and randomization | R O X O O X O | yes | |||||||
Post test design with control group and randomization | R X O O | yes |
Source: Compiled from McDavid et al. (2019) and Champagne et al. (2011a).
External Validity
External validity relates to the capacity to generalize results to contexts beyond the one in which the research was conducted. It refers to the extent to which a causal relationship "holds over variations in persons, settings, treatments and outcomes" (Shadish et al., 2002, p. 86). Shadish et al. (2002) summarize the threats to external validity in five categories (see Exhibit 8.1).
Exhibit 8.1: Threats to external validity: Reasons why inferences about how study results would hold over variations in persons, settings, treatments, and outcomes may be incorrect
Interaction of the Causal Relationship with Units: An effect found with certain kinds of units might not hold if other kinds of units had been studied.
Interaction of the Causal Relationship over Treatment Variations: An effect found with one treatment variation might not hold with other variations of treatment, or when that treatment is combined with other treatments, or when only part of that treatment is used.
Interaction of the Causal Relationship with Outcomes: An effect found on one kind of outcome observation may not hold if other outcome observations were used.
Interaction of the Causal Relationship with Settings: An effect found in one kind of setting may not hold if other kinds of settings were to be used.
Context-Dependent Mediation: An explanatory mediator of a causal relationship in one context may not mediate in another context.
Source: Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin Company: 87.
In experimental research, several biases can potentially prevent the extrapolation of results. Champagne et al. (2011a) identify several, including:
- Contagion/diffusion of treatment: Communication or interaction between the control and experimental groups that compromises their independence and potentially influences outcomes.
See Champagne et al. (2011a, p. 183) for a more extensive list.
Each experimental research design has its own strengths and threats to validity. Fully randomized designs are intended to prioritize internal validity and, hence, meet the conditions for determining whether there is a causal relationship between the treatment and the outcome variable. To choose the most promising design for a specific context, further consultation of specialized resources is recommended, including works by Weiss (1998), Shadish et al. (2002), McDavid et al. (2019), and Brousselle et al. (2011b).
Experimental designs are essential in some contexts. For example, before new pharmaceuticals are commercialized, it is important to make sure they are effective and safe and that they have been appropriately tested. New drugs are tested through rigorous protocols, first in laboratory conditions, then in humans, using randomization among participants who share similar characteristics in order to isolate effects and gain confidence that the observed effects are attributable to the intervention. At this stage, however, the generalizability of results remains unresolved: although such settings provide high confidence in causal relationships, they limit the capacity to generalize results.
Randomized controlled trials (RCTs) are also often limited in participant diversity. For example, the first COVID-19 vaccines were tested on children only after having been tested on adults, which delayed vaccine accessibility for this population. Some RCTs have also been the subject of controversy. For example, during early HIV/AIDS drug testing, some participants reportedly shared doses in the hope that it would give them a chance to extend their lives. Finally, experimental research is not suitable in all contexts: it is not always possible to create experimental conditions that isolate the causal relationships between an intervention and its effects. Consequently, evaluators have developed and used alternative designs for effect analysis, such as Contribution Analysis and those listed in the section on Impact Evaluation.
Contribution Analysis
Contribution analysis was initially developed by John Mayne while he was working at the Office of the Auditor General of Canada, where he was developing strategies to increase the relevance of evaluations in the Federal Government of Canada.
Government programs are intended to produce certain outcomes: more jobs, a healthier public, better living conditions, etc. Effective programs are those that make a difference in meeting these kinds of objectives – they contribute to the intended outcomes that citizens value. In trying to measure the performance of a program, we face two problems. We can often—although frequently not without some difficulty—measure whether or not these outcomes are actually occurring. The more difficult question is usually determining just what contribution the specific program in question made to the outcome. How much of the success (or failure) can we attribute to the program? What has been the contribution made by the program? (Mayne, 1999, pp. 2-3)
Contribution analysis has often been associated with the analysis of complex programs.
Contribution analysis is based on the existence of, or more usually, the development of a postulated theory of change for the intervention being examined. The analysis examines and tests this theory against logic and the evidence available from results observed and the various assumptions behind the theory of change, and examines other influencing factors. The analysis either confirms – verifies – the postulated theory of change or suggests revisions in the theory where the reality appears otherwise. The overall aim is to reduce uncertainty about the contribution an intervention is making to observed results through an increased understanding of why results did or did not occur and the roles played by the intervention and other influencing factors. (Mayne, 2012a, p. 271)
Contribution analysis examines, through causal claims, the contribution, rather than the attribution, that a complex program makes to outcomes and impacts (Delahais & Toulemonde, 2012; Mayne, 2001, 2008, 2011, 2012a, 2012b, 2015). When an intervention is complex enough that direct causal relationships are difficult to isolate, contribution analysis can help construct a plausible story about what happened. It explains the observed outcomes and impacts of the intervention and identifies key threats to the chain of effects, other contributing factors, and rival explanations. The overall objective is for the evaluator to build confidence that the observed effects and impacts are associated with the intervention, while considering other potential influences and alternative explanations.
Contribution analysis involves six steps:
1. Setting out the cause–effect issue to be addressed;
2. Developing the postulated theory of change and risks to it, including rival explanations;
3. Gathering evidence on the theory of change;
4. Assembling and assessing the contribution claim and challenges to it;
5. Seeking out additional evidence; and
6. Revising and strengthening the contribution story. (Mayne, 2012a, p. 272)
Mayne indicates that to be confident the intervention led to the observed effects, the evaluation will need to check the following elements:
1) plausibility of the theory of change;
2) implementation as outlined in the theory of change;
3) evidentiary confirmation of key elements;
4) identification and examination of other influencing factors; and
5) the extent to which key alternative explanations have been disproved. (Mayne, 2011, p. 7)
Impact Evaluation
Impact evaluation, as presented by Stern and colleagues, is an umbrella term covering a diversity of designs that focus on identifying the effects and impacts of interventions. According to these authors, impact evaluation aims not only to measure effects but also to answer "how" and "why" questions, shifting the focus of the evaluation from causality to explanation (Stern, 2015; Stern et al., 2012).
Stern contrasts the narrow focus of traditional effect analysis with the broader consideration of effects in impact evaluation:
Experimental methods are concerned with intended rather than unintended effects; assume direct links between interventions and outcomes; address primary rather than secondary effects; and usually look to evidence in the short-term rather than the long-term. This latter is especially important as in many development settings effects are not known when programme funding ends, only becoming clear over a much more extended timescale. Most counterfactual methods on the other hand focus on the short-term, which is likely to capture only a sub-set of programme results. (Stern, 2015, p. 5)
Impact evaluation aims to answer the following four questions:
1) To what extent can a specific (net) impact be attributed to the intervention?
2) Did the intervention make a difference?
3) How has the intervention made a difference?
4) Will the intervention work elsewhere? (Stern et al., 2012, p. 37)
A diversity of designs can be considered for conducting impact evaluations. Stern identifies four main approaches to impact evaluations (IE):
Regularity frameworks that depend on the frequency of association between cause and effect – the inference basis for statistical approaches to IE.
Counterfactual frameworks that depend on the difference between two otherwise identical cases – the inference basis for experimental and quasi-experimental approaches to IE.
Multiple causation that depends on combinations of causes that lead to an effect – the inference basis for ‘configurational’ approaches to IE including qualitative comparative analysis (QCA) and contribution analysis.
Generative causation that depends on identifying the ‘mechanisms’ that explain effects – the inference basis for ‘theory based’ and ‘realist’ approaches to IE. (Stern, 2015, p. 17)
Table 8.2 presents the different options available. As the table shows, the conceptualization of impact evaluation proposed by Stern (2015) encompasses experimental research, quasi-experimental research, natural experiments, and contribution analysis.
Table 8.3 elaborates on the choice of the evaluation design for impact evaluations based on the key evaluation questions and the contextual research conditions.
Table 8-2 Design Approaches, Variants and Causal Inference
Design approaches | Specific variants | Basis for causal inference |
Experimental | RCTs Quasi experiments Natural experiments | Counterfactuals: The difference between two otherwise identical cases – the manipulated and the controlled; the co-presence of cause and effects. |
Statistical | Statistical modelling Longitudinal studies Econometrics | Regularity: Correlation between cause and effect or between variables, influence of (usually) isolatable multiple causes on a single effect. Control of 'confounders'. |
Theory-based | Causal process designs: Theory of change, process tracing, contribution analysis, impact pathways, Causal mechanism designs: Realist evaluations, congruence analysis | Generative causation: Identification and confirmation of causal processes or ‘chains’. Supporting factors and mechanisms at work in context. |
Case-based | Interpretative: Naturalistic, grounded theory, ethnography Structured: Configurations, QCA, within-case-analysis, simulations and network analysis | Multiple causation: Comparison across and within cases of combinations of causal factors. Analytic generalization based on theory. |
Participatory | Normative designs: Participatory or democratic evaluation, empowerment evaluation. Agency designs: learning by doing, policy dialogue, collaborative action research. | Actor Agency: Validation by participants that their actions and experienced effects are ‘caused’ by programme. Adoption, customization and commitment to a goal |
Synthesis studies | Meta-analysis, narrative synthesis, realist-based synthesis | Accumulation and aggregation within a number of perspectives (statistical, theory-based, ethnographic). |
Source: Stern, E. (2015). Impact Evaluation. A Guide for Commissioners and Managers, p.18: https://www.bond.org.uk/wp-content/uploads/2022/08/impact_evaluation_guide_0515.pdf
Table 8-3 Summarizing the Design Implications of Different Impact Evaluation Questions
Key evaluation questions | Related evaluation questions | Underlying assumption | Requirements | Suitable designs |
To what extent can a specific (net) impact be attributed to the intervention? | What is the net effect of the intervention? How much of the impact can be attributed to the intervention? What would have happened without the intervention? | Expected outcomes and the intervention itself clearly understood and specifiable Likelihood of primary cause and primary effect Interest in particular intervention rather than generalisation | Can manipulate interventions Sufficient numbers (beneficiaries, households, etc.) for statistical analysis | Experiments Statistical studies Hybrids with case-based and participatory designs |
Has the intervention made a difference? | What causes are necessary or sufficient for the effect? Was the intervention needed to produce the effect? Would these impacts have happened anyhow? | There are several relevant causes that need to be disentangled Interventions are just one part of a causal package | Comparable cases where a common set of causes are present and evidence exists as to their potency | Experiments Theory-based evaluation, eg contribution analysis Case-based designs, eg QCA |
How has the intervention made a difference? | How and why have the impacts come about? What causal factors have resulted in the observed impacts? Has the intervention resulted in any unintended impacts? For whom has the intervention made a difference? | Interventions interact with other causal factors It is possible to clearly represent the causal process through which the intervention made a difference – may require 'theory development' | Understanding how supporting and contextual factors connect intervention with effects Theory that allows for the identification of supporting factors (proximate, contextual and historical) | Theory-based evaluation especially 'realist' variants and Contribution Analysis Participatory approaches |
Can this be expected to work elsewhere? | Can this ‘pilot’ be transferred elsewhere and scaled up? Is the intervention sustainable? What generalizable lessons have we learned about impact? | What has worked in one place can work somewhere else Stakeholders will cooperate in joint donor/beneficiary evaluations | Generic understanding of contexts eg typologies of context Clusters of causal packages Innovation diffusion mechanisms | Participatory approaches and some Experimental and Theory-based approaches Natural experiments Realist evaluation Synthesis studies |
Source: Stern, E. (2015). Impact Evaluation. A Guide for Commissioners and Managers, p.22: https://www.bond.org.uk/wp-content/uploads/2022/08/impact_evaluation_guide_0515.pdf
Impact evaluation and contribution analysis have expanded the methodological landscape for analyzing the effects and impacts of interventions. This diversity of approaches enables analysis in contexts where experiments are not feasible. Each approach has its own strengths and potential threats to validity. This overview touches only briefly on a significant domain of evaluation; further reading is strongly recommended for those intending to lead evaluations on this critical topic.
Considering Planetary Health Dimensions when Evaluating Impacts
Planetary health dimensions have generally not been considered in effect analysis, contribution analysis, or impact analysis unless the intervention specifically targets one of these dimensions or the evaluation project explicitly focuses on them. As discussed above, effect analysis using experimental research designs generally focuses on effects that are specifically and narrowly defined. This focus arises from the need to minimize biases and ensure that the research design and measurement procedures allow for the assessment of the causal relationship under study. The objective of such designs is to isolate the experiment from its context in order to focus on the causal relationship being tested. Integrating new variables into such research settings is not straightforward, particularly if these variables are diffuse and their effects unfold over the longer term.
However, even in effect studies relying on experiments, dimensions of planetary health can and should be included. Interventions evaluated through experiments have real impacts on the environment and on human systems. For example, pharmaceuticals, which are tested in controlled experimental conditions, are typically released into the water system after they are consumed, contaminating rivers and deeply affecting life under water (Brodin et al., 2024). In this example, identifying the potential impacts of these interventions early raises awareness about potential risks and supports the implementation of mitigation measures alongside drug approval (Benton et al., 2025).
Considering potential impacts on the critical dimensions of planetary health may be easier and more natural in some research designs for effect analysis and impact evaluation. Several research designs listed in Table 8.2, including contribution analysis, case-based analysis, participatory approaches, synthesis studies, and even statistical analysis and modelling, readily allow for considering impacts on human and natural systems.
Contribution analysis, in particular, could directly benefit from existing theory-based approaches for designing interventions contributing to planetary health (Brousselle et al., 2022), as the underlying principles are the same. For most research designs listed in Table 8.2, except for experimental research, integrating planetary health dimensions would primarily involve explicitly identifying these dimensions and exploring the intervention’s impacts on them. A systematic assessment of the intervention’s effects on health, equity, prosperity, pollution, land and water, and biodiversity could be readily integrated.
For other research designs, such as experimental research, considering impacts on human and natural systems would require integrating a complementary research step. The analysis would likely be two-fold: one stage relying on experiments, and the other dedicated to analyzing impacts on planetary health dimensions. For this second, complementary step, existing tools such as the Planetary Health Rapid Impact Assessment Tool (Brousselle et al., 2024b), combined with expert consultation and a review of the published scientific literature, could serve as the basis for a holistic effect research design. To overcome the limitations of even high-quality experimental designs, this strategy would involve adopting a mixed methods research design rather than relying solely on experimental research (Creswell & Plano Clark, 2011).
Outcome Harvesting
Outcome harvesting is an approach designed to identify effects in changing contexts. It is useful for understanding social change processes (Wilson-Grau, 2019). It does not seek to establish causal attribution. It may become increasingly relevant as contexts evolve quickly and unpredictably due to environmental change, post-truth, and disinformation.
Outcome Harvesting is not intended for projects or project components that are well-supported by evidence, or that closely follow a theory of change. Outcome Harvesting is appropriate for identifying and verifying outcomes, including unintended outcomes. It is a useful, commonsense approach that easily engages informants and is designed to generate concrete evidence to inform decisions about future actions. (Wilson-Grau et al., 2024).
It can be used when an organization is experiencing constant change and “unexpected and unforeseeable actors and factors in their programming environment” (Wilson-Grau, 2019, p. 1).
For example, these are observable changes in societal actors targeted by policy advocacy interventions: a government minister publicly declares she will restrict untendered contracts to under 5% (an action); a civil society organization launches a campaign for governmental transparency (an activity); two political parties join forces to collaborate rather than compete in proposing transparency legislation (relationship); a senior government official for the first time acknowledges the need for off-grid, sustainable energy production in rural areas (agenda); a legislature passes a new anti-corruption law (policy); or a government implements norms and procedures for publishing all procurement records (practice). Thus, the definition of “outcome” in Outcome Harvesting contrasts with the definition of an outcome as observed changes in the intended program beneficiaries’ wellbeing (learning, health, employment). Those changes would be the impact of these policy changes. (Wilson-Grau, 2019, p. 2).
Outcome harvesting focuses on the identification of outcomes defined as “a change in the behavior, relationships, actions, activities, policies, or practices of an individual, group, community, organization, or institution” (Wilson-Grau, 2019, p. 2). This approach is not meant to identify impacts.
The bottom line of social change is societal actors changing the way they do things. It is only when individuals, groups, communities, organizations, and institutions change their actions, activities, relationships, policies, and practices that a society changes, for good or bad. The essence of an outcome in Outcome Harvesting is those societal actors demonstrably changing their behaviors in those ways. (Wilson-Grau, 2019, p. 170).
Outcome Harvesting collects (“harvests”) evidence of what has changed (“outcomes”) and, working backwards, determines whether and how an intervention has contributed to these changes. (Wilson-Grau et al., 2024).
This approach includes six steps (see Exhibit 8.2).