Why do many evaluation reports yield only weak insights? Having worked in all three corners of the evaluation triangle – as an evaluator, as an evaluation commissioner / manager, and as a stakeholder in interventions under evaluation (the evaluands) – I find that we can only put part of the blame on evaluation teams. Often, evaluations come with high expectations which low budgets and narrow timeframes cannot fulfil. If, on top of that, evaluations are poorly prepared, evaluation teams may find themselves struggling with scope creep and shifting goalposts. They will spend much of their time trying to understand the evaluand and negotiating the evaluation scope with the client, wasting time that should be spent on proper data collection and analysis. Better preparation and accompaniment of evaluations could make a big difference. Ideally, that should happen as part of an evaluability assessment and before the evaluation terms of reference (TOR) are finalised.

Howard White, a specialist in evaluation synthesis, has posted a list of 10 common flaws in evaluations. There are other flaws one could find, but I propose reflecting on solutions that all corners of the evaluation triangle can contribute to. Evaluations work best when evaluators, evaluation managers and those who represent the evaluand work together as critical partners. In recent years, I have supported organisations in their evaluation management, so this post focuses on what evaluation managers can do to prevent Howard’s “10 flaws” (in italics below) from happening. Let’s look at them one by one!

1. Inadequate description of the intervention: Ideally, all evaluation reports start with the description of the evaluand. If the evaluand is one project implemented by one organisation in one country, it shouldn’t be too hard to fit that within a couple of pages. If it is a collection of programmes encompassing cascades of diverse activities by hundreds of organisations around the world, evaluators need to be a bit more abstract in their introductory description. But obviously they need to understand the evaluand to design the appropriate evaluation!
Evaluation managers can map the components of the programme, review its theory of change, and organise the documentation so that evaluation teams can make sense of it. This is particularly important if the evaluand is too complicated to be adequately described in the TOR. A good example from my practice was a portfolio evaluation: Before commissioning the evaluation, evaluation management developed a database listing key features of all projects in the portfolio. That made it easy to understand and describe the evaluand, and to select key cases for deeper review. Conversely, in a different assignment, my team spent (unplanned) months trying to make sense of the – sometimes contradictory – documentation and verbal descriptions of the sprawling evaluand.

2. ‘Evaluation reports’ which are monitoring not evaluation: Evaluation managers can prevent this problem by formulating appropriate evaluation questions. Often, evaluation questions start with “to what extent…”, followed by rather specific questions about the achievement of certain results. Those kinds of questions risk limiting the evaluation to a process monitoring exercise, or some kind of activity audit. For programme learning, it is useful to ask questions starting with “why” and “how”.

3. Data collection is not a method: Evaluation managers can make sure the TOR asks evaluators to describe the approaches and methods they will use in the evaluation – for data collection and for analysis respectively. They can look for gaps in the inception report, ideally checking the annexes as well, to find out whether the proposed instruments match the proposed methodology. That takes some specialist knowledge – ideally, evaluation managers should have substantial first-hand evaluation experience or a background in applied research.

4. Unsubstantiated evidence claims: Evaluation managers can invite evaluation teams to structure their reports clearly, so that each finding is presented with its supporting evidence. Many evaluations I have seen weave findings and related evidence so closely together that it is hard to tell them apart – a style that is often described as “overly descriptive”. Blurring the boundaries between evidence and findings can also be a strategy to hide findings about gaps and failures in programmes. Where programme teams are hostile to challenging findings, evaluation managers can play a role in defending the evaluation team’s independence, and their mission to support learning from success and from failure.

5. Insufficient evidence: The amount of evidence an evaluation team can generate depends to a great extent on the time and other resources they have. One important role of evaluation managers is to ensure a good balance between what is expected of the evaluation and the resources available for it. If an organisation expects an evaluation to answer, say, 30 complex questions on an evaluand encompassing tens of thousands of diverse interventions in diverse contexts within half a year, it must be prepared to live with evidence gaps.

6. Positive bias in process evaluations: Positive bias can arise from poor evaluation design (see also points 2, 3 and 4 above). It can also be linked to evidence gaps (see point 5 above) – when in doubt, evaluators hesitate to pass “negative judgements”. But often, positive bias slips in near the end of the evaluation process, when programme managers object to findings about gaps, mishaps, or failure in their programme. That takes us back to the role of evaluation managers in fostering commitment to learning from failure.

7. Limited perspectives: Who do evaluators speak to? This problem is related to issues 1, 5 and 6 above. Where resources for an evaluation are limited, fieldwork might be absent or restricted to the most accessible areas (when I worked in China, such places were called “fields by the road”, always nicely groomed). When working on a shoestring, evaluators will struggle to sample or to select cases purposefully. But they can still speak to people representing different perspectives. Evaluation managers can encourage that by mapping stakeholders in the TOR and explicitly asking for interviews with people who are underrepresented.

8. Ignoring the role of others: If most evaluation questions focus on programme performance, evaluators will focus on programme performance. Often, evaluation TOR address the role of others only in a brief question related to the coherence (OECD-DAC) criterion. But questions about effectiveness, impact and sustainability can also be framed to encourage evaluators to look at the influence of other “actors and factors”.
Also, ideally, programmes should be built on preliminary context and stakeholder analyses, which should be continuously updated. Where that has happened, that information should flow into the TOR’s context section.

9. Causal claims based on monitoring data: Good monitoring data can be a helpful ingredient in an evaluation that triangulates data from different sources. There is no reason to believe people fake their monitoring data. It is just that most of the time, the amount and quality of monitoring data are inadequate. Monitoring and evaluation specialists can make sure each programme has a monitoring system that produces data which are useful for both monitoring and evaluation. Furthermore, evaluation TOR should remind evaluators of the need to triangulate data, i.e., to compare data sourced from different perspectives via different data collection tools.
Howard mentions a separate point under the “9th flaw”, the attribution problem: “Outcome evaluations present data on outcomes in the project area and claim that any observed changes are the result of the project.” But evaluators are not going to solve that problem by collecting data from a greater variety of perspectives. They need to be encouraged to look beyond the evaluand as a likely cause of the desired effects – see point 8 above.

10. Global claims based on single studies: As Howard points out, lessons from a specific evaluation are only relevant for the intervention being evaluated. That is something everyone in the evaluation triangle needs to be aware of. Evaluation managers are well placed to remind decision-makers in their organisations that an evaluation speaks to the evaluand only. It can feed into a broader body of evidence, but it should never be the only basis for decision-making beyond the context of the evaluand.

We have reached the end of the list, but there is so much more that can go wrong in evaluations. Investing in good preparation and, once the evaluation team is recruited, building rapport and effective communication between evaluation managers, programme implementers and evaluators are essential for risk management in evaluations.
