Why ask why in theory-based evaluation

A theory of change is a wonderful instrument to explore the “why” and “how” of an intervention. So why do evaluations make such patchy use of theories of change? Often, it is because evaluation questions ask mainly “how much”. This blog narrates how I have come to this conclusion.

Risk management for evaluation managers

Why do many evaluation reports yield only weak insights? Having worked in all three corners of the evaluation triangle – as an evaluator, as an evaluation commissioner / manager, and as a stakeholder in interventions under evaluation (the evaluands) – I find that we can only put part of the blame on evaluation teams. Often, evaluations come with high expectations which low budgets and narrow timeframes cannot fulfil. If, on top of that, evaluations are poorly prepared, evaluation teams may find themselves struggling with scope creep and shifting goalposts. They will spend much of their time trying to understand the evaluand and negotiating the evaluation scope with the client, wasting time that should be spent on proper data collection and analysis. Better preparation and accompaniment of evaluations could make a big difference. Ideally, that should happen as part of an evaluability assessment and before the evaluation terms of reference (TOR) are finalised.

Howard White, a specialist in evaluation synthesis, has posted a list of 10 common flaws in evaluations. There are other flaws one could find. But I would propose to reflect on solutions that all parts of the evaluation triangle can contribute to. Evaluations work best when evaluators, evaluation managers and those who represent the evaluand work together as critical partners. In recent years, I have supported organisations in their evaluation management, so this post focuses on things that evaluation managers can do to prevent Howard’s “10 flaws” (in italics below) from happening. Let’s look at them one by one!

1. Inadequate description of the intervention: Ideally, all evaluation reports start with the description of the evaluand. If the evaluand is one project implemented by one organisation in one country, it shouldn’t be too hard to fit that within a couple of pages. If it is a collection of programmes encompassing cascades of diverse activities by hundreds of organisations around the world, evaluators need to be a bit more abstract in their introductory description. But obviously they need to understand the evaluand to design the appropriate evaluation!
Evaluation managers can map the components of the programme, review its theory of change, and organise the documentation so that evaluation teams can make sense of it. This is particularly important if the evaluand is too complicated to be adequately described in the TOR. A good example from my practice was a portfolio evaluation: Before commissioning the evaluation, evaluation management developed a database listing key features of all projects in the portfolio. That made it easy to understand and describe the evaluand, and to select key cases for deeper review. Conversely, in a different assignment, my team spent (unplanned) months trying to make sense of the – sometimes contradictory – documentation and verbal descriptions of the sprawling evaluand.

2. ‘Evaluation reports’ which are monitoring not evaluation: Evaluation managers can prevent this problem by formulating appropriate evaluation questions. Often, evaluation questions start with “to what extent…”, followed by rather specific questions about the achievement of certain results. Those kinds of questions risk limiting the evaluation to a process monitoring exercise, or some kind of activity audit. For programme learning, it is useful to ask questions starting with “why” and “how”.

3. Data collection is not a method: Evaluation managers can make sure the TOR requests evaluators to describe the approaches and methods they use in the evaluation, for data collection and for analysis respectively. They can look for gaps in the inception report, ideally checking the annexes as well, to find out whether the proposed instruments match the proposed methodology. That takes some specialist knowledge – ideally, evaluation managers should have substantial first-hand evaluation experience or a background in applied research.

4. Unsubstantiated evidence claims: Evaluation managers can invite evaluation teams to structure their reports clearly, so that each finding is presented with the supporting evidence. Many evaluations I have seen weave their findings and related evidence so closely together that it is hard to tell them apart – a style that is often described as “overly descriptive”. Obfuscating the boundaries between data and evidence can be a strategy to hide findings about gaps and failure in programmes. Where programme teams are hostile to challenging findings, evaluation managers can play a role in defending the evaluation team’s independence, and their mission to support learning from success and from failure.

5. Insufficient evidence: The amount of evidence an evaluation team can generate depends to a great extent on the time and other resources they have. One important role of evaluation managers is to ensure a good balance between expectations from the evaluation and resources for the evaluation. If an organisation expects an evaluation to answer, say, 30 complex questions on an evaluand encompassing tens of thousands of diverse interventions in diverse contexts within half a year, it must be prepared to live with evidence gaps.

6. Positive bias in process evaluations: Positive bias can arise from poor evaluation design (see also points 2, 3 and 4 above). It can also be linked to evidence gaps (see point 5 above) – when in doubt, evaluators hesitate to pass “negative judgements”. But often, positive bias slips in near the end of the evaluation process, when programme managers object to findings about gaps, mishaps, or failure in their programme. That takes us back to the role of evaluation managers in fostering commitment to learning from failure.

7. Limited perspectives: Who do evaluators speak to? This problem is related to issues 1, 5 and 6 above. Where resources for an evaluation are limited, fieldwork might be absent or restricted to the most accessible areas (when I worked in China, they called such places “fields by the road”, always nicely groomed). When working on a shoestring, evaluators will struggle to sample, or to select cases, purposefully. But they can still speak to people representing different perspectives. Evaluation managers can encourage that, by mapping stakeholders in the TOR and explicitly asking for interviews with people who are underrepresented.

8. Ignoring the role of others: If most evaluation questions focus on programme performance, evaluators will focus on programme performance. Often, evaluation TOR address the role of others only in a brief question related to the coherence (OECD-DAC) criterion. But questions about effectiveness, impact and sustainability can also be framed to encourage evaluators to look at the influence of other “actors and factors”.
Also, ideally, programmes should be built on preliminary context and stakeholder analyses which should be continuously updated. Where that has happened, that information should flow into the TOR’s context section.

9. Causal claims based on monitoring data: Good monitoring data can be a helpful ingredient in an evaluation that triangulates data from different sources. There is no reason to believe people fake their monitoring data. It is just that most of the time, the amount and quality of monitoring data are inadequate. Monitoring and evaluation specialists can make sure each programme has a monitoring system that produces data which are useful for monitoring and for evaluation. Furthermore, evaluation TOR should remind evaluators of the need to triangulate data, i.e., to compare data sourced from different perspectives via different data collection tools.
Howard mentions a separate point under the “9th flaw”, the attribution problem: “Outcome evaluations present data on outcomes in the project area and claim that any observed changes are the result of the project.” But evaluators are not going to solve that problem by collecting data from a greater variety of perspectives. They need to be encouraged to look beyond the evaluand as a likely cause of the desired effects – see point 8 above.

10. Global claims based on single studies: As pointed out by Howard, lessons from a specific evaluation are only relevant for the intervention being evaluated. That is something that everyone in the evaluation triangle needs to be aware of. Evaluation managers are well placed to remind decision-makers in their organisations of the fact that an evaluation is about the evaluand only. It can feed into a broader body of evidence, but it should never be the only basis for decision-making beyond the context of the evaluand.

We have reached the end of the list, but there is so much more that can go wrong in evaluations. Investing in good preparation, and, once the evaluation team is recruited, building rapport and effective communication between evaluation managers, programme implementers and evaluators, are essential for risk management in evaluations.

FGDs mean groups with focus & discussion!

This year again, I feel privileged to serve on a panel of senior evaluators who advise a multilateral donor on evaluation approaches and methods. And this year again, I feel saddened by the widespread neglect of qualitative data collection. All evaluations I have reviewed (cumulatively, I have reviewed hundreds…) include at least some elements of qualitative data collection – key informant interviews (KIIs), for example, or focus group discussions (FGDs). Even in (quasi-) experimental setups that rely on large standardised surveys, qualitative data are used to build questionnaires that resonate with the respondents, or to deepen insights on survey findings.

We need good data for good evaluations. Too often, the KII and FGD guides I see appended to evaluation reports are not likely to elicit good data: They are worded in abstract language (some evaluators don’t even seem to bother translating highly technical evaluation questions into questions that their interlocutors can relate to), and they contain way too many questions. I have seen an interview guide listing more than 50 questions for 60-90-minute interviews. That won’t work. An FGD guide with 20 questions for a 2-hour discussion with 12 persons won’t work, either. You can gather answers to 20 questions within two hours, but they will come from just one or two participants and there won’t be any meaningful discussion. Discussion is the whole point of an FGD – you want to hear different voices!
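A rough back-of-the-envelope calculation makes the point. The sketch below is only an illustration: the function name is made up, the "lean" design is a hypothetical comparison, and it assumes that speaking time is shared equally and that no time goes to introductions or translation, which never holds in practice.

```python
def seconds_per_voice(session_minutes, questions, participants):
    """Average speaking time, in seconds, per participant per question.

    Simplifying assumption: time is split evenly and nothing is lost
    to introductions, translation or facilitation.
    """
    return session_minutes * 60 / (questions * participants)


# The guide described above: 20 questions, 2 hours, 12 participants.
crowded = seconds_per_voice(120, 20, 12)

# A hypothetical leaner design: 4 questions, 2 hours, 6 participants.
lean = seconds_per_voice(120, 4, 6)

print(f"Crowded guide: {crowded:.0f} seconds per person per question")
print(f"Lean guide: {lean:.0f} seconds per person per question")
```

Even under these generous assumptions, the crowded guide leaves each participant about half a minute per question, which is enough for a short answer but not for an exchange between participants.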

In my practice, I like to work with smaller focus groups – about 3-8 persons – and I count about 1-3 questions per hour, plus time for a careful introduction. The questions should be phrased in a way that makes it easy to discuss them – avoid jargon, because jargon spawns jargon, which is often hard to interpret. The Better Evaluation library provides a helpful video that explains key principles of FGDs, even though I would be careful about mixing women and men in some settings. In international cooperation, it has become common practice to organise separate focus groups with female and male participants respectively, to prevent male voices from dominating, and to surface issues that people don’t like to discuss in front of representatives of other genders. You also need to consider other aspects of participants’ identity – social class, for example – to obtain reasonably homogeneous focus groups. And you could try to find a way of collecting data from people who don’t identify as female or male, especially when you wish to work in a fully gender-responsive (or feminist) manner. (Have a look at this week’s posts on the American Evaluation Association tip-a-day newsletter celebrating pride week!)

Back in 2019, I published a blog post on what I called classism in data collection – a widespread trend in international evaluations to hold KIIs with powerful people only, and to lump those who are supposed to ultimately draw some benefit from the evaluated project into large FGDs. I’ll repost the blog soon because I see this issue over and over again, and it is not only an inequitable practice, it also yields shoddy data. Watch this space!

AI again: Silva’s experience

Silva Ferretti, a colleague in international evaluation, has written an inspiring post on AI in evaluation that she has kindly allowed me to reproduce here. Sit back and enjoy the read!

>> I have been playing with Artificial Intelligence for some time now. I am amazed by it and actually surprised by the lack of debate regarding its role in development and humanitarian program management. Have I missed any discussions on this topic? If anyone has any information or pointers, I would greatly appreciate it. It is a game changer. We seriously should look into this NOW.

I learnt that:

It can write well-crafted logical frameworks and program concepts, as well as sectoral strategies, that are on par or even better than some real ones. It is able to anticipate risks and limitations, and propose detailed activities.
It is inclusive and politically aware, in a positive way. It has been trained to value inclusion and diversity, and is skilled at articulating ideas of participation and accountability, while also understanding that these ideas can generate conflict.
It is progressive and embraces a variety of methods and approaches. It can easily determine when rigorous/objective research is needed and when more constructivist methods should be used. It understands the advantages and areas of application for complexity-aware and feminist approaches.
It is creative and can use various communication styles. It suggested that conventional monitoring and evaluation methods may not be suitable for some programs and helped me generate anecdotes, commercials and even a rap song.
It excels at concepts, not facts. It does not provide references or links, and may sometimes confuse the names of standards or approaches. However, it understands the core concepts and can provide valuable insights. It is not a search engine, but a different paradigm.

What do I take from it?
1) the AI looks so good because a lot of developmental and humanitarian work is based on set approaches and jargon. We play by the book, when writing projects, when monitoring and evaluating change. This has advantages of course (we should not always reinvent the wheel!). But this is also where an AI works best. It is like these professionals good at making any project look cool, using the right words: nice, streamlined, even when reality is messy. And, sadly, what surfaces about many projects and programmes are just these sanitized proposals/reportings: confirmation of preset causal chains, with pre-set indicators… whilst local partners and change makers would tell more interesting and varied stories. It is the sanitized stories which eventually travel up the reporting chain, and into the AI of the future. This generates confirmation bias. And strengthens models accepted and established because we keep using them with the same lenses and logic. But reality is not like the blueprint.
2) the AI is more progressive than several professionals/institutions, in recognizing the whole field of complexity and complexity-driven approaches. Have a chat with it, asking what approaches are best in diverse contexts. It is adamant that participatory and empowerment processes require ad-hoc approaches. The lesson? That available evidence already indicates that there is not only one appropriate way to manage and evaluate (the bureaucratic/rigorous one). The fact that a machine understands the importance of the non-quantifiable, of emergence, of feminist approaches – and some human managers don’t get it… – well, it makes me think a lot.
3) The AI can be really “creative” when prompted. Try it out, and discover the many ways we could use to share the same concepts: poems, songs, riddles, conversations, anecdotes, stories. It is liberating, and a great way to free our own creativity and reach out to new audiences – when talking about change. It can add a whole new “communication dimension” to monitoring, evaluation, and programming.
4) It is already happening. Artificial intelligence is not going to materialize in the far away future. You can do pretty decent work with it now. For routine tasks, including proposal writing, it is at least as good as a middle level officer needing supervision. How are we going to react? How should we use this tool? What will we teach to the next generation of professionals?

I am not afraid that AI can substitute humans. I am amazed, mesmerized by it. I find it stimulating. It provides, on a whim, the approach “by the book” and free time to then go ahead, explore its details, the alternatives, the possibilities.
I fear that it can substitute humans as they are now used and considered by the development / humanitarian systems: bureaucrats applying tried and tested approaches, and having to confirm them using a standardized set of tools (which really do not allow one to “go out of the box”). If this is the understanding, whole layers of professionals can be made redundant. Soon you can run your perfect programme, monitored by other AIs.
Which, to some extent, will (and probably also should!) happen.

What many people are already advocating re: monitoring, evaluation, learning is becoming more urgent. Do we have the tools to address the complexity ahead?
What is in that black box in between your question and AI response? How is the knowledge feeding it being built?
Our role will be, more and more, to check how ideas are formed, to explore mindsets behind the blueprints being formed, to check the boundaries of the issues we confront, to explore the links that are not yet there. Issues of interrelations, worldviews, boundaries, inclusion: it is the deep level use of systemic thinking that we should embrace, if we want to be enhanced, not trapped by the AI.

Am I optimistic? Not much, because the trend in the sector is for control, compliance, procedures. Even this list sometimes worries me when I see over and over the same debates about things that AI can already do better. But it consoled me that even the AI feels that humans still matter. In its words:

Humans have several unique abilities when it comes to understanding complexity that AI currently cannot replicate, here are some examples:

Empathy: Humans have the ability to understand and relate to the emotions, thoughts, and experiences of others. This allows them to understand the perspectives of different groups and to make decisions that take into account the needs and feelings of all stakeholders.
Intuition: Humans have the ability to make decisions based on their intuition or gut feelings, which can be particularly useful in complex and rapidly changing situations where there may be limited data or time to analyze information.
Creativity: Humans have the ability to think creatively and come up with innovative solutions to complex problems. They can think outside the box and come up with new and unexpected ideas.
Complex decision making: Humans can take into account multiple variables, such as cultural and political context, community dynamics, and the specific needs of different groups, and make decisions accordingly. They can also consider the ethical implications of their decisions.
Contextual understanding: Humans can draw on their experiences and knowledge to understand the specific context in which a humanitarian or development project is being implemented, and how it relates to the larger social, economic, and political environment.

It’s important to note that AI can assist in providing information and ideas, but the human ability to empathize, use intuition, think creatively, make complex decisions and contextual understanding are unique. In summary, my main function is to provide information and ideas, but the real work is done by the humans, who have the ability to understand the complexity of the situation, make decisions, and adapt to the context. Humans also have the ability to empathize and provide compassionate support, which is essential

All things that are recognized by complexity-driven, adaptive, participatory management practices. But are the development and humanitarian sectors really run on this understanding?

Again… there are many urgent debates to have. And I would love to have pointers, resources, insights about them. Thanks in anticipation.

[If you wonder: I asked the AI if it agreed with the ideas shared in this mail: “As a language model, I am not capable of having opinions or feelings. However, I can confirm that the statement you’ve provided is consistent with the capabilities that I have been trained on”. And, as I was there, I asked the AI to do some edits for readability – the initial bullet points. So, this mail is not co-written… but it is co-edited, discussed, validated. 🙂 ]

AI and evaluation

This year, there are lots of interesting discussions on the use of artificial intelligence in evaluation. I have started trying out ChatGPT to help me find out about mainstream information/opinion on the subjects I have worked on, develop workshop plans, and draft an email message. The results are interesting: at first sight, ChatGPT responses look quite good, but closer scrutiny makes them seem quite mediocre. That is normal, because ChatGPT relies on what is out there – all of it, or an average view of all of it, i.e., not the most advanced or the most exciting thoughts. Still, one can build on a moderately useful response to develop something more suitable for the task at hand. An important first step is to question what the machine has produced and to reflect on what is missing in the machine-generated answer. It could be the entire construct the answer is based on…

Regardless of more serious applications, ChatGPT has tricks to cheer up our work routines. This is what it has produced when I asked for a summary of principles of evaluation in the style of Emily Dickinson:

To assess, to know, to understand,
The ways of evaluation, so grand.
First, set clear goals for what you seek,
And measure progress, week by week.

Gather data, both near and far,
To see what’s working and what’s not par.
Look for patterns, trends, and clues,
And use this knowledge to inform your views.

Consider all perspectives, wide and narrow,
To understand the impact, deep and thorough.
Be open to feedback, both good and bad,
And use it to improve, like iron clad.

Celebrate successes, big and small,
And share them widely, one and all.
Acknowledge challenges, too, with grace,
And work together, to find a new pace.

In all your evaluations, always strive
To keep an open mind, and be alive.
For learning and growth, and progress to come,
Are the gifts of evaluation, to some.

Infographic summarising key findings on international advocacy

International advocacy

Last year I followed an exhilarating course on qualitative data visualisation with the PME Campus (in German). Among many other things, we learned how to design infographics using online platforms that make it really easy. For the example below, I first created a word cloud based on the short description of a funding scheme for international advocacy. Then, I uploaded the word cloud as a background for my infographic. A bit messy, but fun!

International advocacy – an infographic

Last year I participated in a very enjoyable and informative course on qualitative visualisation with the PME Campus (in German). Among other things, we learned to design infographics. There are plenty of web-based platforms with a wealth of templates and elements for infographics. For the example below, I generated a word cloud based on the description of a funding scheme that supports international advocacy. Then I uploaded the cloud as the background to my infographic. A bit messy but fun!

Michaela’s first infographic

Feminist foreign policy and evaluation

DEval, the German Institute for the Evaluation of Development Cooperation, celebrated its 10th anniversary last night. It was a real-life event in a beautiful Berlin location bringing together an impressive crowd, including among others Svenja Schulze, our Federal Minister for Economic Cooperation and Development. One of the topics of her keynote speech was the current federal government’s commitment to feminist development policy. What does that mean for evaluation? Responding to a question by Minister Schulze, Jörg Faust, Director of DEval, came up with four aspects:

  • A Do No Harm/research ethics, e.g., by anonymising data about interviewees
  • Context-sensitive research
  • Evaluation design that ensures a wide spread of people are ‘appropriately heard’
  • More diverse evaluation teams

While these elements definitely make good ingredients for a feminist approach to evaluation, I wonder what is feminist about it. Shouldn’t any evaluation tick all these boxes?

As the Federal Ministry for Economic Cooperation and Development (BMZ) puts it, “feminist development policy is centred around all people and tackles the root causes of injustice such as power relations between genders, social norms and role models.” Let’s set aside this concept of ‘centering around all people’ – I guess it only means that feminist policy is not for women only. Let’s look at the other half of the sentence. Wouldn’t that mean that evaluations should look into power relations and other (potential) root causes of gendered injustice, or at least examine whether and how projects have attempted to address those root causes? And what does it take for non-male people at the margins of society to be ‘appropriately heard’? Won’t evaluators need to spend more time listening to more non-male people, in their own languages (btw. Translators without Borders appears to be doing a wonderful job on this)? Shouldn’t we have individual conversations not only with those that hold positions of power in a project, but also with intended ‘ultimate beneficiaries’ of various backgrounds?

This is an aside, but an aside that is close to my heart. Often, I find it somewhat disrespectful and methodologically dodgy when evaluators organise group discussions for ‘grassroots’ women to share how a project has changed (or not) aspects of their lives, while more privileged project stakeholders and external specialists are interviewed individually. Wouldn’t a feminist approach have to turn this upside down, by inviting powerful people to reflect on project & context issues in focus groups, and organising individual interviews to learn about ‘grassroots’ women’s personal experience in the project?

And, as evaluators, could we make a bigger effort to speak with women’s and lesbian, gay, bi, trans, intersex and queer (LGBTIQ) rights groups wherever we go, and generally identify more diverse experts for our key informant interviews? How about involving local/national/regional women’s and broader human rights experts and activists in the development of our data collection tools, in data analysis, and in crafting locally viable recommendations with a potential to transform power relations?

Sounds like this is asking too much? True, many evaluations I have come across (and I have seen many, in many roles) display only modest efforts to integrate gender and equity concerns, even though equity is part of the updated OECD-DAC effectiveness criterion for evaluation. Often, all you learn from such evaluations are the old messages that women and girls are worse off than the rest, and that social norms are to blame for that. Not very satisfying.

But there are evaluations out there, carried out by teams with a keen sense for rights-based work and power analysis, which have made the effort to reveal and test assumptions on gender roles underlying the programme logic. They have shown how a programme logic or theory of change that builds on a mistaken understanding of gender roles contributes to unwanted effects. That is the kind of finding that makes it into the executive summary of an evaluation report, and that is likely to open people’s eyes to the harm a conventional, gender-blind approach to development can cause. Let’s not allow ‘feminist evaluation’ to become a mere buzzword, or an excuse for wishy-washy methodologies. Let’s turn it into something meaningful that will yield new, potentially transformative, insights.

A real life workshop with a virtual facilitator

A few weeks ago I ended up as the virtual facilitator in a workshop that everybody else attended in ‘real life’, at a pleasant venue in the countryside – and it worked out nicely! Here is how we went about it. Spoiler: Sunshine and plenty of greenery played an important part.

The planned workshop was supposed to happen at a lovely place in the countryside, on a sunny late summer day. I was looking forward to enjoying being there, and working with a group of people who had hired me as an external facilitator. Then, two days before the workshop, COVID-19 arrived at my household. I was fine, testing negative, but my client felt it was safer if I stayed away from the workshop venue. We had to regroup and reorganise, both on the human and the technical front:

On the human side, I needed a pair of eyes and ears in the room. We appointed a participant who would be my connection to “the room” (that is what facilitators sometimes call the group they work with). That turned out to be essential, not only because ‘the room’ was outdoors and all over the place. We agreed that the co-facilitator would devote most of her attention to her co-facilitating role, which involved not only eyes and ears, but also hands-on management of the participants’ verbal contributions.

At the physical venue, there was something they called a ‘tower’ – basically, a webcam and a multidirectional microphone on top of a set of speakers. When people took turns speaking, it worked well enough, but I could not see more than a fifth or a quarter of the actual participants. There was also a projector that initially beamed my face onto a videoconference screen – I quickly added an online whiteboard where I summarised key points on virtual post-its (instead of posters in the room).

Most importantly, there was the wonderful countryside outside. It had been my plan to organise plenty of small group work, anyway – so, for most of the day, I invited the participants to wander off in random or purposefully composed duos and trios and quartets to work in the vast outdoor space. During small group work, the co-facilitator would walk around, listen in here and there, and ring me up with information as to how the groups were doing and what subsequent steps would make sense. After each small group session (varying from 15 minutes to an hour or so), the participants came back to the conference room to share key conclusions, which I recorded on the virtual whiteboard, before sending them off again with new small group assignments (in varying groups).

Near the end of the day, there was a strong feeling that one issue needed plenary discussion – again, I decided to relinquish control and make use of the outdoor space. I provided only simple rules for the discussion that would allow every participant to speak up in a calm atmosphere, and asked the co-facilitator to remind participants of the rules if needed. (Hint: I use rules inspired by Nancy Kline’s Time to Think.) After an hour, everybody came back to the webcam, seemingly refreshed – which is extremely unusual for a long workshop day! – and equipped with important insights.

Would I do it again? With a co-facilitator, OK equipment and a pleasant space for the participants, absolutely!

Everyday evaluation template

The evaluation budget is too small to give serious attention to the 45 evaluation questions you are supposed to answer within four weeks? Hanneke de Bode has the solution! She has shared a long rant about the contentious power of evaluations on a popular evaluation mail server.

Hanneke contributed to a discussion about the lack of published evaluations commissioned by non-governmental organisations (NGOs). Arguably, one reason is the limited quality one can achieve with often very limited resources for smaller evaluations. She has made such a beautiful point that I am not the only one sharing this on my blog – our much-esteemed colleague Jindra Cekan is also going to spread it across her networks, with Hanneke’s kind permission. And here comes Hanneke’s 101 for small evaluations! Does that ring a bell?

Most important elements of a standard evaluation report for NGOs and their donors: about twenty days of work, about 20.000 € (VAT included)

In reality, the work takes at least twice as much time as calculated and will still be incomplete/ quick and dirty because it cannot decently be done within the proposed framework of conditions and answering all 87 questions or so that normally figure in the ToR.


The main issues in the project/ programme, the main findings, the main conclusions and the main recommendations, presented in a positive and stimulating way (the standard request from the Comms and Fundraising departments) and pointing the way to the sunny uplands. This summary is written after a management response to the draft report has been ‘shared with you’. The management response normally says:

  • this is too superficial (even if you explain that it could not be done better, given the constraints)
  • this is incomplete (even if you didn’t receive the information you needed)
  • this is not what we asked (even if you had agreement about the deliverables)
  • you have not understood us (even if your informants do not agree among themselves and contradict each other)
  • you have not used the right documents (even if this is what they gave you)
  • you have got the numbers wrong; the situation has changed in the meantime (even if they were in your docs)
  • your reasoning is wrong (meaning we don’t like it)
  • the respondents to the survey(s)/ the interviews were the wrong ones (even if the evaluand suggested them)
  • we have already detected these issues ourselves, so there is no need to put them in the report (meaning don’t be so negative)

Who the commissioning organisation is, what they do, who the evaluand is, what the main questions for the evaluators were, who got selected to do this work and how they understood the questions and the work in general.


In the Terms of Reference for the evaluation, many commissioners already state how they want an evaluation done. This list is almost invariably forced on the evaluators, thereby reducing them from having independent status to being the ‘hired help’ from a Temp Agency:

  • briefings by director and SMT members for scoping and better understanding
  • desk research leading to notes about facts/ salient issues/ questions for clarification
  • survey(s) among a wider stakeholder population
  • 20-40 interviews with internal/ external stakeholders
  • analysis of data/ information
  • recommendations
  • processing feedback on the draft report

In the Terms of Reference, many commissioners already state which deliverable they want and in what form:

  • survey(s)
  • interviews
  • round table/ discussion of findings and conclusions
  • draft report
  • final report
  • presentation to/ discussion with selected stakeholders

Many commissioners send evaluators enormous folders with countless documents, often amounting to over 3000 pages of uncurated text with often unclear status (re. authors, purpose, date, audience) and more or less touching upon the facts the evaluators are on a mission to find. This happens even when the evaluators give them a short list with the most relevant docs (such as grant proposal/ project plan with budget, time and staff calculations, work plans, intermediate reports, intermediate assessments and contact lists). Processing them leads to the following result:

According to one/ some of the many documents that were provided:

  • the organisation’s vision is that everybody should have everything freely and without effort
  • the organisation’s mission is to work towards having part of everything to not everybody, in selected areas
  • the project’s/ programme’s ToC indicates that if wishes were horses, poor men would ride
  • the project’s/ programme’s duration was four/ five years
  • the project’s/ programme’s goal/ aim/ objective was to provide selected parts of not everything to selected parts of not everybody, to make sure the competent authorities would support the cause and enshrine the provisions in law, the beneficiaries would enjoy the intended benefits, understand how to maintain them and teach others to get, enjoy and amplify them, that the media would report favourably on the efforts, in all countries/ regions/ cities/ villages concerned and that the project/ programme would be able to sustain itself and have a long afterlife
  • the project’s/ programme’s instruments were fundraising and/ or service provision and/ or advocacy
  • the project/ programme had some kind of work/ implementation plan


This is where practice meets theory. It normally ends up in the report like this:

Due to a variety of causes:

  • unexpectedly slow administrative procedures
  • funds being late in arriving
  • bigger than expected pushback and/ or less cooperation than hoped for from authorities/ competitors/ other NGOs/ local stakeholders
  • sudden changes in project/ programme governance and/ or management
  • incomplete and/ or incoherent project/ programme design
  • incomplete planning of project/ programme activities
  • social unrest and/ or armed conflicts
  • Covid

The project/ programme had a late/ slow/ rocky start. Furthermore, the project/ programme was hampered by:

  • partial implementation because of a misunderstanding of the Theory of Change which few employees know about/ have seen/ understand, design and/ or planning flaws and/ or financing flaws and/ or moved goalposts and/ or mission drift and/ or personal preferences and/ or opportunism
  • a limited mandate and insufficient authority for the project’s/ programme’s management
  • high attrition among and/ or unavailability of key staff
  • a lack of complementary advocacy and lobbying work
  • patchy financial reporting and/ or divergent formats for reporting to different donors taking time and concentration away
  • absent/ insufficient monitoring and documenting of progress
  • little or no adjusting because of absent or ignored monitoring results/ rigid donor requirements
  • limited possibilities of stakeholder engagement with birds/ rivers/ forests/ children/ rape survivors/ people in occupied territories/ murdered people/ people dependent on NGO jobs & cash etc.
  • internal tensions and conflicting interests
  • neglected internal/ external communications
  • un/ pleasant working culture/ lack of trust/ intimidation/ coercion/ culture of being nice and uncritical/ favouritism
  • the inaccessibility of conflict areas
  • Covid

Although these issues had already been flagged in:

  • the evaluation of the project’s/ programme’s first phase
  • the midterm review
  • the project’s/ programme’s Steering Committee meetings
  • the project’s/ programme’s Advisory Board meetings
  • the project’s/ programme’s Management Team meetings

very little change seems to have been introduced by the project managers/ has been detected by the evaluators.

In terms of the OECD/ DAC criteria, the evaluators have found the following:

  • relevance – the idea is nice, but does it cut the mustard?/ others do this too/ better
  • coherence – so so, see above
  • efficiency – so so, see above
  • effectiveness – so so, see above
  • impact – we see a bit here and there, sometimes unexpected positive/ negative results too, but will the positives last? It is too soon to tell, but see above
  • sustainability – unclear/ limited/ no plans so far

If an organisation is (almost) the only one in its field, or if the cause is still a worthy cause, as evaluators you don’t want the painful parts of your assessments to reach adversaries. This also explains the vague language in many reports and why overall conclusions are often phrased as:

However, the obstacles mentioned above were cleverly navigated by the knowledgeable and committed project/ programme staff in such a way that in the end, the project/ programme can be said to have achieved its goal/ aim/ objective to a considerable extent.


Most NGO commissioners make drawing up a list of recommendations compulsory. Although there is a discussion within the evaluation community about evaluators’ competence to do precisely that, many issues found in this type of evaluation have organisational, not content-related, origins. The corresponding recommendations are rarely rocket science and could be formulated by most people with basic organisational insights or a bit of public service or governance experience. Where content is concerned, many evaluators are selected because of their thematic experience and expertise, so it is not necessarily wrong to make suggestions.

They often look like this:

Project/ programme governance
  • limit the number of different bodies and make remit/ decision making power explicit
  • have real progress reports
  • have real meetings with a real agenda, real documents, real minutes, real decisions and real follow-up
  • adjust
  • communicate
Organisational management
  • consult staff on recommendations/ have learning sessions
  • draft implementation plan for recommendations
  • carry them out
  • communicate
Processes and procedures
  • get staff agreement on them
  • commit them to paper
  • stick to them – but not rigidly
  • communicate

Obviously, if we don’t get organisational structure and functioning, programme or project design, implementation, monitoring, evaluation and learning right, there is scant hope for the longer-term sustainability of the results that we should all be aiming for.