Gender equality in organisations

This is a blog post from 2020, moved from my former blog which I will take offline near the end of this year.

Gender equality is a key element of sustainable development, as the Sustainable Development Goals (SDGs) illustrate: gender is woven across virtually all 17 goals. It makes sense that ‘mainstream’ organisations – those not specialised in promoting gender equality – have developed gender policies and related activities. Where do they stand, and what should come next?

Vera Siber and I carried out a study with four German organisations to find out about their work on gender justice: a political foundation, two non-governmental organisations (NGOs) specialised in international development (a faith-based one and a secular one), and a scientific agency attached to a federal ministry. The four organisations differed in the scope of their work, their size, and the degree to which they stated gender justice as an explicit goal – but they came together to commission our study.  We reviewed documentation produced by the four organisations and interviewed some 50 persons representing different perspectives within those groups.  

The framework developed by Gender at Work guided our analysis. It is a matrix built around two axes – formal/informal and individual/systemic – which define four quadrants: the individual/informal quadrant relates to personal consciousness and capabilities, the systemic/informal one to unwritten norms and practices, the individual/formal one to individual resources, and the systemic/formal one to rules and policies.

The four quadrants of the matrix look different in each of the four organisations (cases) we researched. On the formal/systemic side, all cases had gender policy papers, but the documents varied enormously in scope and precision. Three organisations employed gender specialists; one did not. In all cases, staff members from different departments met regularly to discuss gender issues – but only in one case did job descriptions allocate time for those activities. The degree to which gender was integrated into planning and monitoring processes varied widely. On the formal/individual side, women in one case found it easier to reach leadership positions thanks to an adapted recruitment process and dedicated mentoring and leadership training.

Our study confirmed the notion that gender mainstreaming produces tangible outcomes when combined with specific work on gender equality. For example, one organisation had supported women’s organisations in South Asia for many years. It introduced those organisations to ‘mainstream’ grantees – i.e., grantees with no specific feminist agenda – to strengthen their thinking and action so that women and girls could contribute to and benefit more fully from their work. In the same case, success stories and pressure by feminist grantees contributed to reshaping the donor’s overall regional strategy.

The informal side of the Gender at Work matrix is to a great extent about individual commitment – present in all four cases we reviewed – and organisational culture. In one case, committed staff members put in their ‘own’ time to organise internal workshops on gender. In that way, they built knowledge within the organisation ‘bottom-up’, and pressed the top levels for more support for gender equality. In a contrasting case, organisational leadership successfully pushed for the implementation of a progressive gender policy. This top-down approach, arguably necessary when attempting to mainstream gender across an organisation and its work, raised worries among some of our interlocutors: Would it still be possible to openly voice doubts, start controversial discussions and introduce new ideas?

Our study could not answer that question. What emerged clearly was that even organisations with rather advanced systems for gender mainstreaming must continue to update their knowledge and re-examine their goals and approaches regularly, as new needs and interests emerge. For instance, work on the rights of lesbian, gay, bisexual, transgender, intersex and queer persons (LGBTIQ), as well as intersectional approaches that take multiple discriminations into account, were still in their infancy in most cases. Also, from 2017 on, the #MeToo movement against sexual harassment at work sparked a need to introduce or strengthen policies and processes. At the time of our research, anti-harassment policies had only just been introduced – or were still in development – in most of the reviewed organisations.

There is no end point to work on gender equality. It takes constant, deliberate, and well-informed efforts to secure the commitment of everyone in an organisation and to ensure its work contributes to gender equality in a changing world. At the very least, organisations should make sure they do not deepen existing inequalities (do no harm). 

The 2030 Agenda for Sustainable Development exhorts all states to leave no one behind: Diversity and the ensuing differences in people’s needs and interests must be acknowledged and dealt with. Sexism, racism, and other forms of discrimination within organisations and beyond must be identified and countered. There is plenty of instructive experience around the world – organisations can tap into it by multiplying opportunities for exchange, open debate, and joint learning. All this requires dedicated resources.

Why not try out gender budgeting, i.e., a process whereby organisations systematically examine their budgets against the anticipated effects on gender equality? If international development agencies can teach governments in the ‘global South’ to introduce gender budgeting, surely they can do it within their own systems? If these agencies require their partner organisations to display a gender-balanced leadership structure, surely they can organise their own leadership along the same lines? Would that be a good resolution for 2021 and beyond?

Thoughtful guidance on applying evaluation criteria

This is a blog post from 2021, moved from my former blog, which I will take offline later this year.

Long-awaited new guidance on applying the evaluation criteria defined by the Development Assistance Committee of the Organisation for Economic Co-operation and Development (OECD-DAC) is finally available in this publication! Long-awaited, because evaluators and development practitioners have grown desperate with assignments that are expected to gauge every single project against every single OECD-DAC criterion, regardless of the project’s nature and of the timing and resources of the evaluation. This new, gently worded document is a weapon evaluators can use to defend their quest for focus and depth in evaluation.

Those who commission evaluations, please go straight to page 24, which states very clearly: “The criteria are not intended to be applied in a standard, fixed way for every intervention or used in a tickbox fashion. Indeed the criteria should be carefully interpreted or understood in relation to the intervention being evaluated. This encourages flexibility and adaptation of the criteria to each individual evaluation. It should be clarified which specific concepts in the criteria will be drawn upon in the evaluation and why.”

On page 28, you will find a whole section titled “Choosing which criteria to use”, which makes it clear that evaluations should focus on the OECD-DAC criteria that make sense in view of the needs and possibilities of the specific project and of the evaluation process. It provides a wonderful one-question heuristic: “If we could ask only one question about this intervention, what would it be?” And it reminds readers that some questions are better answered by other means, such as research projects or a facilitated learning process. The availability of data and resources – including time – for the evaluation helps determine which evaluation criteria to apply, and which not. Page 32 reminds us of the necessity to use a gender lens, with a handy checklist-like table on page 33 (better late than never).

About half of the publication is dedicated to defining the six evaluation criteria – relevance, coherence, effectiveness, efficiency, impact, and sustainability – with plenty of examples. This is also extremely helpful. Each chapter comes with a table that summarises common challenges related to each criterion – and what evaluators and evaluation managers can do to overcome them. It also shows very clearly that a lack of preparation on the evaluation management side makes it very hard for evaluators to do a decent job – see for example table 4.3 (p.55) on assessing effectiveness.

The document is a bit ambiguous on some questions: The chapter on efficiency still defines efficiency as “the conversion of inputs (…) into outputs (…) in the most cost-effective way possible, as compared to feasible alternatives in the context” (p.58), which makes it extremely hard to assess the efficiency of, say, a project that supports litigation in international courts – an intervention that may take decades to yield the desired result. However, the guidance document states that resources should be understood in the broadest sense and include full economic costs. On that basis, one can indeed argue, as Jasmin Rocha and I have on Zenda Ofir’s blog, that non-monetary costs, hidden costs and the cost of inaction must be taken into account. Yet, table 4.4 on efficiency-related challenges remains vague (p.61). Has anyone read the reference quoted in the table (Palenberg 2011)? I did, and found it very cautious in its conclusions. My impression is that in many cases, evaluators of development interventions are not in a position to assess efficiency in any meaningful manner.

On the whole, I would describe the new OECD-DAC publication as a big step forward. I warmly recommend it to anyone who designs, manages or commissions evaluations.

Why ask why in theory-based evaluation

A theory of change is a wonderful instrument to explore the “why” and “how” of an intervention. So why do evaluations make such patchy use of theories of change? Often, it is because evaluation questions mainly ask “how much”. This blog post narrates how I came to this conclusion.

Risk management for evaluation managers

Why do many evaluation reports yield only weak insights? Having worked in all three corners of the evaluation triangle – as an evaluator, as an evaluation commissioner / manager, and as a stakeholder in interventions under evaluation (the evaluands) – I find that we can only put part of the blame on evaluation teams. Often, evaluations come with high expectations which low budgets and narrow timeframes cannot fulfil. If, on top of that, evaluations are poorly prepared, evaluation teams may find themselves struggling with scope creep and shifting goalposts. They will spend much of their time trying to understand the evaluand and negotiating the evaluation scope with the client, wasting time that should be spent on proper data collection and analysis. Better preparation and accompaniment of evaluations could make a big difference. Ideally, that should happen as part of an evaluability assessment and before the evaluation terms of reference (TOR) are finalised.

Howard White, a specialist in evaluation synthesis, has posted a list of 10 common flaws in evaluations. There are other flaws one could find. But I would propose reflecting on solutions that all corners of the evaluation triangle can contribute to. Evaluations work best when evaluators, evaluation managers and those who represent the evaluand work together as critical partners. In recent years, I have supported organisations in their evaluation management, so this post focuses on things that evaluation managers can do to prevent Howard’s “10 flaws” (in italics below) from happening. Let’s look at them one by one!

1. Inadequate description of the intervention: Ideally, all evaluation reports start with the description of the evaluand. If the evaluand is one project implemented by one organisation in one country, it shouldn’t be too hard to fit that within a couple of pages. If it is a collection of programmes encompassing cascades of diverse activities by hundreds of organisations around the world, evaluators need to be a bit more abstract in their introductory description. But obviously they need to understand the evaluand to design the appropriate evaluation!
Evaluation managers can map the components of the programme, review its theory of change, and organise the documentation so that evaluation teams can make sense of it. This is particularly important if the evaluand is too complicated to be adequately described in the TOR. A good example from my practice was a portfolio evaluation: Before commissioning the evaluation, evaluation management developed a database listing key features of all projects in the portfolio. That made it easy to understand and describe the evaluand, and to select key cases for deeper review. Conversely, in a different assignment, my team spent (unplanned) months trying to make sense of the – sometimes contradictory – documentation and verbal descriptions of the sprawling evaluand.

2. ‘Evaluation reports’ which are monitoring not evaluation: Evaluation managers can prevent this problem by formulating appropriate evaluation questions. Often, evaluation questions start with “to what extent…”, followed by rather specific questions about the achievement of certain results. Those kinds of questions risk limiting the evaluation to a process monitoring exercise, or some kind of activity audit. For programme learning, it is useful to ask questions starting with “why” and “how”.

3. Data collection is not a method: Evaluation managers can make sure the TOR requests evaluators to describe the approaches and methods they use in the evaluation, for data collection and for analysis respectively. They can look for gaps in the inception report, ideally checking the annexes as well, to find out whether the proposed instruments match the proposed methodology. That takes some specialist knowledge – ideally, evaluation managers should have substantial first-hand evaluation experience or a background in applied research.

4. Unsubstantiated evidence claims: Evaluation managers can invite evaluation teams to structure their reports clearly, so that each finding is presented with the supporting evidence. Many evaluations I have seen weave their findings and related evidence so closely together that it is hard to tell them apart – a style that is often described as “overly descriptive”. Obfuscating the boundaries between data and evidence can be a strategy to hide findings about gaps and failure in programmes. Where programme teams are hostile to challenging findings, evaluation managers can play a role in defending the evaluation team’s independence, and their mission to support learning from success and from failure.

5. Insufficient evidence: The amount of evidence an evaluation team can generate depends to a great extent on the time and other resources they have. One important role of evaluation managers is to ensure a good balance between expectations from the evaluation and resources for the evaluation. If an organisation expects an evaluation to answer, say, 30 complex questions on an evaluand encompassing tens of thousands of diverse interventions in diverse contexts within half a year, it must be prepared to live with evidence gaps.

6. Positive bias in process evaluations: Positive bias can arise from poor evaluation design (see also points 2, 3 and 4 above). It can also be linked to evidence gaps (see point 5 above) – when in doubt, evaluators hesitate to pass “negative judgements”. But often, positive bias slips in near the end of the evaluation process, when programme managers object to findings about gaps, mishaps, or failure in their programme. That takes us back to the role of evaluation managers in fostering commitment to learning from failure.

7. Limited perspectives: Who do evaluators speak to? This problem is related to points 1, 5 and 6 above. Where resources for an evaluation are limited, fieldwork might be absent or restricted to the most accessible areas (when I worked in China, they called such places “fields by the road”, always nicely groomed). When working on a shoestring, evaluators will struggle to sample, or to select cases, purposefully. But they can still speak to people representing different perspectives. Evaluation managers can encourage that by mapping stakeholders in the TOR and explicitly asking for interviews with people who are underrepresented.

8. Ignoring the role of others: If most evaluation questions focus on programme performance, evaluators will focus on programme performance. Often, evaluation TOR address the role of others only in a brief question related to the coherence (OECD-DAC) criterion. But questions about effectiveness, impact and sustainability can also be framed to encourage evaluators to look at the influence of other “actors and factors”.
Also, ideally, programmes should be built on preliminary context and stakeholder analyses which should be continuously updated. Where that has happened, that information should flow into the TOR’s context section.

9. Causal claims based on monitoring data: Good monitoring data can be a helpful ingredient in an evaluation that triangulates data from different sources. There is no reason to believe people fake their monitoring data. It is just that most of the time, the amount and quality of monitoring data are inadequate. Monitoring and evaluation specialists can make sure each programme has a monitoring system that produces data which are useful for monitoring and for evaluation. Furthermore, evaluation TOR should remind evaluators of the need to triangulate data, i.e., to compare data sourced from different perspectives via different data collection tools.
Howard mentions a separate point under the “9th flaw”, the attribution problem: “Outcome evaluations present data on outcomes in the project area and claim that any observed changes are the result of the project.” But evaluators are not going to solve that problem by collecting data from a greater variety of perspectives. They need to be encouraged to look beyond the evaluand as a likely cause of the desired effects – see point 8 above.

10. Global claims based on single studies: As pointed out by Howard, lessons from a specific evaluation are only relevant for the intervention being evaluated. That is something that everyone in the evaluation triangle needs to be aware of. Evaluation managers are well placed to remind decision-makers in their organisations of the fact that an evaluation is about the evaluand only. It can feed into a broader body of evidence, but it should never be the only basis for decision-making beyond the context of the evaluand.

We have reached the end of the list, but there is so much more that can go wrong in evaluations. Investing in good preparation, and, once the evaluation team is recruited, building rapport and effective communication between evaluation managers, programme implementers and evaluators, are essential for risk management in evaluations.

FGDs mean groups with focus & discussion!

This year again, I feel privileged to serve on a panel of senior evaluators who advise a multilateral donor on evaluation approaches and methods. And this year again, I feel saddened by the widespread neglect of qualitative data collection. All evaluations I have reviewed (cumulatively, I have reviewed hundreds…) include at least some elements of qualitative data collection – key informant interviews (KIIs), for example, or focus group discussions (FGDs). Even in (quasi-) experimental setups that rely on large standardised surveys, qualitative data are used to build questionnaires that resonate with the respondents, or to deepen insights on survey findings.

We need good data for good evaluations. Too often, the KII and FGD guides I see appended to evaluation reports are not likely to elicit good data: They are worded in abstract language (some evaluators don’t even seem to bother translating highly technical evaluation questions into questions that their interlocutors can relate to), and they contain far too many questions. I have seen an interview guide listing more than 50 questions for 60-90-minute interviews. That won’t work. An FGD guide with 20 questions for a 2-hour discussion with 12 persons won’t work, either. You can gather answers to 20 questions within two hours, but they will come from just one or two participants and there won’t be any meaningful discussion. Discussion is the whole point of an FGD – you want to hear different voices!

In my practice, I like to work with smaller focus groups – about 3-8 persons – and I count about 1-3 questions per hour, plus time for a careful introduction. The questions should be phrased in a way that makes it easy to discuss them – avoid jargon, because jargon spawns jargon, which is often hard to interpret. The Better Evaluation library provides a helpful video that explains key principles of FGDs, even though I would be careful about mixing women and men in the same settings. In international cooperation, it has become common practice to organise separate focus groups with female and male participants respectively, to prevent male voices from dominating, and to surface issues that people don’t like to discuss in front of representatives of other genders. You also need to consider other aspects of participants’ identity – social class, for example – to obtain reasonably homogeneous focus groups. And you could try to find a way of collecting data from people who don’t identify as female or male, especially when you wish to work in a fully gender-responsive (or feminist) manner. (Have a look at this week’s posts on the American Evaluation Association tip-a-day newsletter celebrating pride week!)
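To make the arithmetic concrete, the sizing rule above can be sketched as a tiny helper. This is only an illustrative sketch – the function name and default values are my own assumptions derived from the rule of thumb (1-3 questions per hour, plus a careful introduction), not an established formula:

```python
def max_fgd_questions(session_minutes, intro_minutes=20, minutes_per_question=30):
    """Rough upper bound on the number of questions a focus group
    can genuinely discuss (not just answer).

    Assumes a careful introduction and roughly 20-60 minutes of
    discussion per question (default: 30, i.e. 2 questions per hour).
    """
    discussion_time = session_minutes - intro_minutes
    return max(0, discussion_time // minutes_per_question)

# A 2-hour session leaves room for about 3 discussable questions --
# nowhere near the 20-question guides mentioned above.
print(max_fgd_questions(120))  # → 3
```

Reversing the calculation when reviewing a draft guide – multiplying its question count by 20-30 minutes each – quickly shows whether the guide is realistic.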

Back in 2019, I published a blog post on what I called classism in data collection – a widespread trend in international evaluations to hold KIIs with powerful people only, and to lump those who are supposed to ultimately draw some benefit from the evaluated project into large FGDs. I’ll repost the blog soon because I see this issue over and over again, and it is not only an inequitable practice, it also yields shoddy data. Watch this space!

AI again: Silva’s experience

Silva Ferretti, a colleague in international evaluation, has written an inspiring post on AI in evaluation that she has kindly allowed me to reproduce here. Sit back and enjoy the read!

>> I have been playing with Artificial Intelligence for some time now. I am amazed by it and actually surprised by the lack of debate regarding its role in development and humanitarian program management. Have I missed any discussions on this topic? If anyone has any information or pointers, I would greatly appreciate it. It is a game changer. We seriously should look into this NOW.

I learnt that:

- It can write well-crafted logical frameworks and program concepts, as well as sectoral strategies, that are on a par with or even better than some real ones. It is able to anticipate risks and limitations, and propose detailed activities.
- It is inclusive and politically aware, in a positive way. It has been trained to value inclusion and diversity, and is skilled at articulating ideas of participation and accountability, while also understanding that these ideas can generate conflict.
- It is progressive and embraces a variety of methods and approaches. It can easily determine when rigorous/objective research is needed and when more constructivist methods should be used. It understands the advantages and areas of application for complexity-aware and feminist approaches.
- It is creative and can use various communication styles. It suggested that conventional monitoring and evaluation methods may not be suitable for some programs and helped me generate anecdotes, commercials and even a rap song.
- It excels at concepts, not facts. It does not provide references or links, and may sometimes confuse the names of standards or approaches. However, it understands the core concepts and can provide valuable insights. It is not a search engine, but a different paradigm.

What do I take from it?
1) the AI looks so good because a lot of development and humanitarian work is based on set approaches and jargon. We play by the book when writing projects and when monitoring and evaluating change. This has advantages, of course (we should not always reinvent the wheel!). But this is also where an AI works best. It is like those professionals who are good at making any project look cool, using the right words: nice, streamlined, even when reality is messy. And, sadly, what surfaces about many projects and programmes is just these sanitized proposals and reports: confirmation of pre-set causal chains, with pre-set indicators… whilst local partners and change makers would tell more interesting and varied stories. It is the sanitized stories which eventually travel up the reporting chain, and into the AI of the future. This generates confirmation bias. And it strengthens models accepted and established because we keep using them with the same lenses and logic. But reality is not like the blueprint.
2) the AI is more progressive than several professionals/institutions in recognizing the whole field of complexity and complexity-driven approaches. Have a chat with it, asking what approaches are best in diverse contexts. It is adamant that participatory and empowerment processes require ad-hoc approaches. The lesson? The available evidence already indicates that there is not only one appropriate way to manage and evaluate (the bureaucratic/rigorous one). The fact that a machine understands the importance of the non-quantifiable, of emergence, of feminist approaches – while some human managers don’t get it… – well, it makes me think a lot.
3) The AI can be really “creative” when prompted. Try it out, and discover the many ways we could use to share the same concepts: poems, songs, riddles, conversations, anecdotes, stories. It is liberating, and a great way to free our own creativity and reach new audiences when talking about change. It can add a whole new “communication dimension” to monitoring, evaluation, and programming.
4) It is already happening. Artificial intelligence is not going to materialize in the far away future. You can do pretty decent work with it now. For routine tasks, including proposal writing, it is at least as good as a middle level officer needing supervision. How are we going to react? How should we use this tool? What will we teach to the next generation of professionals?

I am not afraid that AI can substitute humans. I am amazed, mesmerized by it. I find it stimulating. It provides, on a whim, the “by the book” approach, and frees time to then go ahead and explore its details, the alternatives, the possibilities.
I fear that it can substitute humans as they are now used and considered by the development / humanitarian systems: bureaucrats applying tried and tested approaches, and having to confirm them using a standardized set of tools (which really do not allow one to “go out of the box”). If this is the understanding, whole layers of professionals can be made redundant. Soon you will be able to run your perfect programme, monitored by other AIs.
Which, to some extent, will (and probably also should!) happen.

What many people are already advocating re: monitoring, evaluation, learning is becoming more urgent. Do we have the tools to address the complexity ahead?
What is in that black box in between your question and AI response? How is the knowledge feeding it being built?
Our role will be, more and more, to check how ideas are formed, to explore mindsets behind the blueprints being formed, to check the boundaries of the issues we confront, to explore the links that are not yet there. Issues of interrelations, worldviews, boundaries, inclusion: it is the deep level use of systemic thinking that we should embrace, if we want to be enhanced, not trapped by the AI.

Am I optimistic? Not much, because the trend in the sector is for control, compliance, procedures. Even this list sometimes worries me when I see over and over the same debates about things that AI can already do better. But it consoled me that even the AI feels that humans still matter. In its words:

Humans have several unique abilities when it comes to understanding complexity that AI currently cannot replicate; here are some examples:

- Empathy: Humans have the ability to understand and relate to the emotions, thoughts, and experiences of others. This allows them to understand the perspectives of different groups and to make decisions that take into account the needs and feelings of all stakeholders.
- Intuition: Humans have the ability to make decisions based on their intuition or gut feelings, which can be particularly useful in complex and rapidly changing situations where there may be limited data or time to analyze information.
- Creativity: Humans have the ability to think creatively and come up with innovative solutions to complex problems. They can think outside the box and come up with new and unexpected ideas.
- Complex decision making: Humans can take into account multiple variables, such as cultural and political context, community dynamics, and the specific needs of different groups, and make decisions accordingly. They can also consider the ethical implications of their decisions.
- Contextual understanding: Humans can draw on their experiences and knowledge to understand the specific context in which a humanitarian or development project is being implemented, and how it relates to the larger social, economic, and political environment.

It’s important to note that AI can assist in providing information and ideas, but the human ability to empathize, use intuition, think creatively, make complex decisions and contextual understanding are unique. In summary, my main function is to provide information and ideas, but the real work is done by the humans, who have the ability to understand the complexity of the situation, make decisions, and adapt to the context. Humans also have the ability to empathize and provide compassionate support, which is essential

All things that are recognized by complexity-driven, adaptive, participatory management practices. But are the development and humanitarian sector really run on this understanding?

Again… there are many urgent debates to have. And I would love to have pointers, resources, insights about them. Thanks in anticipation.

[If you wonder: I asked the AI if it agreed with the ideas shared in this mail: “As a language model, I am not capable of having opinions or feelings. However, I can confirm that the statement you’ve provided is consistent with the capabilities that I have been trained on”. And, as I was there, I asked the AI to do some edits for readability – the initial bullet points. So, this mail is not co-written… but it is co-edited, discussed, validated. 🙂 ]

AI and evaluation

This year, there are lots of interesting discussions on the use of artificial intelligence in evaluation. I have started trying out ChatGPT to help me find out about mainstream information and opinion on the subjects I have worked on, develop workshop plans, and draft an email message. The results are interesting: at first sight, ChatGPT responses look quite good, but closer scrutiny makes them seem quite mediocre. That is normal, because ChatGPT relies on what is out there – all of it, or an average view of all of it, i.e., not the most advanced or the most exciting thoughts. Still, one can build on a moderately useful response to develop something more suitable for the task at hand. An important first step is to question what the machine has produced and to reflect on what is missing in the machine-generated answer. It could be the entire construct the answer is based on…

Serious applications aside, ChatGPT has tricks to cheer up our work routines. Here is what it produced when I asked for a summary of principles of evaluation in the style of Emily Dickinson:

To assess, to know, to understand,
The ways of evaluation, so grand.
First, set clear goals for what you seek,
And measure progress, week by week.

Gather data, both near and far,
To see what’s working and what’s not par.
Look for patterns, trends, and clues,
And use this knowledge to inform your views.

Consider all perspectives, wide and narrow,
To understand the impact, deep and thorough.
Be open to feedback, both good and bad,
And use it to improve, like iron clad.

Celebrate successes, big and small,
And share them widely, one and all.
Acknowledge challenges, too, with grace,
And work together, to find a new pace.

In all your evaluations, always strive
To keep an open mind, and be alive.
For learning and growth, and progress to come,
Are the gifts of evaluation, to some.


International advocacy – an infographic

Last year I participated in a very enjoyable and informative course on qualitative visualisation with the PME Campus (in German). Among other things, we learned to design infographics. There are plenty of web-based platforms with a wealth of templates and elements for infographics. For the example below, I generated a word cloud based on the description of a funding scheme that supports international advocacy. Then I uploaded the cloud as the background to my infographic. A bit messy but fun!
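For readers curious about what happens under the hood: the core of any word cloud is simply a word-frequency count of the input text, which the platform then renders as differently sized words. Here is a minimal, stdlib-only Python sketch of that counting step. The sample text and stopword list are invented for illustration; they are not the actual funding-scheme description used for the infographic.

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset(), top_n=20):
    """Return the top_n most frequent words in a text -- the raw
    input that any word-cloud tool turns into sized words."""
    # Lowercase and tokenise; keep letter runs only (incl. German umlauts)
    words = re.findall(r"[a-zäöüß]+", text.lower())
    # Drop stopwords and very short words, then count occurrences
    counts = Counter(w for w in words if len(w) > 3 and w not in stopwords)
    return counts.most_common(top_n)

# Invented stand-in for a funding-scheme description (not the real text)
description = (
    "The scheme funds international advocacy by civil society organisations. "
    "Advocacy projects promote human rights and amplify marginalised voices "
    "in international policy processes."
)

print(word_frequencies(description, stopwords={"the", "and"}, top_n=5))
```

Feeding the resulting frequencies to any word-cloud generator reproduces the effect the course achieved with web-based tools.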

Michaela’s first infographic

Feminist foreign policy and evaluation

DEval, the German Institute for the Evaluation of Development Cooperation, celebrated its 10th anniversary last night. It was a real-life event in a beautiful Berlin location bringing together an impressive crowd, including among others Svenja Schulze, our Federal Minister for Economic Cooperation and Development. One of the topics of her keynote speech was the current federal government’s commitment to feminist development policy. What does that mean for evaluation? Responding to a question by Minister Schulze, Jörg Faust, Director of DEval, came up with four aspects:

  • Do No Harm and research ethics, e.g., by anonymising data about interviewees
  • Context-sensitive research
  • Evaluation design that ensures a wide spread of people are ‚appropriately heard‘
  • More diverse evaluation teams

While these elements definitely make good ingredients for a feminist approach to evaluation, I wonder what is feminist about it. Shouldn’t any evaluation tick all these boxes?

As the Federal Ministry for Economic Cooperation and Development (BMZ) puts it, „feminist development policy is centred around all people and tackles the root causes of injustice such as power relations between genders, social norms and role models.“ Let’s set aside this concept of ‚centring around all people‘ – I guess it only means that feminist policy is not for women only. Let’s look at the other half of the sentence. Wouldn’t that mean that evaluations should look into power relations and other (potential) root causes of gendered injustice, or at least examine whether and how projects have attempted to address those root causes? And what does it take for non-male people at the margins of society to be ‚appropriately heard‘? Won’t evaluators need to spend more time listening to more non-male people, in their own languages (by the way, Translators without Borders appears to be doing a wonderful job on this)? Shouldn’t we have individual conversations not only with those who hold positions of power in a project, but also with intended ‚ultimate beneficiaries‘ of various backgrounds?

This is an aside, but an aside that is close to my heart. Often, I find it somewhat disrespectful and methodologically dodgy when evaluators organise group discussions for ‚grassroots‘ women to share how a project has changed (or not) aspects of their lives, while more privileged project stakeholders and external specialists are interviewed individually. Wouldn’t a feminist approach have to turn this upside down, inviting powerful people to reflect on project and context issues in focus groups, and organising individual interviews to learn about ‚grassroots‘ women’s personal experience in the project?

And, as evaluators, could we make a bigger effort to speak with women’s and lesbian, gay, bi, trans, intersex and queer (LGBTIQ) rights groups wherever we go, and generally identify more diverse experts for our key informant interviews? How about involving local/national/regional women’s and broader human rights experts and activists in the development of our data collection tools, in data analysis, and in crafting locally viable recommendations with a potential to transform power relations?

Sounds like this is asking too much? True, many evaluations I have come across (and I have seen many, in many roles) display only modest efforts to integrate gender and equity concerns, even though equity is part of the updated OECD-DAC effectiveness criterion for evaluation. Often, all you learn from such evaluations are the old messages that women and girls are worse off than the rest, and that social norms are to blame for that. Not very satisfying.

But there are evaluations out there, carried out by teams with a keen sense for rights-based work and power analysis, which have made the effort to reveal and test assumptions on gender roles underlying the programme logic. They have shown how a programme logic or theory of change that builds on a mistaken understanding of gender roles contributes to unwanted effects. That is the kind of finding that makes it into the executive summary of an evaluation report, and that is likely to open people’s eyes to the harm a conventional, gender-blind approach to development can cause. Let’s not allow ‚feminist evaluation‘ to become a mere buzzword, or an excuse for wishy-washy methodologies. Let’s turn it into something meaningful that will yield new, potentially transformative, insights.