Trauma sensitivity in evaluation

Last year I carried out a review of evaluations of programmes and projects in (post-)conflict contexts that worked with survivors of sexualised and gender-based violence (SGBV). As the evaluation reports tended to yield limited information on the data collection and analysis processes, I interviewed some of the evaluators, focussing on those based in the countries or regions of the interventions. The interviews were extremely insightful, showing that it takes more than ‘routine’ research ethics to manage risks when working with persons affected by trauma.

There is some guidance on trauma-informed data collection, for example, the tips and resources provided in the American Evaluation Association’s “Trauma-informed Eval Week” blog series (January 2021). Everyone involved in evaluation should have a good look at those and make sure participant welfare is prioritised over harmful, extractive data collection.

A plea for more unconventional, more localised evaluations

This post shares my experience with a hybrid evaluation in a team of peers, and makes a plea for more unorthodox cooperation in evaluations.

Last year I carried out an evaluation as part of a team of three researchers – two from the country of the project to be evaluated (in Asia), plus me working online from Germany. All of us were quite senior professionals, but only I had proven evaluation expertise. So, we agreed that the colleagues in the project country would be responsible for data collection, analysis, and report writing. My role was to take the lead in designing the evaluation, to provide quality assurance, and to edit the evaluation report so that it would comply with donor needs.

It was the first time I was not the lead evaluator in an international team. We worked together as peers, and that worked very well. Together, we developed an evaluation matrix with all the questions, fields of observation and data sources, as well as guiding questions for data collection. We stayed in touch online during the data collection phase, had a couple of online workshops to extract findings, and wrote the report together. Nothing special, it would seem, but it is still rare – at least in the context of German development and humanitarian cooperation – to find evaluations where the “international” consultant is not automatically the team leader.

My colleagues worked within their own country, in their own languages, knowing very well how to ask their questions, how to interpret the answers, and how to read between the lines. There was no risk the “international” team member would ask locally inappropriate questions or miss or misinterpret important data. My colleagues did not need to read up on their national political, economic, and social situation to understand the project context. They drew on decades of experience in the sector the project was about, in the very country where the project was implemented. They were social researchers, not trained evaluators, and that was fine.

The idea of “localising” evaluation in development and humanitarian cooperation has been around for a while – as illustrated, for instance, by the fact that the German Institute for Development Evaluation [link] is about to celebrate the 10th anniversary of its training programme for evaluators in Germany’s partner countries. Let’s have more locally led evaluations! In teams mixing locals and foreigners, let’s find an arrangement that ensures all can fully contribute their skills and knowledge. The international and/or the trained evaluator does not necessarily have to be the team leader. And the old formula “local research assistants, foreign researchers” does not make sense anymore (if it ever did).


Do we need feminist evaluation?

The concept of feminist evaluation gets a great deal of attention these days. Is it just a new label for gender-sensitive evaluation? Given that even gender-sensitive evaluation seems to be a challenge for many evaluation teams and evaluation managers, do we need feminist evaluation just now? If feminism focuses on gender inequality, will it help us to introduce stronger, broader equity orientation in evaluations?

These are questions I bounced around with my colleague Khalil Bitar when we jointly reviewed a set of evaluation reports. The purpose of our review was to distil learning on evaluation methods for an organisation commissioning many evaluations. Gender mainstreaming was a requirement across those evaluations. In parallel, I reviewed a different set of evaluations commissioned by a women’s rights organisation interested in feminist evaluation. Khalil’s in-depth knowledge of equity-oriented evaluation helped expand our reflection.

The concept of feminist evaluation (FE) is not totally new; it has been discussed for at least two decades. A collection of articles on the subject was published in 2014, edited by Sharon Brisolara, Denise Seigart and Saumitra SenGupta (including contributions by Donna Podems, Donna Mertens, Jennifer Greene, and other eminent evaluators). In recent years, Global Affairs Canada has built on literature on FE to develop guidance. With governments in the ‘global North’ promoting feminist foreign policy and feminist development policy, it seems more urgent to understand what feminist evaluation is about – and whether and when we need it.

Many aspects of FE are found in gender-sensitive or gender-responsive evaluation: its transformational paradigm, its interest in uncovering root causes of inequality, its participatory character, and its ambition to ground evaluation in local contexts. All these approaches are also supposed to examine intersectionality, i.e., the criss-crossing and interacting layers of inequality and oppression linked to people’s identities (gender, class, age, and many more aspects).

Feminist evaluation may not be fundamentally different from proper gender-sensitive or gender-responsive evaluation. But then, how much gender-responsive evaluation do we see out there? In dozens of evaluation reports I have reviewed, gender sensitivity is misinterpreted as “counting women” or “describing women’s plight”. Often, an analysis of root causes is missing; power imbalances are ignored.

Since the evaluation sector has performed poorly with gender sensitivity, is it ready for feminist evaluation yet? If FE is just another fad or flag that evaluators wave in their offers before carrying out an evaluation that is not even gender sensitive, we do not need it. But gender inequality is one of the most widespread forms of social inequality. Can evaluators just stand by and watch? I would say, no. The current emphasis on feminist policy may offer new opportunities to strengthen gender responsiveness in evaluation. But we cannot afford to pay attention to a single form of inequality only.

That is where intersectionality and equity focused evaluation (EFE) come in. EFE has much in common with feminist evaluation: EFE is not only about what we evaluate, but also about using evaluation to reduce inequities – like feminist evaluation, it has a transformative paradigm. Like feminist evaluation, it is supposed to uncover root causes of inequity, and find ways of addressing them. Like feminist evaluation, it should be participatory, by listening to and amplifying the voices of underrepresented people. Khalil and I believe that proper feminist evaluation needs to be equity-oriented to unfold its potential to support change for greater gender justice. If feminist evaluation neglects the elements it has in common with EFE, it reduces its chances to contribute to social transformation. One could even define FE as a form of EFE.

But not all equity-oriented evaluation needs to be feminist, i.e., to focus on gender justice – in certain evaluation contexts, it may be more appropriate to focus on inequity linked to other aspects of people’s identities (e.g., indigenous status, social class, disability…).

Finally, for feminist or other transformative evaluations to work, they need to be commissioned by people/organisations that are genuinely interested in and ready to promote social change. So, do we need feminist evaluation? If we are serious about gender equality, yes – but, as pointed out by Donna Podems, (i) we don’t always have to call it “feminist” (especially in today’s hostile contexts), and (ii) we should apply it deliberately where it fits (and not everywhere and cosmetically).


Good intentions in need of appropriate resourcing: OECD guidance on human rights and gender in evaluation

The Organisation for Economic Co-operation and Development (OECD) has issued a guidance document on Applying a Human Rights and Gender Equality Lens to the OECD Evaluation Criteria. It is wonderful to have this new resource – but a few shortcomings may make it hard to apply the guidance in real life.

First, the good things! About half of the 60-page volume is dedicated to explaining how human rights and gender equality (HRGE) considerations can be integrated (mainstreamed) into each of the six OECD criteria for evaluation in international cooperation (relevance, coherence, effectiveness, efficiency, impact, sustainability). There are inspiring ideas: For instance, the volume invites evaluators to assess internal coherence of the intervention under evaluation (the evaluand) by checking whether it is aligned with human rights treaties and related policies. Also, the publication includes helpful definitions and great examples from real evaluations, as well as sample evaluation questions. It is written in a style that is accessible to evaluation specialists (not quite plain English, but not too jargony). And it points to plenty of useful references.

The document encourages evaluators to apply a human rights and gender equality lens to all evaluation criteria. However, there is no discussion of the RESOURCES needed for meaningful implementation of the guidance. In that way, OECD risks encouraging tokenistic tick-box and flag-waving motions instead of serious consideration of HRGE. For example, OECD invites evaluators to reconstruct the evaluand’s theory of change with special attention to HRGE, to detect intended and unintended HRGE effects on various groups of people. That can work if the project focuses on HRGE, but if it doesn’t, it takes extra time and expertise to add the HRGE dimensions. Also, the document commendably advocates for systematic participation of a wide spectrum of rights holders in the evaluation process – not just as data sources. To make this possible, people need to be reached, invited and reimbursed for any costs, translation needs to be organised, and so forth. The guidance would be more useful if it included estimates of the extra time and resources it takes to translate it into practice.

Another issue is about LEARNING. I love the fact that there are nifty tables with sample HRGE-sensitive evaluation questions for each OECD criterion. But most questions start with the phrase “To what extent…”, which invites accountability-focused answers of the type yes/no/somehow. But isn’t evaluation also about discovering what has worked, under what conditions, and what not, and why? Obviously, these are aspects that an enterprising evaluator can discuss even under a question that starts with “To what extent…”, but we don’t always have time to add extra depth. The guidance would be more useful if it were geared to support both accountability- and learning-oriented evaluations.

There is another point I find difficult. Human rights are, by definition, indivisible and interdependent (a good reference is the definition by the Office of the High Commissioner for Human Rights). But the OECD resource invites evaluators to define which human rights principles are most relevant to the evaluand. Again, that could be OK for an intervention that focuses on a specific set of rights (e.g., political participation of indigenous people). But how can an evaluation team decide which set of rights should be considered when evaluating a solar energy project, a police training initiative or a multi-sector regional development programme? Should they privilege political over social rights, for example? Should they pick the right that it takes the least effort to consider (if no extra resources are available for the HRGE lens)? Is it legitimate to pick just one set of rights and leave aside all others? That question deserves careful consideration in future editions of the guidance.

Rather than attempting to plough HRGE concerns into all OECD criteria, one could put together a few minimum standards for HRGE sensitivity that should be applied (and resourced) across all evaluations. These could include:

  • Impeccable ethics (including trauma sensitivity, as helpfully pointed out in the OECD volume)
  • A degree of equity orientation, e.g. by considering unequal distribution of desired effects, and unexpected/unwanted effects by population group
  • Communication of the evaluation purpose and findings to rights holders (as suggested by OECD)
  • Mainstreaming HRGE concerns into evaluation questions around relevance and effectiveness

Internal evaluations need external perspectives

A 2019 post from my former blog developblog.org, which I ran from 2008 to 2021.

Internal evaluation can be an excellent way to check the quality of one’s work, to track progress (in programmes or projects, for instance) and to gather information for management decisions and longer-term learning. To make the most of such exercises, they should go beyond self-reflection. Especially for small to medium-sized teams or organisations, sitting around a table and contemplating one’s strengths and weaknesses, as well as successes and failures, is a good start, but just not enough.

Things you can do to gather more insights and make the most of them:

If you regularly collect and document information from partner organisations, clients or other people involved in or affected by your work, use it! Use it to find out whether the activities you and your partners carry out do – or are likely to – contribute to the goals you pursue. Use it also to examine – or read between the lines – how the quality of your organisation’s work is perceived.
You can also bring such information to a “data party” with people outside your organisation – for instance, some of those who are supposed to benefit from your projects, or else external specialists in the field you work in. The idea is to make sense of the information from your projects/activities together, every participant with their own perspective. (Obviously, you will have to make sure data are sufficiently aggregated and anonymised so as to avoid violating anybody’s privacy.)

If you don’t continuously gather information from those involved in your projects/activities, then you can carry out your internal reflection in stages – for instance, (1) you decide together which questions (a handful at most!) your internal evaluation should answer, and (2) then you allow for a few weeks’ time to gather information – for instance, in conversations with stakeholders and external persons, just like an external consultant would do in a “qualitative” evaluation.
If you don’t have time for that, you can replace item (2) by a consultation bringing together people who are directly involved in or affected by your work. Here, external facilitation can help create an atmosphere and a workflow that enable everyone to openly share their experience and their perceptions of your organisation’s work.

Both approaches take more time than a simple half-day workshop of navel-gazing. There is nothing wrong with workshops or short retreats – any break from a busy work routine can be beneficial. But involving others will multiply your chances to gather precious new insights. Try it out!

Two or three reasons for working in tandem

A May 2019 blog post from my former blog, www.developblog.org

Evaluations come in many shapes and sizes. I have led multidisciplinary teams in multi-year assignments, and carried out smaller assignments all by myself. Last year was a lucky year, because most of my work happened in one of my favourite configurations: the tandem or duo – as in two competent persons with complementary or partly overlapping skills and knowledge working together as evaluators on an equal or near-equal footing. Two evaluators working together – even if one of them participates for a shorter spell of time than her colleague – means so much more than the sum of two persons’ capacities.

Obviously, two persons can carry out more work than one, and two pairs of eyes and ears perceive more than one. More importantly, two different persons are likely to interpret data differently, from their different perspectives. In my recent tandem assignments, we – the two evaluators – discussed our findings every day when we worked in the same location. At times we’d split for a few days; in those cases, we’d exchange via the phone or a secure messenger service at least twice a week. The tandem approach forces both evaluators to analyse, distil first findings and develop conclusions throughout the evaluation process. Conversely, when you’re on your own, you must keep your impressions to yourself (confidentiality in evaluation!). On lonely evenings in hotels far from home, it can be hard to overcome the fatigue at the end of busy days to study the day’s notes – for a tandem, this routine is much more inspiring. When you evaluate across countries and/or cultures, it makes sense to work in tandems that combine different backgrounds and social identities, so that “insider” and “outsider” perceptions and interpretations can challenge each other and lead to stronger findings. “Objectivity” in evaluation is a lofty goal – a team of two might not attain it, but at least the inter-subjective setup helps keep individual bias in check.

By contrast, when I work as a sole evaluator, all I can do is look at my own notes and apply a good dose of self-reflection to question my own findings. I can only be in one place at a time and must juggle interviewing, facilitating group discussions and note-taking. I touch-type while carrying out interviews, a mentally and physically strenuous habit – but a necessary one, because often, resources for transcribing recorded interviews are not part of the evaluation budget. When I write up my conclusions and recommendations, there is no peer to review them. In short, it is a tough, lonely exercise that potentially yields less robust results than an evaluation by a tandem. My clients appear to be very happy with the evaluations I carry out by myself. But even where resources are tight, I recommend setting up tandems – or at least some peer review process independent from the client and the evaluand – for the evaluation. Even a couple of extra days with a suitable colleague can turbo-charge the robustness of an evaluation’s findings and recommendations.

Small group work – keep it fresh!

A post from my former blog www.developblog.org, which I will take offline soon.

It is the early afternoon of the second workshop day; the participants are a bit drowsy from a rich lunch; messages have piled up in their smartphones and some people would prefer to deal with those rather than discussing strategy or whatever the workshop is about. Small group work is on the workshop plan. What can you do to keep it lively and productive?

#1 Avoid the classical approach of ushering groups of six to twelve persons into separate rooms (“break-out rooms”): They’ll lose at least five minutes on the way there and then again on the way back. To make matters worse, some participants will disappear into the corridors to attend to their smartphones and return when it is too late for productive involvement in group work. Go for buzz groups instead: Everybody stays in the same large room (count some three square metres per participant), set up “world café” style, with participants clustered around round or square tables.

#2 Set rules for the small groups to create an effective thinking environment (see Nancy Kline’s highly commendable book Time to Think). One easy way is to use a talking stick/ball/fluffy toy: every participant must hold it and speak at least once before anyone gets a second turn. It is an excellent way to keep the group from being monopolised by a couple of big talkers. Also, put a clock on the table and have participants limit their verbal interventions to a maximum of three minutes each.

#3 Write each group’s assignment on a big piece of paper that stays with the group. Provide the groups with tools that help them structure their presentation. For instance, if the assignment is to map stakeholders, you can draw one of the common models on a flip chart (e.g. power/interest grid, Lewin’s force field analysis, or concentric circles to designate core/direct/indirect stakeholders, to name but a few options) and ask participants to complete it together. Also, inviting participants to compile “do’s and don’ts” can work well with group work that is about distilling lessons from experience.  

#4 If all groups are supposed to work on the same question, or on questions that converge into a bigger picture, consider using the Institute of Cultural Affairs’ Technology of Participation (ToP). A key feature of this approach is the rapid succession of individual, small group and plenary reflection and visualisation in a way that enables everyone to contribute their thoughts in a safe manner.

#5 For fresh afternoon sessions, avoid heavy (buffet) lunches, make sure there is some daylight in the room, and provide all small groups with plenty of water, coffee/tea and something to nibble on.

#6 Last but not least: Stay engaged as a facilitator! Monitor the groups’ work, nudge them back to the question and the agreed group process if they stray from it, and be there to answer questions. Never ever dive into your smartphone while facilitating a workshop – save that for the breaks.

Classism in evaluation design

This is a favourite post from my former blog, www.developblog.org, which I will soon remove from the web (after 15 years…). The post dates from 2019, but little has changed since then – except that more people are starting to talk about equitable evaluation, which is good news.

Individual interviews for “important persons”, focus groups for “beneficiaries”, right? Wrong!

These days I have been reviewing evaluations of projects supporting survivors of traumatising human rights violations in countries that are not quite at peace, or even still at war. One would think that in such circumstances, evaluators would be particularly respectful and careful with their interlocutors, avoiding questions and situations that would make them feel uncomfortable, trigger difficult emotions or cause a resurgence of their trauma. In some cases, the opposite is true:

Some evaluators asked people to talk about their traumatising experience in group discussions with five to ten persons – neighbours or strangers, people who were brought together in a one-off two-to-three-hour meeting only because the evaluators needed data from “beneficiaries”. To obtain data from project managers or local officials, the same evaluators tended to prefer individual interviews. I see an implicit message here: People in positions of power deserve more individual attention than simple users of project services. Is that really what we want, when we evaluate projects that are supposed to strengthen people’s confidence and empower them to transform their lives, contribute to change in their societies and make this world a better one?

The problem is not unique to human rights and service-related projects. I have seen evaluations of rural development programmes where “beneficiaries” were mainly interviewed in groups – for instance, in the convenient setting of an agricultural extension class. It is not only an issue of respect, or lack thereof; it is also a methodological problem. In group interviews, people speak not only to the person who conducts the interview, but also to everybody else who sits in the circle (or around the table). As a result, they are likely to speak in ways and about things they consider acceptable in that group setting (social desirability bias) – not necessarily about their true thoughts and feelings. Focus group discussions are not a good instrument to learn about personal thoughts and experience.

But they can be an excellent instrument for questions that are less personal, for instance, to map actors in a field the participants are familiar with, to learn about local social norms, or to get different experts’ views on a certain topic. For instance, when a project is about health services, it can make sense to run focus group discussions with health providers: They can explain the situation in their sector, sketch typical processes, discuss together where exactly the project fits in and what contributions it may have made, and so forth.

I would like to come back to the point of respectful interviews, especially when interviewees are survivors of traumatising violations. I did find one excellent example: The researchers designed questionnaires and interview guides that kept people from digging too deeply into difficult memories. They gave survivors a few days to think before they consented to be interviewed, and offered them the choice of the interview setting – a counselling centre, for instance, or a secluded hotel in a pleasant area. They provided breaks and meals, a couple of nights’ accommodation if needed, as well as a post-interview check-out with a psychologist – all that to make sure any distress caused by the interview could be dealt with. Incidentally, the researchers worked in a European country. There is no reason why one shouldn’t work that way in Africa or Asia, is there?

Evaluation in times of COVID-19

This post is part of a series of contributions to my former blog www.developblog.org, hosted by a different platform for a whopping 15 years (since 2008)! I am closing down the old blog and moving some interesting reading to this new setting. The following post distils lessons from lockdown times.

What does the surge of SARS-CoV-2 (the scientific name of the new coronavirus) infections in parts of Europe mean for international evaluation? Can we, as evaluators, join the soothing voices of those who say that the current flu epidemic has killed many more people and that there is no reason to change anything in our lives? I don’t think so. I would like to remind all of us of the Do No Harm principle: Research ethics require us to carefully weigh the potential benefits of undertaking research (at a given time) against the potential harm associated with it. We can relax about ourselves, but we must not endanger others. International evaluations can also be done without international travel.

That is why yesterday, I decided to postpone a case study in an Asian country that has relatively few known coronavirus infections – not because I was worried I would contract the virus, but because I could pass it on to others. I live in Berlin, a city of 3.5 million inhabitants where some 58 cases of SARS-CoV-2 have been detected so far (yesterday’s data). That may seem little. But while I was weighing my decision, it turned out that a close colleague’s partner, who had been in contact with a Covid-19 patient, had developed symptoms of Covid-19 (the name of the disease the virus causes). A few hours later, the Guardian (UK) published an article relating how an apparently healthy British couple contracted SARS-CoV-2 during air travel to Vietnam and left a trail of infected people wherever they went – several places spread across Vietnam.

The health advice published in Germany is to avoid all unnecessary travel. Evaluations are as necessary as ever – yet, most of the time, postponing them would hardly threaten anybody’s existence (apart from evaluators’ flow of earnings – a risk entrepreneurs are used to). As a matter of fact, many evaluations happen late anyway because of poor planning – see for instance my 2012 post on evaluation planning.

Going on as if there were no public health risk associated with a new, rapidly spreading and potentially deadly virus threatens other people’s lives, especially in countries where health systems are in poor shape or already overstretched. Especially when travelling to remote regions, we might carry the virus to populations who, by their relative isolation, could be relatively protected if we stayed away. Remember how UN peacekeepers introduced cholera into Haiti? See the UN Secretary-General’s apology (2016). The history of colonialism is full of examples of European diseases wiping out previously sheltered communities.

What if the evaluation is really urgent, for instance a condition for subsequent project funding (assuming there is no way to re-negotiate the condition in view of a public health crisis)? Work with national evaluators! Even in organisations that find it vital to have an “international” on their evaluation teams, it is established good practice – even in smaller evaluations – to work with “mixed” national/international teams. See also my post on “two are better than one”.

If you, as someone who commissions an evaluation, feel you must have an international consultant on the team, invite her to work remotely: Where internet connections are good, workshops, group discussions and interviews can be accompanied via Skype, WhatsApp or a more secure video messaging service. Data collected by the national evaluation team can be analysed in regular phone conferences. Time and resources permitting, the national team can have all its activities audio-recorded, transcribed (and translated, if needed) in full, so that the remote evaluator can follow closely what is happening. There are many options, which can also come in handy if we get more serious about reducing the environmental impact of international travel. I have used these options in my evaluation practice and they have yielded good results. 

Remember the old saying about development being all about working “ourselves” (in the “global North”) out of “our” business? That applies to international evaluation, too: Let’s strive to ‘localise’ evaluation while developing a rich flow of knowledge and skills exchange across the world!

Know what you need to know

This is a blog post written in 2020. I have taken it from my old blog, www.developblog.org, which I will close down later this year.

Evaluations often come with terms of reference (TOR) that discourage even the most intrepid evaluator. A frequent issue is a long list of evaluation questions that oscillates between the broadest interrogations – e.g. “what difference has the project made in people’s lives?” – and very specific aspects, e.g. “what was the percentage of women participating in training sessions?”. Sometimes I wonder whether such TOR actually state what people really want to find out.

I remember the first evaluation I commissioned, back in the last quarter of the 20th century. I asked my colleague how to write TOR. She said, “Just take the TOR from some other project and add questions that you find important”. I picked up the first evaluation TOR I came across, found all the questions interesting and added lots, which I felt showed that I was smart and interested in the project. Then I shared the TOR in our team and others followed suit, asking plenty more interesting questions.

I wonder whether this type of process is still being used. Typically, at the end, you have a long list of “nice to know” questions that’ll make it very hard to focus on the questions that are crucial for the project.

I know I have written about this before. I can’t stop writing about it. It is very rare that I come across TOR with evaluation questions that appear to describe accurately what people really want and need to find out. 

If, as someone who commissions the evaluation, you are not sure which questions matter most, ask those involved in the project. It is very useful to ask them, anyway, even if you think you know the most important questions. If you need more support, invite the evaluator to review the questions in the inception phase – with you and all other stakeholders in the evaluation – and be open to major modifications.

But please, keep the list of evaluation questions short and clear. Don’t worry about what exactly the evaluator will need to ask or look for to answer your questions. It is the evaluator’s job to develop indicators, questionnaires, interview guides and so forth. She’ll work with you and others to identify or develop appropriate instruments for the specific context of the evaluation. (The case is somewhat different in organisations that attempt to gather a set of data against standardised indicators across many evaluations – but even then, they can be focused and parsimonious to make sure they get high quality information and not just ticked-off boxes.)

Even just one or two evaluation questions can be perfectly sufficient. Anything more than ten can get confusing. And put in some time for a proper inception phase, when the evaluation specialists will work with you on designing the evaluation. Build in joint reflection loops. You’ll get so much more out of your evaluation.