Evidence in action: how US government agencies measure their performance

The Biden administration has launched a year-long campaign to ensure that federal agencies are making best use of evidence in policymaking. Global Government Forum brought together experts from across federal government to identify best practice
Government agencies in the United States have a legal duty to ensure they develop and use performance information and other evidence in decision-making.
The requirement, set out in the 2018 Foundations for Evidence-Based Policymaking Act, calls on agencies to take a number of steps to improve their use of evidence in policymaking – including modernising data management practices, establishing evidence-building functions, and improving statistical efficiency.
The legislation was the result of the U.S. Commission on Evidence-Based Policymaking’s work to improve the use of evidence in government, but it is not the only action taken by the US federal government. In April, the Biden-Harris administration launched a ‘Year of Evidence for Action’ to mobilise research-based evidence in policymaking, built around three priorities:
- Sharing leading practices from federal agencies to generate and use research-backed knowledge to advance better, more equitable outcomes for all of America;
- Strengthening and developing new strategies and structures to promote consistent evidence-based decision-making inside the federal government; and
- Increasing connections and collaboration among researchers, knowledge producers and decision makers inside and outside of the federal government.
To investigate best practice in evidence use, and the barriers holding back such approaches, Global Government Forum brought together a group of officials from across the US federal government to share insight on how the federal government can make sure it learns the lessons it needs to make life healthier, safer, more equitable, and more prosperous for the American public.
At the heart of the Evidence Act is the intention to get government “to better integrate evidence, data and information to inform decision-making”, said Annie Chiang, deputy evaluation officer and organisational performance lead at the Department of the Treasury. “A lot of it is to ensure that the decisions we’re making – which will affect lots and lots of people – are sound, as best as we can, using information that’s out there.”

Under the act, agencies are required to assess their evidence-building capacity and set out, as Chiang put it, “what they want to learn” in a four-year learning agenda to boost skills in organisations, along with a plan to deliver on it.
Jason R. Bossie, the director of program performance, analysis and evaluation at the U.S. Small Business Administration, shared details of the agency’s work to combine both performance management and evaluation to boost decision-making.
The key in evaluation is both “building the evidence and connecting it back to the priorities, and performance management”, he said.
One example he highlighted was the agency’s disaster assistance programme to help small businesses access funds to mitigate the effects of climate change. “We have a component that allows businesses to take out a portion of loan to help with mitigation activity – so if they wanted to elevate their business, to prevent future flooding, for example.”
The agency set a goal in its 2022-2026 strategic plan to increase the number of loans that include mitigation measures by 20%. To drive that forward, the agency undertook research and evaluation of its existing work to set targets. This looked at how many borrowers took out a mitigation loan, what the rates were in different parts of the country, and the different characteristics of those who did.
“That evidence helped us to build the baseline for the development of that performance measure and essentially the agency priority goal that stemmed from it. We were able to set milestones for what the programme managers within that area were going to complete, as well as other activities that they were going to track,” Bossie said.
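By way of illustration only – this is not SBA code or data – a baseline and a 20% growth target of the kind Bossie describes could be derived from loan records along the following lines; the field names, regions and figures are all hypothetical.

```python
import pandas as pd

# Hypothetical disaster-assistance loan records: one row per loan,
# flagging whether the borrower took out a mitigation component.
loans = pd.DataFrame({
    "region":     ["Southeast", "Southeast", "Midwest", "Midwest", "Northeast"],
    "mitigation": [True,        False,       True,      False,     False],
})

# Baseline: how many loans included mitigation, and the uptake rate by region.
baseline_count = int(loans["mitigation"].sum())
uptake_by_region = loans.groupby("region")["mitigation"].mean()

# Target in the spirit of the 2022-2026 strategic plan: 20% more
# mitigation loans than the baseline.
target_count = round(baseline_count * 1.2)

print(f"Baseline mitigation loans: {baseline_count}")
print(f"Target (baseline + 20%):   {target_count}")
print("Mitigation uptake rate by region:")
print(uptake_by_region)
```

In practice the SBA’s own analysis would draw on far richer loan and borrower characteristics than this toy example, but the principle – establish the baseline, then set and track a measurable target – is the same.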
The disaster assistance goal is also linked to research and evaluation questions as part of the agency’s learning agenda and evaluation plan, both of which stem from the Evidence Act.

The agency’s loan officers are therefore being trained to raise awareness among borrowers of what is available, and Bossie said the agency will be able to test the quality of training programmes by assessing whether loan officers understand mitigation lending. “We’ll be able to test whether borrowers are taking up more of those loans as a result because we track quarterly progress through the agency priority goal. We will publish that information on performance.gov, and we also hold other sessions internally to track progress and accountability.”
This allows for evaluation to be developed across the policy lifecycle, with senior leadership involved in those quarterly meetings.
“So it’s cyclical,” Bossie said. “We’ve been able to build an evidence base, connect that back to our organisational goals… [and] link together performance and evaluation through various mechanisms, including the strategic plan, the enterprise learning agenda, and our quarterly performance reviews. And we’ve been able to test these results, and we’ll [use] the feedback from evidence building so that we’re able to adjust our goals, going forward, to help more small businesses access these resources.”
The importance of actually measuring outcomes – not just actions
Kathleen Nestor, the managing director of civilian practice at SAS, the webinar’s knowledge partner, highlighted the progress that had been made since the Foundations for Evidence-Based Policymaking Act was signed in 2019, with around 90 chief data officers now in post across US government agencies, and over 347,000 public datasets published centrally on the federal government’s data.gov website.
While this is good progress, Nestor said government “still ha[s] a way to go” to embed evidence-based policymaking and ensure that government is “actually measuring outcomes, rather than just well-intentioned actions”.
“I definitely see a lot of agencies collecting data and measuring performance. But are the actions that we’re taking really helping us be more effective and efficient? Are we providing evidence that what we are doing is actually working?”
Nestor gave one example of an unnamed agency’s strategic plan for diversity, equity and inclusion, which had very clear definitions of the challenges and the objectives they wanted to achieve, including being able to “effectively recruit qualified individuals at all levels with diverse backgrounds”, and “cultivate an inclusive workplace”.

However, the metrics identified to measure success included monitoring the amount of diversity and inclusion related training provided to employees, which, Nestor argued, might not lead to a more inclusive workplace.
“So maybe instead of ‘x number of trainings’, we want to look at its correlation with the number of discrimination complaints filed each year, or maybe the number of individuals hired with diverse backgrounds who remained at that agency for over a year,” she said. Agencies could also use technology such as natural language processing to mine employee survey comments and monitor whether overall employee sentiment around the topic changed significantly after training.
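As a rough sketch of the kind of analysis Nestor suggests – scoring free-text survey comments before and after training and testing whether average sentiment has shifted – the example below uses NLTK’s off-the-shelf VADER sentiment scorer and a two-sample t-test on invented comments. The tooling choice and the data are illustrative assumptions, not something described in the webinar.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from scipy.stats import ttest_ind

nltk.download("vader_lexicon", quiet=True)  # one-off download of the VADER lexicon

# Invented free-text survey comments collected before and after training.
before = ["I don't feel heard in team meetings.",
          "Promotion decisions seem opaque to me."]
after = ["My manager now asks for everyone's input.",
         "The new mentoring scheme has been genuinely helpful."]

analyzer = SentimentIntensityAnalyzer()

def sentiment_scores(comments):
    """Compound VADER sentiment score for each comment, from -1 to +1."""
    return [analyzer.polarity_scores(c)["compound"] for c in comments]

before_scores = sentiment_scores(before)
after_scores = sentiment_scores(after)

# Two-sample t-test: did average sentiment shift after the training?
stat, p_value = ttest_ind(after_scores, before_scores)
print(f"Mean sentiment before: {sum(before_scores) / len(before_scores):.2f}")
print(f"Mean sentiment after:  {sum(after_scores) / len(after_scores):.2f}")
print(f"p-value for the change: {p_value:.3f}")
```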
Nestor also said departments and agencies needed to use several metrics to measure success. “If we’re only looking at one set of actions and their effects in isolation, we still may not actually be investing in the things that are significantly moving the needle. Just because two things seem to be correlated doesn’t necessarily mean that one is actually causing the other.
“I think it’s important to make sure that we’re investing our money where it’s actually resulting in the outcomes that we intend, and at the least cost to taxpayers as well.”
Finding that optimum balance is an iterative process, and government also needs to think about what useful data it should be collecting. “I’ve seen agencies collecting a lot more data, which is great [but] I think it’s also important to collect and store the right data. The goal is to leverage it as a strategic asset, rather than something that just spikes [data] storage costs, especially as we move to the cloud.”
This is where chief data officers have important roles in “ensuring that all of the data that’s being used in these agencies is very well governed and accounted for and that agencies know what inventory they have and how it’s being used to reduce those duplicative efforts”, she said.
Also speaking at the event, Winston J. Allen, the agency evaluation officer at the US Agency for International Development (USAID), shared details of the agency’s use of performance evidence in decision-making.
USAID works in over 80 countries, and is part of international development efforts in sectors including agriculture, education, and democracy and governance.
Allen highlighted that its monitoring, evaluation, and learning is decentralised across the on-the-ground missions in the countries it works in, but that a central Office of Learning, Evaluation and Research is focused on helping to build up in-country capacity for monitoring and sharing best practice.
The agency undertakes two main reviews of its activities, he said – performance monitoring, which involves what Allen called “ongoing and systematic collection of performance data to oversee implementation and understand progress toward measurable results”, and performance evaluations, which are designed to “identify accomplishments, performance issues, and constraints on activity implementation” either half way through projects, or at other milestone stages.

The performance monitoring is done by each of USAID’s international missions, he said. “We do have several standard performance indicators that missions do use, but they are also allowed to use custom indicators, as well as other contextual indicators we can monitor as part of the performance monitoring.”
The data is used to inform judgments about the outputs and outcomes and to “inform decisions about current and future programmes”, Allen said.
“The performance is analysed by comparing actual results achieved against the expected results and targets initially set at the beginning of the project or activity.
“The mission sets the target in terms of the outputs, even the outcomes, and these are monitored in terms of the data that’s collected to see the extent to which the actual results match the targeted results.”
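In code, the comparison Allen describes – actual results set against the targets agreed at the start of an activity – amounts to something like the following sketch, with the indicator names, figures and ‘on track’ threshold all invented for illustration.

```python
# Hypothetical performance indicators for one activity: target vs actual.
indicators = {
    "farmers trained":             {"target": 5000, "actual": 4600},
    "schools receiving materials": {"target": 120,  "actual": 131},
}

for name, values in indicators.items():
    achievement = values["actual"] / values["target"]
    status = "on track" if achievement >= 0.9 else "behind target"
    print(f"{name}: {achievement:.0%} of target ({status})")
```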
The performance evaluations often come half way through USAID’s five-year programmes, and the results can then be used to tweak programmes and stay on top of progress.
Christine Heflin, evaluation officer, Department of Commerce, said the department – which incorporates a host of different organisations including the Census Bureau, the Bureau of Economic Analysis, the National Oceanic and Atmospheric Administration, the US Patent and Trademark Office, and the National Technical Information Service – had worked to pull together its evaluation expertise in response to the Evidence Act.
“Collectively, we have the expertise of maybe the best consulting firm in the nation, but [when we’re] working in our stovepipes it’s not quite as good. So we’ve been pooling our know-how much more.”
Such an approach was necessary because “good rigorous evidence can take years to develop”, so more collective work was needed on how evidence could be developed for decisions that have to be taken in real time, such as economic support.
“Our solution, which I will admit right now is still more aspirational than [being] there, is to use our own statistical and administrative information better,” Heflin said.
For example, the department is working to monitor the impact of community intervention work using a wider range of more granular data, including the Census Bureau’s Opportunity Atlas, county health rankings from the University of Wisconsin, and the national economic resilience data estimator from the U.S. Department of Energy Office of Science’s Argonne National Laboratory.
“All of them give us some idea about different parts of the country to [a] fairly granular level – where are they at right now in terms of economic resilience, opportunity, access to healthcare – and there are social vulnerability indexes, and also economic and environmental justice indexes.”

The most promising index for the purposes of evaluation is the Community Resilience Estimates for Equity, from the Census Bureau and the Economic Development Administration (EDA).
The key point of this dataset is that it provides data down to the level of what are known as ‘census tracts’ – small, relatively permanent statistical subdivisions of a county that can be used for ongoing comparisons. Heflin said this level of granularity is unusual for economic development data.
“We need data that shows: one, where is that county before the intervention? And two, where will it be after the intervention? And as instruments like that are refined, our ability to track the data – where were they at the tract level before the intervention, and where were they at the tract level after the intervention – proposes a way of looking at progress and impact, without having a big burden, and getting the information perhaps a bit more real time than we have in the past.”
Even if the information were annually updated, this would be an improvement, she said.
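A minimal sketch of the tract-level before-and-after tracking Heflin describes might look like the following; the tract IDs and the ‘resilience’ values are invented for illustration, not drawn from Census Bureau or EDA data.

```python
import pandas as pd

# Hypothetical tract-level indicator readings taken before and after a
# federal investment (tract IDs and resilience scores are invented).
before = pd.DataFrame({
    "tract_id":   ["01001020100", "01001020200", "01001020300"],
    "resilience": [0.42, 0.55, 0.38],
})
after = pd.DataFrame({
    "tract_id":   ["01001020100", "01001020200", "01001020300"],
    "resilience": [0.47, 0.54, 0.45],
})

# Join the two snapshots on tract ID and compute the change in each tract.
comparison = before.merge(after, on="tract_id", suffixes=("_before", "_after"))
comparison["change"] = comparison["resilience_after"] - comparison["resilience_before"]

print(comparison)
```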
The “fly in the ointment”, as Heflin put it, is assessing cause and effect, as Nestor previously highlighted.
“I can see if a census tract is progressing. Can I necessarily attribute it to the Commerce intervention? The answer is not necessarily.”
However, there is still scope for a great advance, she said, as the department is working to get the federal government overall to record in its programme information what census tracts it expects to benefit from policy interventions.
“With that, I could ideally go to [a] map, click on a census tract, see the investment there at the federal level, and then also see the data from some indicators on how that area is doing,” Heflin said. “Again, that’s the aspiration – we’re working on it – but the ability to solve the problem, both of having more current information, and information that does not put a big burden on the community, that potential is great.”
This approach is being considered for the US government’s flagship Infrastructure Investment and Jobs Act, she said.
How to get the evidence to decision makers
Getting evidence into decision-making was one of the challenges highlighted in the session.
Chiang said that she quickly learned in Treasury that “no matter how good your evidence and research is, if there is not a good understanding of the decision-making models, and getting the right information into the right meetings, then the evidence won’t inform better decision-making”.
Treasury is working on how to measure “organisational health”, basing its performance assessments on quarterly themes so that monitoring continues throughout the year.
These are broken down as: using data to set budget priorities in the summer; examining organisational performance, resiliency and capacity in the fall to assess whether the department has the right resources for the fiscal year ahead; examining the overall strategic plan in the winter and checking “we have the right plan to ensure that we’re getting information [to] the right places”; and examining any blockages that emerge from this exercise in the spring.
“All along those four seasons, evidence is a very big portion of it. And I think having the right setup to funnel that information is critical to our success.”
Also on the theme of organisational learning, Allen highlighted that USAID has a registry that monitors how its performance evaluations themselves are used.
Of a sample of 129 performance evaluations, over half were used to further inform project or activity design, and a similar proportion were used to inform management decision-making. The analysis also found that more than 40% were used to provide a better understanding of programme, project or policy. Around one in six (16.3%) were also used as an opportunity to better engage stakeholders.
The Global Government Forum webinar ‘What works: how US federal government agencies can evaluate their performance’ was held on 9 July 2022, with the support of knowledge partner SAS. You can watch the 75-minute webinar, including questions and answers, via our dedicated event page.