New guidance issued to help UK government departments evaluate AI’s impact

As government departments increasingly adopt artificial intelligence systems, it’s essential that they can understand the impact of these tools.
With this in mind, the UK’s Evaluation Task Force has published new guidance on assessing the impact of AI tools with regard to outcomes, processes and value for money. It aims to “enhance the safety and confidence with which government departments and agencies can adopt AI technologies, ensuring that public sector innovation keeps pace with the private sector”.
The task force, which is a joint Cabinet Office and Treasury unit, provides support to help ensure that spending decisions are backed by evidence. The guidance has been published as a new annex to the Magenta Book – a central government guide on evaluation in policymaking – and covers best practices for AI evaluation design, methodology and timing.
Illustrative examples include evaluating whether an AI system used to assess grant applications has increased the accuracy of rejections, or whether a large language model (LLM) application has improved civil servants’ efficiency in producing briefing notes for ministers – and whether quality has been affected.
AI requires ‘more substantial evaluation’
The guidance does not address how to evaluate the quality, safety or accuracy of AI tools but focuses on their impact on decisions and outcomes. It was co-produced with the Department for Transport and Frontier Economics, and the Evaluation Task Force said it reflects “an understanding of the unique challenges posed by AI and the need for tailored approaches to address these challenges”.
Because of these unique challenges, the guidance says that AI systems are likely to “require more substantial evaluation” than other types of intervention.
In particular, it advocates the use of randomised controlled trials (RCTs) when testing a new AI product “to produce high quality evidence on the intended and unintended impacts of introducing these new technologies”. The 54-page annex also includes a series of hypothetical case studies to illustrate approaches to evaluating the impact of different types of AI tools, such as a citizen-facing chatbot, an LLM for civil servants, and an AI-enabled patient support programme. It urges departments to consider impact evaluation for AI tools as early as possible.
UK’s AI push
The guidance will now be co-owned with the Government Digital Service, which was recently established as the UK’s ‘digital centre of government’ within the Department for Science, Innovation and Technology.
The publication comes as the UK government ramps up its AI push. This month, it adopted an AI Opportunities Action Plan to boost growth and deliver services more efficiently. This was followed by the launch of a new ‘blueprint for modern digital government’, including a package of AI-powered tools for civil servants – dubbed Humphrey, after the fictional Whitehall official from the BBC sitcom Yes, Minister.
Among the tools are Consult, for analysing consultation responses; Parlex, which helps policymakers search and analyse decades of debate from the Houses of Parliament; Minute, a secure transcription service; Lex, designed to assist officials with legal research; and Redbox, a generative AI tool that aids civil servants in tasks such as summarising policy and preparing briefings.
Peter Kyle, secretary of state for science, innovation, and technology, said that the government is focused on “overhauling how the public sector uses technology” in order to “slash the time people waste dealing with annoying processes so they can focus on what matters to them”.