Measuring up: how robust are productivity expectations when it comes to AI in the public sector?
Artificial intelligence has the potential to improve productivity across the economy. But how do you measure it – particularly in the public sector where ‘public value’ is both crucial and difficult to define? Here, Imogen Parker of the Ada Lovelace Institute sets out the research into AI’s impact on productivity, and how to make more accurate predictions.
One of the major hopes for AI is that it will improve economic productivity. For the UK – which has faced stubbornly low productivity figures for at least the last two decades – this is a high priority.
In this year’s Mais Lecture, the UK’s chancellor Rachel Reeves described AI as the defining technology of our era, a “general-purpose technology with the potential to transform productivity across the entire economy”.
That optimism is shaping major spending decisions when it comes to the state and public services: the Spending Review 2025 committed to “a step change in investment in digital and AI across public services”, informed by government estimates of potential savings and productivity benefits which run as high as £45bn (US$61bn) per year.
Public service reform has become a political priority – services like healthcare and education consistently rank amongst voters’ top concerns. The public sector also matters to the national economy, as it accounts for around 20% of GDP.
High expectations of productivity savings are driving adoption across the breadth of the public sector. But existing figures and estimates have come under scrutiny, in terms of the underlying methods and assumptions.
Our on-the-ground research at the Ada Lovelace Institute has found examples providing some cause for optimism on AI and productivity. Although small-scale, our research with social workers indicated that transcription tools could support efficiencies, which might then lead to productivity gains. Importantly, this wasn’t just about speed of producing a transcript, it was about potential improvement in the quality of care and service. However, we’ve also found instances where adoption introduces unanticipated costs or risks, highlighting that adoption is “lumpy” and not “plug-and-play”.
Our extensive research into AI adoption in the public sector highlights the challenges of successfully and quickly integrating AI into existing cultures, processes and legacy technologies, shaped by professional and public trust. And our study on transcription also highlighted how nascent and diverse professional practice was when it came to using AI. For example, we found significant differences in the time spent checking transcripts, and how any time savings might accrue back to the profession.
Given how expectations of AI’s positive impact on productivity are driving spending decisions, Ada Lovelace Institute researchers Sumedha Deshmukh and Matt Davies undertook a deeper dive into what we know and don’t know about potential productivity savings from AI in the public sector for our new report.
But first, what does productivity really mean?
In day-to-day conversations, improving productivity seems both obvious and desirable. It’s intuitive for politicians and the public to care if there is an unproductive health system, for example.
When used by economists, the term has a more precise meaning. It is a shorthand for how efficiently the economy uses inputs such as labour costs and other resources to produce outputs like goods and services.
In the private sector, it’s more straightforward to add up all the costs on one side and weigh them against the revenues. Productivity is the ratio between the two numbers: a company that spends £100m producing umbrellas and sells them for £200m is less productive than one that achieves the same revenue for just £50m.
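As a rough illustration, the umbrella example can be worked through in a few lines of Python. This is a minimal sketch of the simple revenue-to-cost ratio described above, not an official productivity measure; the function name `productivity_ratio` and the company labels are assumptions made for illustration.

```python
def productivity_ratio(output_value: float, input_cost: float) -> float:
    """Return output per unit of input (a simple revenue-to-cost ratio)."""
    if input_cost <= 0:
        raise ValueError("input_cost must be positive")
    return output_value / input_cost


# Figures in £m, taken from the umbrella example in the text.
# The labels "company_a" and "company_b" are illustrative only.
company_a = productivity_ratio(output_value=200, input_cost=100)
company_b = productivity_ratio(output_value=200, input_cost=50)

print(f"Company A: £{company_a:.2f} of revenue per £1 spent")
print(f"Company B: £{company_b:.2f} of revenue per £1 spent")
```

On these figures the second company earns twice as much revenue per pound spent. The ratio is easy to compute here only because both numbers are observable in market prices.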
When it comes to the public sector, that calculation gets trickier. The input costs exist in terms of salaries, buildings, technology and so on, but the output goods and services are often provided free or at a subsidised price.
You could say the UK spends £30bn on education and therefore society receives £30bn worth of outputs, but as Ed Humpherson, director general of the UK’s Office for Statistics Regulation, points out, that doesn’t really tell you anything about the quality, and therefore public value, of our education system.
The Office for National Statistics has taken a more sophisticated approach, estimating public sector productivity by looking at outputs related to the quantity of something delivered, such as knee operations per year, as well as the quality, in this example patient outcomes. It makes exploring productivity more complex, but that complexity is essential: what society values about education isn’t just the number of hours taught, it’s what that teaching achieves that matters.
Thinking about real public sector use cases demonstrates how AI doesn’t have a simple relationship with public sector productivity. To return to our AI transcription research, social workers felt that the speed and quality of their professional practice could be improved by AI. If they spent less time on paperwork, they could have more time to invest in their relationships and practice.
However, some managers implied that future time savings might be “spent” by upping caseloads instead. There still might be a productivity gain, but if it led to greater stress among social workers, rising numbers leaving the profession or less time spent ensuring the quality and safety of their practice, it could result in diminished productivity. So whether AI boosts productivity will be affected by professional decisions about how it is used, alongside potential new costs, processes, or even professional roles that might be required to make adoption successful.
How are studies approaching the question of productivity in the public sector?
As might be expected given the novelty of AI, evidence of at-scale implementations of AI in the public sector is thin. The current evidence base is mixed, with some gains, some losses and a wide range of estimates – reflecting differences in assumptions, methodologies and the specific type of AI being looked at.
Digging deeper into these studies, our research highlights some important findings.
- There’s a gulf between the measurement of time-savings or cost reductions and clarity about how these translate into the outcomes that matter for public value, such as better services, greater equity, improved citizen experiences, enhanced institutional capacity, or improved worker wellbeing.
- Productivity figures rarely consider costs and trade-offs. Lifetime costs of technology are rarely factored in, and there is considerable uncertainty about what future costs may be.
- Methodological weaknesses are being systematised. For example, some productivity research relies on people estimating how much time they have saved by using AI (without even offering an option for tasks taking longer), despite the well-established evidence that people are very poor at self-reporting their time use. Other research uses task databases, which break jobs into groups of specific tasks, and relies on AI to assess what roles can be delivered by AI, often without input from workers themselves.
- The presentation of the figures doesn’t always accurately reflect the nuance. The broad range of estimates is rarely reported and positive figures get the headlines over neutral or contradictory studies.
- The AI industry shapes much of what is researched and how these results are framed. Too often these companies are effectively mark(et)ing their own homework.
- Studies rarely translate theoretical findings about the potential benefits of AI into the evaluation of AI in real-world settings. Time-bound studies like pilots are rarely followed up, so there is a lack of data on longer-term impacts.
How can we improve predictions about AI’s impact on productivity?
Given the importance of these types of studies in making decisions about AI investment across core services, it’s welcome that the government is investing in new programmes of work, ranging from experimenting with new methodologies to establishing a new AI Economics Institute, modelled after the AI Security Institute.
Our research highlights four ways to improve the generation and communication of evidence:
- Reflect uncertainty. Even the highest-quality research on AI’s impacts is likely to carry a high degree of uncertainty, given how rapidly the technology is evolving. It’s hard to predict with precision either of the numbers in the productivity equation, inputs or outputs. So AI productivity research should acknowledge this uncertainty upfront. That means reporting ranges and variation, rather than headline figures, and highlighting studies which offer contrasting estimates.
- Measure what matters, not just what is easy to quantify. That means tracking service quality, error rates, user satisfaction, equity of access and institutional capacity. It also means thinking about distributional impacts across society, and running studies over longer time horizons.
- Embrace methodological pluralism. Self-reported time estimates should be checked against observed productivity, and task automation projections should be validated in real settings and reviewed by workers.
- Acknowledge context-specificity. AI’s impacts and effects are highly context dependent. The same technology could deliver time savings in one organisation but create additional burdens in another, depending on workforce, data infrastructure, organisational culture, staff capabilities and implementation approaches.
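To make the first recommendation concrete, here is a minimal sketch of reporting a range rather than a single headline figure. The numbers are entirely hypothetical and are not drawn from any study; the function name `summarise_estimates` is likewise an assumption made for illustration.

```python
from statistics import median


def summarise_estimates(estimates: list[float]) -> dict[str, float]:
    """Summarise a set of estimates as low, median and high values."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    ordered = sorted(estimates)
    return {"low": ordered[0], "median": median(ordered), "high": ordered[-1]}


# HYPOTHETICAL figures: percentage time saved per task in five imagined
# studies. A negative value means the task took longer with AI.
hypothetical_time_savings_pct = [-5.0, 0.0, 8.0, 12.0, 30.0]

summary = summarise_estimates(hypothetical_time_savings_pct)
print(
    f"Estimated time saving: {summary['low']}% to {summary['high']}% "
    f"(median {summary['median']}%)"
)
```

Quoting only the highest figure would hide the fact that, in this made-up set, one study found a loss and another found no change at all.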
Our research is not the definitive guide to AI productivity studies. It is not intended to single out specific studies or approaches for critique, nor endorse others. But we hope it can provide a starting point for asking better questions about the numbers being presented, and what they mean for important decisions.
The current time is characterised by enormous pressure to demonstrate AI’s value, significant financial incentives for producing optimistic projections and limited accountability for whether promises ultimately come true.
Amidst this push, it is important to not lose sight of the public value that purported productivity gains are supposed to contribute to – and, by extension, whether current approaches to measurement do justice to this mission.