Beyond the numbers: Growing value in NZ science
New Zealand's research reforms promise growth, but will its metrics hit the target or miss the point?
Second post in one week! I won’t make a habit of it, but wanted to share this one while the news is still hot off the press.
Aotearoa New Zealand’s research and innovation landscape is undergoing a significant transformation. The government has consolidated its seven Crown Research Institutes into new Public Research Organisations (PROs) focused on bioeconomy science, earth science, public health and forensic science, and advanced technology. The aim is to build a more commercially driven and integrated science system that directly supports national economic growth.1 These reforms are financed in part through significant cuts to fundamental, investigator-led science, with many programmes facing tighter budgets and some existing research initiatives being wound down.
Alongside these institutional changes, the way research quality is assessed and funded is also shifting: the peer-reviewed Performance-Based Research Fund (PBRF) system, which values holistic, expert judgement of research portfolios, is on the way out, to be replaced by a proposed Tertiary Research Excellence Fund (TREF) that will rely more heavily on citation counts and other quantitative indicators.2
These simultaneous reforms reflect an ambition for research to have clearer, more immediate, measurable outputs and economic impacts. They also raise important questions about what gets valued, recognised, and supported in NZ’s science and innovation future.
The good and the not-so-good
The move to PROs is being financed through significant cuts to non-commercial, investigator-led science, notably the baseline budgets of the Marsden Fund and the Health Research Council. These cuts directly threaten pipelines of fundamental research, especially in fields with less immediate commercial or economic impact, such as the social sciences, humanities, and basic discovery science. Researchers and sector leaders have strongly criticised these reductions, warning that focusing resources almost exclusively on economic growth risks eroding NZ’s international reputation for research excellence and undermining long-term capacity for groundbreaking, foundational science.
That’s not to say everything should stay the same. Decades of favouring compliance, efficiency, and short-term performance have fostered a research funding system dominated by bureaucracy, risk aversion, and a proliferation of small, short-lived grants instead of sustained investment in bold, innovative research. This approach has constrained the ability of researchers to pursue high-value, long-term projects. Reversing this culture is vital, in my view, for enabling the kind of transformative research the country needs.
The reforms are a real opportunity to harness science and research as engines of economic growth, as highlighted in this NZ Institute for Economic Research (NZIER) report. While I don’t think economic growth should be the only target for our national science effort, that debate doesn’t diminish the significance of the economic opportunity.3 I just hope that non-commercial impact areas like health, sustainability, Māori and Pacific aspirations also remain priorities. I note that the “science pillar” policy aims to prioritise user-facing research, not just commercialisation but also rapid translation, entrepreneurship, workforce upskilling, and partnership with iwi.
But my main focus in this post is the system for assessing and funding research.
Out with the old, in with the new
The government has officially confirmed the phase-out of the PBRF in favour of TREF by 2028, moving from holistic, judgement-based peer review to an automated, indicator-driven funding allocation system.4 TREF will be built around three primary components: field-weighted citation metrics, replacing the PBRF Quality Evaluation, together with external research income and research degree completions, both of which were also part of the PBRF model. There will also be some secondary metrics to capture research impact (I can’t wait to see how it will do this accurately and meaningfully), non-citation research, and research by wānanga, Institutes of Technology and Polytechnics, and Private Training Establishments. The final design and secondary metrics will be determined after sector consultation in 2026 - so, while the reforms are locked in, there’s still time to advocate for good design and implementation.
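Since the allocation formula is still to be designed, it’s worth being concrete about what an indicator-driven system looks like mechanically. Below is a minimal sketch in Python - the weights, normalisation approach, and numbers are entirely hypothetical assumptions of mine, not anything drawn from the Cabinet Paper.

```python
# Toy model of an indicator-driven allocation built on TREF's three primary
# components. Weights, normalisation approach, and figures are hypothetical.

HYPOTHETICAL_WEIGHTS = {
    "fwci": 0.4,                # field-weighted citation impact
    "external_income": 0.4,     # external research income ($m)
    "degree_completions": 0.2,  # research degree completions
}

def allocate(institutions: dict, pool: float) -> dict:
    """Split a funding pool in proportion to a weighted composite of
    each institution's share of every indicator."""
    scores = {}
    for name, indicators in institutions.items():
        score = 0.0
        for key, weight in HYPOTHETICAL_WEIGHTS.items():
            sector_total = sum(inst[key] for inst in institutions.values())
            share = indicators[key] / sector_total if sector_total else 0.0
            score += weight * share
        scores[name] = score
    total = sum(scores.values())
    return {name: round(pool * s / total) for name, s in scores.items()}

example = {
    "Uni A": {"fwci": 1.4, "external_income": 120.0, "degree_completions": 300},
    "Uni B": {"fwci": 0.9, "external_income": 80.0, "degree_completions": 450},
}
print(allocate(example, pool=315_000_000))
```

The mechanics are trivially simple, and that is precisely the worry: every design choice above - which indicators, which weights, which normalisation - is a value judgement, just one expressed in code rather than in open deliberation.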
To illustrate what this means in practice:
A university researcher used to preparing portfolios of their work for peer review - showcasing a discipline-appropriate mix of publications, creative outputs, teaching, and engagement - will instead be rated according to a few numbers on a dashboard, such as how often their articles are cited or the volume of externally funded projects they attract. For those working on long-term, fundamental questions - in pure mathematics, indigenous languages, or social history, say - whose impact isn’t easily captured by these metrics, it could become harder to secure support.
At the institutional level, geologists who once worked within a standalone Crown Research Institute might now be part of the new, larger Earth Science PRO, collaborating with a wider range of fields but oriented more toward commercial priorities and national economic gains. PROs will compete for funding on the basis of citation metrics. Investigator-led funding will face reduced budgets and more competitive grant rounds, making fields that are less aligned with government priorities especially vulnerable.
The government’s rationale for shifting from PBRF to TREF centres on reducing compliance costs (and indeed this move will reduce the burdens placed on researchers and institutions by complex, portfolio-based peer review processes) and on achieving greater system responsiveness: quicker adaptation to changing research priorities and more automated funding decisions. However, balancing the efficiency of metrics against the depth and probative value of qualitative assessment (holistic peer review, discipline appropriateness, broader unmeasured impacts) remains a live, unresolved issue within NZ’s research community. Reforms to governance, quality assurance, and student mobility are also expected over the coming year.
I believe that moving NZ’s research funding away from the current PBRF Quality Evaluation, which relies largely on peer review and submitted portfolios, to the TREF’s field-weighted citation metrics is a backward step - especially at a time when conversations globally are moving beyond quantitative metrics to reflect broader values.
Why metrics alone miss the mark
If the goal is to nurture world-class research with relevance, depth, and real-world impact, metrics alone will not suffice. Let me offer just three reasons:
Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.” When citation counts become the yardstick, behaviours warp to meet the measure. Researchers may chase what’s countable (articles, citations), not what counts (discovery, depth, social change). Researchers and whole fields risk marginalisation simply because their outputs aren’t well represented in citation metrics (or even, perhaps, because others are more adept at gaming the system).
Not everything that counts can be counted: The richness of research – especially in Aotearoa, where mātauranga Māori, community development, and meaningful impact often emerge outside peer-reviewed journals – is inherently resistant to reductive metrics. It’s like valuing a training course by the number of people who attend: the real value - the impacts that people find important and worthwhile - gets missed. Metrics like field-weighted citation impact (FWCI) have at least some face validity, but collapse when wielded as blunt instruments on individual researchers or teams, undervaluing nuanced, boundary-breaking, local, and equity-building research.5 The sketch after this list shows how the calculation works - and where it breaks.
Objectivity is a pipe dream: Performance indicators aren’t objective or neutral. They’re value-laden - shaped by the values and beliefs that inform the design, content, and implementation of the measurement system. Indicators reflect choices, contestable priorities, and cultural values. What gets measured, how it is measured, analysed, and reported, and what gets left out - these are political choices. The pipe dream of objectivity is alluring but doomed; the real choice is between making values explicit or sweeping them under the rug.
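To ground the FWCI point in the second reason above: the standard definition (popularised by Scopus/SciVal) divides a paper’s citations by the average citations of comparable papers - same field, publication year, and document type. A minimal sketch, with made-up baseline numbers:

```python
# How a field-weighted citation metric works, under the standard definition:
# citations received, divided by the world-average citations for papers of
# the same field, year, and document type. All numbers are invented.

from statistics import mean

def fwci(citations: int, field_year_baseline: list[int]) -> float:
    """FWCI > 1.0 means 'cited more than the average comparable paper'."""
    expected = mean(field_year_baseline)
    return citations / expected if expected else float("nan")

# Identical raw citations, very different FWCI, because citation cultures
# differ so sharply between fields.
history_baseline = [0, 1, 0, 2, 1]      # hypothetical sparse-citation field
biomed_baseline = [30, 45, 25, 60, 40]  # hypothetical dense-citation field

print(fwci(3, history_baseline))  # 3 / 0.8  = 3.75 - looks world-class
print(fwci(3, biomed_baseline))   # 3 / 40.0 = 0.075 - looks poor
```

Field weighting does correct for citation cultures - that is its face validity - but notice what is missing: an output that isn’t indexed has no FWCI at all, and for an individual or small team the calculation rests on a handful of papers, where noise and outliers dominate the average.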
This is not to say “throw out all measurement”. It’s a call not to rely on measurement alone. Peer review, for all its faults, at least recognises the role of human expertise, experience, deliberation, and judgement. It encourages a mixed-evidence approach – portfolios, impact, engagement, and rigorous debate. The proposed shift to field-weighted citation metrics may sweep values under the rug, but the values are still there, whether you like it or not. Throwing sunlight on values promotes better-formed and better-informed decisions.
To support good resource allocation decisions, we need a blend of metrics, qualitative information, human judgement, and transparent processes to bring them all together. In other words, we need the science of evaluation. We also need the not-at-all dismal science of economics, to ensure funding systems are designed to incentivise valuable research rather than indicator-chasing.
Towards richer, more reflexive assessment
What I see missing in the current reform conversation is a commitment to evaluative reasoning - a transparent system for judging quality, relevance and impact. Metrics provide useful signals and qualitative insights add context, but sound assessment requires explicit criteria (what matters and why), standards for performance, and a consistent process for applying them to evidence.
Crucially, this requires human deliberation - bringing expertise and context to bear openly, not abdicating responsibility by outsourcing decisions to algorithms or simplistic rankings. Transparent processes with structured frameworks and mixed methods allow for robust debate, wise resource allocation, and trust in the system. Blending metrics, qualitative information, and clear decision-making frameworks is the foundation of good evaluation, ensuring investments in science create real value for people. Maybe some researchers and institutions found the PBRF burdensome - but good evaluation systems don’t need to be. That’s about embedding the right skill sets and mindsets, not simply installing set-and-forget metrics.
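To make “explicit criteria, standards, and a consistent process” less abstract, here is a toy sketch of evaluative judgement as a record rather than a formula. The criteria, performance levels, and evidence are hypothetical illustrations of mine, not any official rubric - the point is the shape: metrics and qualitative evidence sit side by side, and the reasoning connecting them to a standard is written down and open to scrutiny.

```python
# Evaluative reasoning as a structured record: explicit criteria, an explicit
# standard, mixed evidence, and a recorded rationale. All content is invented.

from dataclasses import dataclass

@dataclass
class Judgement:
    criterion: str        # what matters, and why it is on the list
    standard: str         # the performance level the evidence supports
    evidence: list[str]   # metrics AND qualitative material, side by side
    rationale: str        # the human reasoning linking evidence to standard

portfolio_assessment = [
    Judgement(
        criterion="Scientific rigour",
        standard="Excellent",
        evidence=["FWCI 1.3 (interpret cautiously at this scale)",
                  "panel reading of three nominated outputs"],
        rationale="Methods judged sound and well documented by the panel; "
                  "the citation signal is consistent but not decisive.",
    ),
    Judgement(
        criterion="Relevance and impact",
        standard="Good",
        evidence=["documented uptake by two iwi-led environmental programmes",
                  "no citation trace - the impact occurred outside journals"],
        rationale="Evidenced community use outweighs the absence of "
                  "bibliometric signals for this criterion.",
    ),
]

for j in portfolio_assessment:
    print(f"{j.criterion}: {j.standard} - {j.rationale}")
```

The bookkeeping here is easy to automate; the rationale field is not - and that is exactly the part that makes a resource allocation decision defensible.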
Our social fabric comprises multiple values, and bringing that diversity into decision-making processes is a must. For example, good decisions must recognise and support Te Tiriti commitments, mātauranga Māori, and Pacific research aspirations - not simply as quotas or statistics, but as integral to what counts as quality and impact, as a core part of NZ’s global reputation, national identity and capability.
And while we’re at it, why not evaluate the research system reforms themselves, and learn what it would take to make them actually “drive cutting-edge research and innovation that creates real value” and “strengthen collaboration between universities, government, industry and global experts” (as the Minister put it)?
Bottom line
Prioritising what can be measured and counted over disciplined consideration of what matters risks marginalising vital forms of knowledge that make Aotearoa’s research system distinctive. Robust evaluation systems can and should blend relevant metrics with qualitative evidence, peer review, and deliberation.
Evaluators have known this for over 60 years. Why are we still having this conversation?
Thanks for reading!
I’d like to thank Daniel Milosavljevic for helpful peer review. Errors and omissions are all mine. These views are my own, reflecting where my thinking sits right now, and not the positions of any organisations or colleagues I work with. Consider this post a snapshot of my current reasoning, not a fixed position: if presented with new arguments or compelling data, I reserve the right to revise my views. That’s simply how science and policy analysis work.
Addendum
Many thanks to Fred Carden for reminding me that RQ+ (Research Quality Plus) deserves a mention here. A holistic framework developed by the International Development Research Centre (IDRC) to assess the quality of scientific research, RQ+ is increasingly referenced in evaluation of international development research. Its flexible, context-sensitive approach has potential relevance to NZ's ongoing sector reforms and debates about improving research quality and impact. RQ+ evaluates research quality through three main dimensions: consideration of contextual factors influencing research; assessment across multiple quality dimensions (including scientific rigour, legitimacy, and utility); and use of evidence-based rubrics to rate and compare research outputs.

1. The Science System Advisory Group (SSAG) report (August 2024), a foundational analysis guiding NZ’s science and innovation reforms, is available here.
2. For example, the University Advisory Group (UAG) Final Report (April 2025) argued that performance-based research funding for universities should become a metrics-only system, and that these metrics should explicitly include institutional-level field-weighted citation impact (FWCI) alongside research degree completions and research income. The report recommended eliminating individual portfolio (peer review) assessment in the PBRF and moving to an allocation formula based only on readily available quantitative indicators. It acknowledges the limitations of citation metrics at the individual level, but argues that at the institutional level, with proper field weighting and broad databases, these concerns should largely wash out, and that any remaining shortcomings are outweighed by the efficiency and “objectivity” purportedly gained.
3. NZIER’s report (September 3, 2025) noted that investment in science and research tends to pay off for NZ, with historic annual returns cited at 17% for agricultural research and development (R&D), and comparable gains elsewhere in the science system. These benefits are both direct and indirect - boosting innovation, skills, graduate incomes, and wider productivity - often lasting for decades and multiplying through commercialisation and collaboration with industry. Economic modelling and international comparisons outlined in the report suggest that research spending makes an outsized contribution to GDP.
4. Here’s the Cabinet Paper.
5. Field-weighted citation impact (FWCI) should not be mistaken for a measure of substantive research quality or broader societal value. It does not account for factors such as novel or interdisciplinary work, local relevance, or impact outside academia, and it is unavailable for alternative forms of scholarly communication that aren’t indexed in major citation databases. It is heavily influenced by field-specific citation practices, which can disadvantage disciplines like the humanities and creative arts, where research output is less likely to be visible in citation databases, and it can be skewed by time lags, outliers, and citation clusters. Treated as an objective indicator of value, FWCI could therefore distort decision-making: privileging easily countable outputs while marginalising innovative, cross-disciplinary, or locally impactful research whose contributions are rich but less visible within international citation networks.
6. Good luck with that!!