Fixing scientific publishing and peer review

I’m not a significant contributor to the academic scientific literature but I am a big user of it. What I see has me worried, and itchy for the obvious solution. Science isn’t broken, but the gatekeeping journals that are expected to give some kind of stamp of quality are.

After the Surgisphere debacle (see my blog post, the big piece in the Guardian that blew it all sky high a few days later, and fallout from retractions from the world’s top two medical journals), I’ve been pondering scientific publishing and the failings of pre-publication peer review. Then a bunch of other examples emerged, including this retraction from Scientific Reports on mental health impact of gaming in Africa (87% of insomnia variance explained by gaming habits!), and this statistical-model-free piece on the impact of wearing masks in Proceedings of the National Academy of Sciences (not retracted yet but I have hopes). And that’s not even a complete list of the bad, published, peer reviewed scientific articles I’ve seen taken up with considerable publicity in the media recently.

One thing is clear - pre-publication peer review says very little about quality. I certainly never want to hear again an article defended on the grounds that “it is in a peer reviewed journal”, or critiqued on the grounds that it isn’t. But review of some sort is essential. It’s just that this isn’t coming from journals. So how to do it better?

Some well known problems

Let’s take a step back. The scientific publishing process has a range of well known problems, the symptoms of which include:

1. Access to published science is restricted - journal subscriptions are expensive items even for university libraries, and basically not available to others. See for example the critique from The Cost of Knowledge. Sometimes the people who most need this access - whether they are policy advisors in government or parents worried about vaccination - are particularly unlikely to have access and resort to the freely available alternatives on the web.
2. Researchers resent petty gatekeeping by anonymous reviewers, as can be seen from browsing the #reviewer2 hashtag on Twitter. Sometimes reviewers’ feedback is delivered in an offensive spirit. Sometimes it requires changes the authors are convinced subtract value (for example, insisting on an outmoded statistical technique because it is familiar to the reviewer; or insisting on referencing the reviewer’s own favourite publications).
3. Academics find peer review difficult and unrewarding work. Some of them resent giving free labour to profit-seeking firms (see symptom 1). Editors complain that it can be hard to get enough of the right reviewers.
4. The gatekeeping isn’t effective in assuring readers of research quality. Something as transparently bad as the Surgisphere fraud passed the editorial filter and peer review processes of the Lancet and the New England Journal of Medicine, but the blogosphere blew it apart in days. Defenders of the system suggest peer review can’t pick up such problems. It is implicitly expected that post-publication peer review is required to pick up the flaws (as happened in this case). But then, it is genuinely hard to understand what the imprimatur of “published in a peer reviewed journal” adds. In fact, the authority granted by passing this (apparently not too high) bar makes post-publication peer review harder.
5. On the other hand, readers from outside of the scientific establishment are now exposed more than previously to “preprint” science. Many people fear that preprints are given too much exposure, given they have not been subjected to peer review. The media and Twitter cycles and the nature of public debate are powerful forces for making this worse, for example by creating amplifiers for misinterpretations or overinterpretations based on single studies.
6. At the same time as all this, we have the replication crisis, the growing realisation that much published science is based on shoddy methods, statistical or otherwise. While great advances have been made in combating this (pre-registration, open code and data, improved statistical methods, improved meta-studies), problems remain with incentives and failures from all the methods above.

Sometimes these problems intersect in surprising ways. For example, John Ioannidis’ 2005 article Why Most Published Research Findings Are False was a seminal publication in the modern understanding of the replication crisis (symptom 6). Yet some of his own preprint publications in the time of COVID-19 have been criticised for adding fuel to symptom 5, the over-exposure of non-peer-reviewed preprints.

A solution is already emerging spontaneously

All this may seem bad, but the joint solution to all these problems is emerging fast. Much of it in fact is already in place. Here is what I see as the elements of the solution, in rough order of how it is already developing:

i. Make the preprint servers or their equivalent the default for publication, but with a more disciplined approach to managing unique document identifiers, versions and retractions.
ii. Strengthen existing post-publication peer review such as that on PubPeer, the question-based, problem-seeking type of review.
iii. Complement this problem-seeking review with a new positive form of post-publication peer review - badges or certificates indicating a stamp of approval from an appropriately certified reviewer. Individual badges would cover off a single dimension of quality. Articles that accrue gold-standard badges on all dimensions would be treated as gold-standard research.
iv. Gradually abandon "pre-publication" peer review. It adds little value, it slows the communication of insights down, and it gives a false sense of security to the reader. Anyway, there really is no such thing as 'pre-publication' any more, with researchers announcing findings by press release, blogs and preprints.
v. Let the journals wither away, or re-invent themselves as open-access journals that are essentially collections of articles in the manner of "Peeriodicals" promoted by PubPeer. They can have pre-publication peer review if they want, focused on 'is this important and new enough and does it cite the literature properly', but this will no longer be seen as a badge of quality.

As mentioned, this solution is already emerging spontaneously, with the basics of items i. and ii. in place and growing in strength. The key features have been:

the rapid rise of the preprint servers
improved post-publication peer review through sites like PubPeer
growing calls for open peer review and even adoption of some of its elements in some quarters
growing expectations of open data and (at least) open code, with acting on those expectations made easy by sites like GitHub.

Peer-reviewed Open Access journals are also an important part of the movement towards a solution. And actions like MIT’s recent cessation of negotiations with Elsevier certainly are a big step forward and part of this whole movement. However, I’m arguing that the whole concept of a journal with pre-publication peer review is outmoded or at least of marginal importance. I think open access journals can be seen as very much an interim step rather than likely to be part of an enduring solution. Of course, existing open access journals, and maybe even closed journals, are likely to evolve or re-invent themselves into something useful along the lines of part v. of our solution.

The sort of post-publication peer review that evolves spontaneously is likely to be of the critical, problem-seeking, question-asking type - part ii. of our solution. That sort of question-asking review is essential but not enough for a reader to know quickly and clearly the extent to which a given article is of sufficient quality to be relied upon and in which aspects. Time-poor and expertise-limited readers will continue to need efficient shortcut indicators of research quality.

Because of this, the only new part of my proposed solution is item iii., badges or certificates. I believe these are a necessary complement to the problem-seeking review. We can think of the problem-seeking review as a monitoring system, alert for danger signs, whereas the positive vetting provided by a badge or certificate is the result of a pro-active test of quality.

It’s unreasonable to expect researchers, let alone the public, to be sufficiently expert in all pertinent dimensions of quality of every study and to carefully read the questions and answers on highly specific issues on PubPeer. A diverse set of badges will replace the (false) sense of security currently given by publication in a prestigious peer-reviewed journal.

This is an opportunity to do the job better. Badges and certificates would allow each specific dimension of quality to be reviewed by experts with relevant expertise. This means that peer review can be done by multiple experts focusing only on the areas in which they have relevant and sufficient expertise. The more badges or certificates a study has achieved, the more likely the research is to be robust. The badge for any one dimension could be of “none yet”, bronze, silver or gold status. This also allows research to be recognised for its strengths, even though it may have weaknesses or areas where caution may be needed. Which probably describes the vast majority of research.

The dimensions of quality and how they are bundled would need to be jointly developed by the scientific community and perhaps adjusted for different types of research. But I expect they would cover the following:

a. publication describes and takes into account previous literature in the field
b. publication describes and takes into account relevant literature in other fields (likely requiring different expertise from the above)
c. study rationale is well reasoned and compelling, and makes a new contribution to the field, either with new findings or replicating or failing to replicate previous findings
d. methods are fully described
e. statistical and/or qualitative methods, and their execution, are sound
f. any software (i.e. code) developed for the research works as described
g. data provenance is verified
h. results can be reproduced with the original data (data transparency necessary for both quantitative and qualitative research)
i. writing is clear and concise
j. data visualisations or tabulations are relevant, sufficient and follow agreed good practice
k. conclusions follow from the analysis
l. results have been replicated with new data in subsequent studies (this would accumulate over time).

And so on. As soon as we set out a list like this it is obvious that different experts are needed for most of these dimensions, and how impossible it is for existing pre-publication review to check off all of these in a timely fashion with only two or three reviewers. In fact, it seems that existing pre-publication peer review is limited to only a few of these dimensions in any one instance, with the issues covered depending to a significant degree on the luck of the draw of the expertise and interest of the particular reviewer.
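
To make the badge idea a little more concrete, here is a minimal Python sketch of how badges on separate dimensions might be recorded for a single article. Everything in it is an assumption for illustration only: the dimension names are shorthand for a few of the items listed above, the `Level` and `Article` classes are hypothetical, and the rule that a later, lower badge never downgrades an earlier one is a simplification rather than a settled design.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Badge status on one dimension, ordered from weakest to strongest."""
    NONE_YET = 0
    BRONZE = 1
    SILVER = 2
    GOLD = 3


# Hypothetical shorthand for a few of the quality dimensions; a real
# scheme would be agreed by the scientific community.
DIMENSIONS = (
    "prior_literature",
    "methods_described",
    "statistical_methods",
    "software_works",
    "reproducible",
    "conclusions_follow",
)


@dataclass
class Article:
    identifier: str
    badges: dict = field(default_factory=dict)  # dimension name -> Level

    def award(self, dimension: str, level: Level) -> None:
        """Record a badge, keeping the highest level seen (an assumption)."""
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        current = self.badges.get(dimension, Level.NONE_YET)
        self.badges[dimension] = max(current, level)

    def level(self, dimension: str) -> Level:
        """Return the badge level on one dimension, 'none yet' by default."""
        return self.badges.get(dimension, Level.NONE_YET)

    def is_gold_standard(self) -> bool:
        """True only when every dimension has reached gold."""
        return all(self.level(d) == Level.GOLD for d in DIMENSIONS)

    def summary(self) -> dict:
        """Per-dimension view, so strengths and gaps are both visible."""
        return {d: self.level(d).name for d in DIMENSIONS}


# Usage: one reviewer vouches for the software and nothing else.
paper = Article("example-preprint-v2")
paper.award("software_works", Level.GOLD)
print(paper.summary())
print(paper.is_gold_standard())
```

The point of the per-dimension summary is that a reader sees at a glance which aspects have been vetted and which have not, rather than a single pass/fail stamp.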

As a positive example of how even the “badge” approach to post-publication positive peer review is emerging, consider the big microsimulation study of the impact of non-pharmaceutical measures on pandemic spread that informed the UK Government’s response to COVID-19. Many people were horrified when lead author Neil Ferguson tweeted that the code involved “thousands of lines of undocumented C” from 13 years previously. Microsoft and others provided assistance to review and refactor the code. Eventually it was shown to be up to scratch, as seen in this thread by John Carmack:

“I can’t vouch for the actual algorithms, but the software engineering seems fine.”

I treat that as a definite badge of quality on one dimension (“f.”); that’s one badge down. The debate can continue (or perhaps it is resolved, I don’t know and it doesn’t matter) on the assumptions, theory and place in the literature of the study, but at least we know the software works. No single person could possibly review this sort of work against all the dimensions of quality we’ve identified. It simply makes sense to have different experts assess different aspects.

Resourcing

There’s clearly an additional problem which is the incentives for scientists to engage in this sort of peer review. This is one of the problems I initially identified, and I don’t claim to have a clear solution for it. I do argue that what I’m proposing doesn’t make it any worse. However, for the post-publication peer review system (or any peer review system, including the current one) to work and retain the confidence of the scientific community and the public, something needs to be done to resource it.

Two aspects to consider in looking for a solution to resourcing include:

can the career progression of scientists be modified to take into account their contributions to reviewing others’ research?
can institutions employing scientists be required to set aside resource (in the form of dedicated time) for reviewing - perhaps with the savings from their subscriptions to commercial journals?

A related problem is who should be allowed to assign badges. I can imagine various solutions for this but it’s not clear what would work best. So I’ll leave that issue and resourcing for others who know the details of the scientific institutions better than I do.

Conclusion

I’ve set out an approach to fixing some very visible problems in scientific research. Like Jon Brew and Carlos Chaccour, the authors of Let’s learn from this debacle. Academic publishing is broken, I think the solution involves removing the gatekeeping of old-fashioned journals and emphasising post-publication peer review. Trends in this direction are happening spontaneously. I add to Brew and Chaccour’s idea the notion of badges or certificates for specific aspects of scientific quality.

This post was written with editing and ideas assistance from Kay Fisher.
