Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on October 5th, 2017 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on November 6th, 2017.
  • The first revision was submitted on November 15th, 2017 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on November 15th, 2017.

Version 0.2 (accepted)

· Nov 15, 2017 · Academic Editor

Accept

I believe that the reviewers' comments have been well addressed.

Version 0.1 (original submission)

· Nov 6, 2017 · Academic Editor

Minor Revisions

The reviewers have raised some minor issues that I suggest the authors address.

· Reviewer 1

Basic reporting

See general comments.

Experimental design

See general comments.

Validity of the findings

See general comments.

Additional comments

This article provides an informative discussion of the ReScience journal and its motivations. Created in 2015, ReScience publishes replication attempts of previous computational analyses. The initiative is noteworthy as a project that aims to improve the replicability of science and produce open source implementations of past closed science. In addition, the initiative was innovative in adopting GitHub as the submission, review, and publication platform. This design leads to unprecedented transparency in the review process and presumably helps encourage productive and actionable exchanges between reviewers and authors.

I do take issue with some aspects of ReScience's design. However, given that this article is a retrospective examination, modifying the past design is not feasible. Therefore, some of my comments should be considered advice for the future or discussion points to consider, rather than obligatory changes.

The following two statements are worrisome in that they seem to take a defeatist attitude towards computational reproducibility:

> In particular, if both authors and reviewers have essential libraries of their community installed on their computers, they may not notice that these libraries are actually dependencies of the submitted code. While solutions to this problem evidently exist (ReScience could, for example, request that authors make their software work on a standard computational environment supplied in the form of a virtual machine), they represent an additional effort to authors and therefore discourage them from submitting replication work to ReScience.

> the code newly produced for ReScience will likely cease to be functional at some point in the future. Therefore, the long-term value of a ReScience publication is not the actual code but the accompanying article. The combination of the original article and the replication article provide a complete and consistent description of the original work, as evidenced by the fact that replication was possible.

It is highly problematic if ReScience replications are not reproducible (i.e. they cannot be rerun by other researchers in the future). First, enhancing a replication will become difficult. Software and data analyses are never fully complete. Contributions to a replication may continue indefinitely. Furthermore, the replications will often contain useful open source implementations that are adaptable to new problems. Reproducibility will make repurposing the code more feasible. Finally, the value of the initiative is diminished if independent parties cannot verify a replication. Therefore I think it's a bit prescriptive and shortsighted to consider software reproducibility as a secondary concern.

While there is not a one-size-fits-all solution to reproducible computational analyses, there are many approaches that help remedy the situation. For example, the shelf life for software that makes no attempt to control its environment is often only a few months. However, even just specifying the versions of the explicit dependencies, say as a conda environment, could increase shelf life to several years. Alternatively, containerizing the entire computational environment, e.g. via Docker, could increase the shelf life to decades. I believe that ReScience should strongly encourage controlled environments and dependency versioning. For example, as a code reviewer, I would in most cases refuse to approve a data analysis codebase without versioned inputs and software.
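
To illustrate how small this burden can be, here is a minimal sketch of a pinned conda `environment.yml`; the environment name, packages, and versions are hypothetical placeholders, not drawn from any actual ReScience submission:

```yaml
# Hypothetical pinned environment for a replication; names and versions
# are illustrative only. Recreate with: conda env create -f environment.yml
name: rescience-replication
channels:
  - conda-forge
dependencies:
  - python=3.6
  - numpy=1.13.3
  - scipy=0.19.1
  - matplotlib=2.1.0
```

Recreating the environment is then a single `conda env create -f environment.yml`, which asks far less of authors than maintaining a full virtual machine image.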

On a related point, the study does not cite Continuous Analysis (https://doi.org/10.1038/nbt.3780). Continuous Analysis (controlled environments with automated execution via continuous integration) would potentially be very helpful for ReScience repositories. It would help ensure the reproducibility of analyses and traceability of the results. Continuous Analysis can be quite complex to configure, so this is mostly a suggested reference and consideration going forward.
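
For concreteness, even a very lightweight continuous-integration layer would help. The following `.travis.yml` sketch (Travis CI being a common choice at the time of writing) rebuilds the pinned environment above and reruns the analysis on every push; the script name `run_analysis.py` is a hypothetical placeholder:

```yaml
# Hypothetical Travis CI configuration: rebuild the pinned conda environment
# and rerun the analysis on every push. Script and path names are placeholders.
language: generic
install:
  - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
  - bash miniconda.sh -b -p "$HOME/miniconda"
  - export PATH="$HOME/miniconda/bin:$PATH"
  - conda env create -f environment.yml
script:
  - source activate rescience-replication
  - python run_analysis.py
```

The full Continuous Analysis workflow described in the cited paper goes further (containerized environments and archived results), but even this minimal setup would catch silently broken dependencies early.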

> It offers unlimited free use to Open Source projects, defined as projects whose contents are accessible to everyone.

When discussing the pricing model of GitHub, I recommend writing "It offers unlimited free use to public projects" rather than stating "to Open Source". It's important not to further the misconception that public repositories without an open license are open source. Currently, GitHub's [pricing page](https://github.com/pricing) states:

> GitHub is free to use for public and open source projects. Work together across unlimited private repositories with a paid plan.

Can ReScience implementations use code from the original study or are authors required to entirely rewrite the code to encourage true replicability?

The study is lacking a data-driven summary of ReScience's past operations. I'd recommend tables and/or figures that express ReScience's history up to this point. For example, what about a table with all ReScience articles thus far? The table could show the date and info of the original study as well as a link to the ReScience replication. Additionally, some quantitative measure of how much total review (in the form of GitHub comments) has occurred would be informative. What's the distribution of codebase sizes, language choices, study lifespans, etcetera, across ReScience articles? In short, this article could do a better job providing quantitative measures of ReScience's historical growth and mechanics.

To close, my favorite quote from the study is:

> The holy grail of computational science is therefore a reproducible replication of reproducible original work.

· Reviewer 2

Basic reporting

The abstract names a certain 'James Buckheit'. It should rather be 'Jonathan Buckheit'.

This is a superbly lucid article.

It describes a project that is exemplary and inspiring.

Because of the unusual nature of the article -- it describes a new scientific publishing effort -- the usual PeerJ reviewer checklist is not appropriate.

I have not published at ReScience (but now I intend to!) and so I cannot vouch for the veracity of any of the statements about the procedures. The description is very clear -- I wasn't even able to spot typos, except for the issue mentioned in line one of this review.

Experimental design

The intellectual principles of science are modeled in an exemplary way by this paper.
So as a scientific project this is first-rate. It is not, however, relevant to judge this by the PeerJ experimental design checklist.

Validity of the findings

This paper is necessarily a list of experiences setting up a new scientific publication process.

I am not able to judge the literal truth of the claims about the journal and its operations; however, the description makes me believe it all.

Perhaps someone among the reviewers actually HAS used ReScience and CAN say that things actually work this way.

Additional comments

BRAVI! HOORAY!

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.