The Modern Research Data Portal: A design pattern for networked, data-intensive science

University of Chicago, Chicago, Illinois, United States
Energy Sciences Network, Lawrence Berkeley National Laboratory, Berkeley, California, United States
Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, Illinois, United States
Department of Computer Science, University of Chicago, Chicago, Illinois, United States
DOI
10.7287/peerj.preprints.3194v2
Subject Areas
Computer Networks and Communications, Data Science, Distributed and Parallel Computing, Security and Privacy, World Wide Web and Web Science
Keywords
portal, high-speed network, Globus, science DMZ, data transfer node, science gateway
Copyright
© 2017 Chard et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Chard K, Dart E, Foster I, Shifflett D, Tuecke S, Williams J. 2017. The Modern Research Data Portal: A design pattern for networked, data-intensive science. PeerJ Preprints 5:e3194v2

Abstract

We describe best practices for providing convenient, high-speed, secure access to large datasets via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance Science DMZs and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to use Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.

Author Comment

This new version corrects two minor typos.