The Anatomy of a Data Availability Statement (DAS)
What is a Data Availability Statement (DAS)?
A data availability statement (DAS) is a separate section of a scientific article, set apart from the main body of text, that explains whether and how others can access a study’s research data. Including a DAS in a manuscript helps others verify a study’s findings, promotes research transparency, and ultimately improves trust in science. Although not every journal or funder requires one, a DAS improves manuscript quality and makes the underlying data easier to cite.
A DAS is not only for openly accessible data, though. In some circumstances, sharing data is neither feasible nor responsible, for example when doing so would expose human subjects or other personally identifiable information. In such cases, the DAS explains why the author(s) chose to restrict access to the data.
Where does the DAS go?
The DAS should appear at either the beginning or the end of the manuscript under its own heading, such as “Availability of Data and Materials” or simply “Data Availability Statement.” Because this section is distinct from other supplementary materials, its heading must explicitly mention data.
What kind of data must be described in a DAS?
Your DAS should reference all research data needed to replicate or reuse the work. This includes, but is not limited to, data you collected, data you downloaded and analyzed (but did not manipulate), and data you generated.
How to write a DAS
The length and wording of the DAS will vary depending on a number of factors, but a good DAS consists of four core elements:
- Data collected
- Data location or repository name/archive
- Link to dataset/repository (if applicable)
- PIDs (Persistent Identifiers)
Note: Because most DASs are published online, include hyperlinks wherever possible.
Data collected
These are the data collected or used to answer the study’s research question. Listing these datasets helps the reader quickly understand what they will find and is particularly helpful when a study uses multiple datasets.
The daily maximum near-surface air temperature data (Tmax) was retrieved from 13 models of the Coordinated Regional Climate Downscaling Experiment (CORDEX) (https://www.cordex.org/data-access/) for the Africa domain. (Varela et al. 2020)
Data location or repository name/archive
Where can the reader find the datasets? This applies to both physical and digital datasets.
The 16S rRNA amplicon sequencing data associated with this study have been deposited in the NCBI Sequence Read Archive under the project accession PRJNA543313. The raw transcriptomic data have been deposited in the Gene Expression Omnibus under the accession GSE131158. …scripts and additional data structures… [are in the] GitHub repository: https://github.com/isaisg/variovoraxRGI. (Finkel et al. 2020)
Link to dataset/repository (if applicable)
Where exactly is the dataset stored? Include a direct link to the dataset or repository so the reader can find your data easily.
Data used in this study are available in the Movebank open data repository at https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study1271155477 (Movebank ID No. 1271155477). (Duncan et al. 2020)
Persistent Identifiers (PIDs)
What are the specific PIDs needed to find, cite, and access the data? These may take the form of digital object identifiers (DOIs), accession or reference numbers, Archival Resource Keys (ARKs), Handles, Persistent URLs (PURLs), etc.
PacBio sequencing data are available at the NCBI Sequence Read Archive (SRA) with accession number PRJNA665727. Metagenomic sequencing data of laser-capture-microdissected tissue samples are available at the NCBI SRA with accession number PRJNA665536. All microscopy data have been deposited to Zenodo. A full list of DOIs is provided in the Supplementary Information. (Shi et al. 2020)
Basic DAS Template
The [title or type] dataset(s) used in this study are publicly available in the [repository name] repository at [link to data/repository]: [PIDs].
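For authors assembling statements for many datasets, a template like this can also be filled programmatically. The Python sketch below is a minimal illustration, assuming a hypothetical `DatasetRecord` structure that holds the four core elements; neither `DatasetRecord` nor `render_das` comes from any journal or repository tool. The example values are taken from the Sogabe et al. (2019) statement in the good example below.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical container for the four core DAS elements of one dataset."""
    description: str                               # data collected, e.g. "cell-type transcriptome"
    repository: str                                # repository or archive name
    link: str = ""                                 # URL to the dataset/repository, if applicable
    pids: list[str] = field(default_factory=list)  # DOIs, accession numbers, etc.


def render_das(record: DatasetRecord) -> str:
    """Build one DAS sentence modeled on the basic template."""
    sentence = (
        f"The {record.description} data used in this study are publicly "
        f"available in the {record.repository} repository"
    )
    if record.link:
        # Only add a link when one exists ("if applicable").
        sentence += f" at {record.link}"
    if record.pids:
        # Group the persistent identifiers in parentheses at the end.
        sentence += " (" + "; ".join(record.pids) + ")"
    return sentence + "."


record = DatasetRecord(
    description="cell-type transcriptome",
    repository="NCBI SRA",
    pids=["accession number PRJNA412708"],
)
print(render_das(record))
# The cell-type transcriptome data used in this study are publicly available
# in the NCBI SRA repository (accession number PRJNA412708).
```

Placing the PIDs in parentheses rather than after a colon is only a stylistic choice; adjust the wording to match your journal’s preferred phrasing.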
Poor example of a DAS
All datasets used in this study are publicly available through an open repository.
The example above states that the data are available, but it lacks the repository name, link, and persistent identifiers needed to locate and identify them.
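One rough way to catch statements like this poor example before submission is to check whether the text contains at least a link and something that looks like a persistent identifier. The Python sketch below does this with regular expressions. The patterns (a generic URL, a DOI beginning with “10.”, and NCBI BioProject or GEO-style accession numbers) and the function name `missing_das_elements` are simplifying assumptions for illustration. They will miss other PID types such as ARKs, Handles, or Movebank IDs, and they cannot judge whether the data description or repository name is adequate.

```python
import re

# Heuristic patterns (assumptions, not an official standard).
URL_PATTERN = re.compile(r"https?://\S+")
# DOIs begin with "10.", a numeric registrant code, a slash, and a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")
# Two common accession formats: BioProject (PRJNA..., PRJEB...) and GEO series (GSE...).
ACCESSION_PATTERN = re.compile(r"\b(?:PRJ[A-Z]{1,2}\d+|GSE\d+)\b")


def missing_das_elements(statement: str) -> list[str]:
    """Return the locatable elements that a DAS appears to lack."""
    missing = []
    if not URL_PATTERN.search(statement):
        missing.append("link")
    if not (DOI_PATTERN.search(statement) or ACCESSION_PATTERN.search(statement)):
        missing.append("persistent identifier")
    return missing


poor = "All datasets used in this study are publicly available through an open repository."
print(missing_das_elements(poor))  # ['link', 'persistent identifier']
```

An empty list does not prove a DAS is complete; it only means neither obvious gap was detected.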
Good example of a DAS
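All cell-type transcriptome data are available in the NCBI SRA database under accession number PRJNA412708. Additional supplementary data are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.hp2fr73. (Sogabe et al. 2019)
As outlined in the graphic below, this DAS includes the information needed to identify and locate all data used in the study: it describes the data (cell-type transcriptome data and additional supplementary data), names the repositories (NCBI SRA and the Dryad Digital Repository), links to the Dryad dataset, and gives PIDs (accession number PRJNA412708 and the Dryad DOI).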
In this article, we discussed the anatomy of a DAS when the data are openly available in a public repository. For examples of other types of DAS, check out Springer Nature’s article here, the University of Bath’s article here, and the American Meteorological Society’s article here.
This post is part of our series Research Bites. Come back every month for a new installment.
Data Availability Statements Cited
Duncan, E.M., Davies, A., Brooks, A. et al. Message in a bottle: Open source technology to track the movement of plastic pollution. PLoS ONE 15(12), e0242459 (2020). https://doi.org/10.1371/journal.pone.0242459
Finkel, O.M., Salas-González, I., Castrillo, G. et al. A single bacterial genus maintains root growth in a complex microbiome. Nature 587, 103–108 (2020). https://doi.org/10.1038/s41586-020-2778-7
Shi, H., Shi, Q., Grodner, B. et al. Highly multiplexed spatial mapping of microbial communities. Nature (2020). https://doi.org/10.1038/s41586-020-2983-4
Sogabe, S., Hatleberg, W.L., Kocot, K.M. et al. Pluripotency and the origin of animal multicellularity. Nature 570, 519–522 (2019). https://doi.org/10.1038/s41586-019-1290-4
Varela, R., Rodríguez-Díaz, L. & deCastro, M. Persistent heat waves projected for Middle East and North Africa by the end of the 21st century. PLoS ONE 15(11), e0242477 (2020). https://doi.org/10.1371/journal.pone.0242477