The Anatomy of a Data Citation
What is a data citation and what makes it different from other citations?
Data citations are references to a data source. Unlike most other sources found in a bibliography, data have no agreed-upon way of attributing credit and are not widely recognized as a ‘citable’ source. This becomes a problem in discussions of transparency and trust in science: we need citations to verify a study's sources and to judge the quality of the study.
In 2014, Force11 – an international coalition of researchers, librarians, publishers and research funders – addressed the lack of consensus on data citations, and created the ‘Joint Declaration of Data Citation Principles’ (JDDCP).
Force11’s declaration asserts that “Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse” (Data Citation Synthesis Group, 2014). As a signatory to this declaration, we follow the eight principles outlined in the JDDCP throughout this guide to show how to write a JDDCP-compliant citation.
Throughout this post we will reference various principles from the JDDCP, and we encourage you to read the original declaration alongside it!
Joint Declaration of Data Citation Principles (JDDCP)
Principle 1: “Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications” (Data Citation Synthesis Group, 2014).
The movement toward widespread data sharing is relatively new. As the scientific community becomes more transparent and adopts more expansive views on data sharing, data citations become increasingly important. Data are also key research assets and outputs in their own right.
Where does the data citation go?
Principle 3: “In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited” (ibid, 2014).
A data citation can appear either as a parenthetical in-text citation or as a footnote. Follow your citation style guide to determine the most appropriate placement.
How to Write a Data Citation
Principle 2: “Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data” (ibid, 2014).
In accordance with this principle, our guide presents several methods for citing data, but each method can and should be adapted to the information available in the source and to your own judgment.
The first four elements listed below are the core citation elements needed to properly cite any source. Include them whenever they are available. We will not delve into each of them in this post, but numerous online guides explain how to identify them.
- Author or Rights-holder
- Title
- Date
- Publisher or Distributor*
You can read more about these core elements in the following guides: American Psychological Association, Santa Fe College APA Style, and Lakeland College Chicago Guide.
The next four elements are specific to data citations.
- Numerical ID and Version Number
- Description of Form
- Source of Published Material (*typically takes the place of the publisher or distributor)
- Persistent Identifiers (PIDs)
Numerical ID vs. PID
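A PID, or persistent identifier, is a long-lasting reference to a digital object (i.e., one that, unlike an ordinary URL, is not expected to break when the object moves). PIDs are usually assigned by a registration agency such as Crossref or DataCite. A numerical ID is assigned by the publisher or distributor's own organization, but it may also be a lasting identifier. Therefore, a numerical ID can be a PID, but a PID is not necessarily the numerical ID. For example, “ICPSR 37922” is the archive's numerical ID for the first data set cited below, while https://doi.org/10.3886/ICPSR37922.v1 is its DOI.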
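To make the distinction concrete, here is a minimal Python sketch (our own illustration, not part of the JDDCP) that pulls the DOI out of an ICPSR link and splits it into its prefix and suffix. The regular expression is a simplified assumption that only approximates DOI syntax, and `split_doi` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Simplified DOI pattern (an assumption): "10." + a 4-9 digit registrant
# code + "/" + a suffix. Real DOI syntax allows more variation.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def split_doi(text: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, suffix) of the first DOI-like string in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    doi = match.group(0).rstrip(".")  # drop a sentence-ending period
    prefix, suffix = doi.split("/", 1)
    return prefix, suffix


numerical_id = "ICPSR 37922"                   # archive-specific numerical ID
pid = "https://doi.org/10.3886/ICPSR37922.v1"  # DOI resolved through doi.org

prefix, suffix = split_doi(pid)
print(prefix)  # 10.3886
print(suffix)  # ICPSR37922.v1

# The numerical ID appears inside the DOI suffix, but the bare ID
# on its own is not something a DOI resolver can look up.
print(numerical_id.replace(" ", "") in suffix)  # True
print(split_doi(numerical_id))                  # None
```

In this case the archive's numerical ID is embedded in the DOI suffix, but it is the DOI, resolved through doi.org, that acts as the persistent identifier.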
Core Citation Elements: Author or Rights-holder, Title, Date, Publisher or Distributor
Holahan, J., & Long, S. K. (2013). Health Reform Monitoring Survey, United States, Third Quarter 2019 (ICPSR 37922; Version V1) [Data set]. Health and Medical Care Archive. https://doi.org/10.3886/ICPSR37922.v1
Numerical ID and Version Number: What are the numerical ID and version number of your data?
Desai, S., Vanneman, R., & National Council of Applied Economic Research, New Delhi. (2005). India Human Development Survey (IHDS) (ICPSR 22626; Version 2018-08-08) [Data set]. Inter-university Consortium for Political and Social Research. https://doi.org/10.3886/ICPSR22626.v12
Description of Form: What form do your data take (e.g., “[Dataset and codebook]” or “[Dataset]”)? This may not be listed in the source, so use your best judgment.
NatureServe & IUCN (International Union for Conservation of Nature). (2007). Crotaphytus reticulatus (Version 2018-2) [Dataset]. The IUCN Red List of Threatened Species. https://www.iucnredlist.org
Source of Published Material: This typically takes the place of the publisher. If your data come from an existing study, cite the publication from which the data were retrieved (journal article, report, or webpage) rather than citing the data set alone.
NASA Exoplanet Archive. (2019). Composite Planet Data Table [Data set]. NASA Exoplanet Archive. https://catcopy.ipac.caltech.edu/dois/doi.php?id=10.26133/NEA2
Persistent Identifiers (PIDs): What specific PIDs are needed to find, cite, and access the data? These may take the form of digital object identifiers (DOIs), reference numbers, Archival Resource Keys (ARKs), Handles, persistent URLs (PURLs), etc.
University of California, Los Angeles. (2017). Los Angeles Metropolitan Area Surveys [LAMAS] 3, 1971 (ICPSR 36611; V1). Inter-university Consortium for Political and Social Research. https://doi.org/10.3886/ICPSR36611.v1
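As a rough illustration of the PID types listed above, the sketch below guesses which scheme an identifier string uses from its form alone. The patterns and the `classify_pid` name are our own simplified assumptions covering common DOI, ARK, Handle, and PURL forms; they will miss legitimate variants and do not check that an identifier actually resolves.

```python
import re
from typing import Optional

# Heuristic patterns (assumptions for illustration) for common PID forms.
# Order matters: the first matching scheme wins.
PID_PATTERNS = [
    ("DOI", re.compile(r"(?:^|doi\.org/|doi:)10\.\d{4,9}/\S+", re.IGNORECASE)),
    ("ARK", re.compile(r"ark:/?\d{5,}/\S+", re.IGNORECASE)),
    ("Handle", re.compile(r"(?:hdl\.handle\.net/|^hdl:)\S+", re.IGNORECASE)),
    ("PURL", re.compile(r"purl\.org/\S+", re.IGNORECASE)),
]


def classify_pid(identifier: str) -> Optional[str]:
    """Guess the PID scheme of an identifier string; None if nothing matches."""
    text = identifier.strip()
    for scheme, pattern in PID_PATTERNS:
        if pattern.search(text):
            return scheme
    return None


examples = [
    "https://doi.org/10.3886/ICPSR36611.v1",
    "ark:/13030/tf5p30086k",
    "https://hdl.handle.net/2027/example",
    "https://purl.org/dc/terms/",
    "https://www.iucnredlist.org",
]
for ex in examples:
    print(f"{ex} -> {classify_pid(ex)}")
```

The plain web address https://www.iucnredlist.org comes back as None, a reminder that an ordinary URL is not a PID.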
Basic APA Citation Template
Author. (Year). Title (ID; Version #) [Description of form]. Publisher name or source of published material. [PIDs].
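The template can also be expressed as a small function. This Python sketch assumes a hypothetical `DataCitation` class with one field per element; it produces plain text only (no italics) and covers just the template above, not the full APA style rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """One field per citation element (hypothetical names for illustration)."""
    author: str
    year: str
    title: str
    publisher: str                      # publisher name or source of published material
    form: str = "Data set"              # description of form
    numerical_id: Optional[str] = None  # e.g., "ICPSR 36611"
    version: Optional[str] = None       # e.g., "V1"
    pid: Optional[str] = None           # DOI or other PID (or a stable URL)

    def to_apa(self) -> str:
        # Add the period after the author only if the name lacks one.
        author = self.author if self.author.endswith(".") else self.author + "."
        # ID and version share one parenthetical, e.g. "(ICPSR 36611; V1)".
        id_parts = [part for part in (self.numerical_id, self.version) if part]
        id_block = " (" + "; ".join(id_parts) + ")" if id_parts else ""
        citation = (
            f"{author} ({self.year}). {self.title}{id_block} "
            f"[{self.form}]. {self.publisher}."
        )
        if self.pid:
            citation += " " + self.pid  # no period after a DOI or URL
        return citation


good = DataCitation(
    author="Carrea, L., & Merchant, C. J.",
    year="2019",
    title="GloboLakes: Lake Surface Water Temperature (LSWT)",
    publisher="Centre for Environmental Data Analysis",
    version="version 4.0",
    pid="https://catalogue.ceda.ac.uk/uuid/76a29c5b55204b66a40308fc2ba9cdb3",
)
print(good.to_apa())
```

Keeping each element in its own field makes it easy to see which parts of a citation are missing before it is formatted.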
Poor example of a data citation
Udborg OD, Cauddoth Fr. (2997). Ventilatory Defects in Orc Raiders Exposed to Industrial Fumes.
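The example above lacks the specific identifiers needed to locate the original source and give credit to its creators: it has no publisher or distributor, version, description of form, or persistent identifier. Still, it is better than no citation to the data at all. Note: this is a fake publication. 🙂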
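One way to see what the poor example is missing is to check it against the elements listed earlier in this guide. In this sketch, the dictionary keys and the `missing_elements` helper are hypothetical, and publisher and source of published material share one slot because the source typically takes the place of the publisher.

```python
from typing import Dict, List

# Hypothetical keys for the elements described in this guide. Publisher and
# source of published material share one slot, since the source typically
# takes the place of the publisher.
REQUIRED_ELEMENTS = {
    "author": "Author or Rights-holder",
    "title": "Title",
    "date": "Date",
    "publisher_or_source": "Publisher/Distributor or Source of Published Material",
    "id_and_version": "Numerical ID and Version Number",
    "form": "Description of Form",
    "pid": "Persistent Identifier",
}


def missing_elements(citation: Dict[str, str]) -> List[str]:
    """List the elements that are absent or empty in a citation record."""
    return [label for key, label in REQUIRED_ELEMENTS.items() if not citation.get(key)]


poor = {
    "author": "Udborg OD, Cauddoth Fr",
    "date": "2997",
    "title": "Ventilatory Defects in Orc Raiders Exposed to Industrial Fumes",
}
for label in missing_elements(poor):
    print("Missing:", label)
```

A check like this only confirms that each slot is filled; it cannot tell whether an identifier actually resolves to the data.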
Good example of a data citation
Carrea, L., & Merchant, C. J. (2019). GloboLakes: Lake Surface Water Temperature (LSWT) (version 4.0) [Data set]. Centre for Environmental Data Analysis. https://catalogue.ceda.ac.uk/uuid/76a29c5b55204b66a40308fc2ba9cdb3
As outlined in the graphic below, this citation includes the information needed to identify and locate the data used in the study: the authors, year, title, version, description of form, publisher, and a link to the data set.