Citing data sources—Why is it good and how to do it?

When was the last time you used data in your research or assignments? Be it primary or secondary, data is a pillar of scientific research. Just like sources such as books and journal articles, data used in research requires and deserves proper credit. And like book and journal citations, data citations can take many formats depending on the style guide and its requirements. Read on to learn more about the benefits of citing data and get a jump start on data citation.

Why cite data?

Employing proper data citation practices benefits both data producers and data users during the scientific research process.

For data users, data citation supports the reproducibility of their research, because it allows other researchers to locate and access the underlying data more easily. It also increases transparency and encourages the production of more high-quality datasets. Last but not least, it encourages the reuse of data for new research questions.

For data producers, data citation gives appropriate credit and increases the findability of their research. It also sets formalized standards for data to be recognized as a legitimate, citable scholarly contribution. In addition, data citation allows the impact of data to be tracked and measured.

How to cite data?

While different style guides and publications have varying formats for data citation, the following components are generally required:

  • Author(s)
  • Date of publication
  • Title of dataset
  • Publisher or distributor
  • Persistent locator/identifier (e.g., URL or DOI)
  • Version, when appropriate
  • Date accessed, when appropriate
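
If you keep dataset details in a spreadsheet or script, the components above can be assembled into a reference string programmatically. The Python sketch below is only an illustration: the class name, the function name, the output layout, and the sample values (including the DOI) are all hypothetical, and the layout is not an official APA or MLA format. Always check the result against the style guide you are required to follow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Holds the generally required components of a data citation."""
    authors: List[str]
    year: int
    title: str
    publisher: str
    identifier: str                 # persistent locator, e.g. a DOI or URL
    version: Optional[str] = None   # include when appropriate
    accessed: Optional[str] = None  # include when appropriate


def format_citation(c: DatasetCitation) -> str:
    """Join the components into one line (illustrative layout, not a style guide)."""
    text = f"{'; '.join(c.authors)} ({c.year}). {c.title}"
    if c.version:
        text += f" (Version {c.version})"
    text += f" [Data set]. {c.publisher}. {c.identifier}"
    if c.accessed:
        text += f" (accessed {c.accessed})"
    return text


# Placeholder values for illustration only; this is not a real dataset.
example = DatasetCitation(
    authors=["Doe, J.", "Roe, R."],
    year=2020,
    title="Example Survey Data",
    publisher="Example Data Archive",
    identifier="https://doi.org/10.1234/example",
    version="2.1",
)
print(format_citation(example))
```

The optional fields are left out of the output when they are not set, which mirrors the "when appropriate" wording in the list.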

For examples from style guides such as APA and MLA, visit the data citation page from Columbia University Libraries.

Using citation management software for your references? EndNote has the reference type "Dataset." In Mendeley or Zotero, use a more generic reference type template, such as the "Document" item type in Zotero, and fill in the other required components.
