Data Citation Corpus

The Data Citation Corpus is a trusted central aggregate of all data citations to further the understanding of data usage and advance meaningful data metrics.

Updated: 24 Jun 2024

Make Data Count

Project Goals

The Data Citation Corpus is part of the Make Data Count initiative, which coordinates and advocates for robust infrastructure and guidelines for data citation and data usage, and works to advance the adoption of best practices across research communities and platforms. The initiative has three main areas of focus:

  1. Open infrastructure to enable the evaluation of data reuse
  2. Outreach to drive awareness and adoption of open data metrics
  3. Evidence on the reuse and impact of open data through collaboration with bibliometricians

The Data Citation Corpus is specifically committed to advancing the development and adoption of open data metrics to facilitate the evaluation and acknowledgment of research data reuse and impact. The corpus aims to enable different stakeholders, including funders and institutions, to evaluate the reach of open datasets produced and shared by researchers, and to facilitate large-scale analyses that build evidence on data usage practices across institutions and disciplines. The current roadmap is designed to enhance the visibility and recognition of data.


The first release of the corpus demonstrates the value of incorporating data citations from different sources and the ways in which users will be able to interact with the corpus. The corpus includes a dashboard that allows users to visualize the current content of the corpus or narrow the results according to specific filters, such as the affiliation associated with the dataset or the repository where the dataset is hosted.
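As a rough illustration of the kind of filtering described above, the sketch below narrows a small set of citation records by repository or affiliation and counts citations per dataset. The record layout, field names, and sample values are all hypothetical; they are assumptions made for this example and do not reflect the actual corpus schema or the dashboard's implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    # Hypothetical record layout; the real corpus schema may differ.
    dataset_id: str
    publication_id: str
    repository: str
    affiliation: str


def filter_citations(citations, repository=None, affiliation=None):
    """Return the citations that match every filter that is not None."""
    return [
        c for c in citations
        if (repository is None or c.repository == repository)
        and (affiliation is None or c.affiliation == affiliation)
    ]


def citations_per_dataset(citations):
    """Count how many citations each dataset has in the given records."""
    return Counter(c.dataset_id for c in citations)


# Made-up sample records, for illustration only.
SAMPLE = [
    DataCitation("dataset-a", "article-1", "Repo X", "University One"),
    DataCitation("dataset-a", "article-2", "Repo X", "University One"),
    DataCitation("dataset-b", "article-3", "Repo Y", "University Two"),
]

# Narrow to one repository, then summarize citations per dataset.
repo_x = filter_citations(SAMPLE, repository="Repo X")
counts = citations_per_dataset(repo_x)
```

Filters left as None are ignored, so the same function can narrow by repository, by affiliation, or by both at once.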

What's Needed

The next stages of our work on the Data Citation Corpus will involve addressing existing gaps in metadata for data citations, enhancing the dashboard and corpus visualizations, and ingesting data citations from additional sources.