Data Collection and Repositories



Data is the lifeblood of modern knowledge systems, yet its value is only realised when it is systematically collected, organised, and stored in environments that support long-term access and reuse. Data collection and repositories sit at the heart of data curation, forming the infrastructure upon which research, policy, and institutional memory depend.

Data collection refers to the deliberate, structured process of gathering raw information from sources such as surveys, sensors, experiments, administrative records, and observations (Corti et al., 2019). However, collecting data without a curation strategy is insufficient. Borgman (2015) argues that data does not speak for itself; it requires context, documentation, and governance to become meaningful and reusable across different communities. This insight extends the responsibility of data professionals beyond mere collection toward stewardship: an active, ongoing engagement with the data lifecycle.

https://youtube.com/@research-methods-class?si=GTCtIQxCIA1rBmYs

Repositories are the institutional response to this challenge. A data repository is a managed digital environment designed to ingest, preserve, describe, and provide access to datasets over time (Whyte & Tedds, 2011). Repositories range from discipline-specific archives, such as GenBank for genomic data, to institutional repositories hosted by universities and research councils. What distinguishes a well-functioning repository from simple file storage is its adherence to metadata standards, access controls, persistent identifiers, and preservation policies (Wilkinson et al., 2016). These features collectively ensure that deposited data remains findable, accessible, interoperable, and reusable: the FAIR principles that now guide international data management practice.

https://youtu.be/veOwUe_J3vA?si=mQWP2PdJvdLcieuM
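To make the difference between a repository and simple file storage more concrete, here is a minimal Python sketch of what an ingest step might record. It is an illustration only: the DatasetRecord fields, the ingest and is_intact helpers, and the sample identifier are all hypothetical and are not taken from any particular repository platform or metadata schema. It assumes the repository stores a SHA-256 checksum at deposit so that later preservation checks can detect changes to the file.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of one deposited dataset."""
    identifier: str       # stands in for a persistent identifier such as a DOI
    title: str
    creator: str
    licence: str          # states the terms of access and reuse
    file_format: str      # open formats support interoperability
    checksum_sha256: str  # fixity value used for preservation checks


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def ingest(data: bytes, identifier: str, title: str, creator: str,
           licence: str, file_format: str) -> DatasetRecord:
    """Describe a dataset at deposit time and record its checksum."""
    return DatasetRecord(identifier, title, creator, licence,
                         file_format, sha256_of(data))


def is_intact(data: bytes, record: DatasetRecord) -> bool:
    """Check that the stored bytes still match the checksum taken at ingest."""
    return sha256_of(data) == record.checksum_sha256


if __name__ == "__main__":
    # Made-up sample content, used only to exercise the functions.
    raw = b"id,value\n1,10\n2,12\n"
    record = ingest(raw, "example-id-0001", "Example dataset",
                    "A. Researcher", "CC BY 4.0", "text/csv")
    print(json.dumps(asdict(record), indent=2))
    print(is_intact(raw, record))         # unchanged file
    print(is_intact(raw + b"x", record))  # altered file
```

The point of the sketch is that description (title, creator, licence), identification, and preservation (the checksum) are recorded together at the moment of deposit, which is what plain file storage does not do.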

What is particularly important to appreciate is that the quality of a repository is inseparable from the quality of data collection practices that precede it. Poorly documented data, inconsistent naming conventions, or missing provenance information undermine even the most technically sophisticated repository. Data curators must therefore engage upstream, advising researchers on metadata schemas and collection standards before data is ever deposited (Corti et al., 2019). This upstream engagement transforms the curator from a passive archivist into an active partner in the research process.
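One way a curator might support this upstream engagement is with a simple pre-deposit check that researchers can run themselves. The Python sketch below is a hedged illustration, not an established tool: the list of required fields and the file naming convention (lowercase words joined by underscores, ending in an ISO date) are assumptions made for the example, and a real repository would publish its own requirements.

```python
import re

# Hypothetical required metadata fields; a real repository defines its own.
REQUIRED_FIELDS = ("title", "creator", "collection_method",
                   "collection_date", "source")

# Hypothetical naming convention: lowercase_words_YYYY-MM-DD.csv
FILENAME_PATTERN = re.compile(r"[a-z0-9]+(_[a-z0-9]+)*_\d{4}-\d{2}-\d{2}\.csv")


def missing_fields(metadata: dict) -> list:
    """Return required fields that are absent, None, or blank."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def badly_named(filenames: list) -> list:
    """Return file names that do not follow the assumed convention."""
    return [name for name in filenames
            if not FILENAME_PATTERN.fullmatch(name)]


def pre_deposit_report(metadata: dict, filenames: list) -> dict:
    """Summarise documentation and naming problems before deposit."""
    gaps = missing_fields(metadata)
    bad_names = badly_named(filenames)
    return {
        "missing_fields": gaps,
        "badly_named_files": bad_names,
        "ready": not gaps and not bad_names,
    }


if __name__ == "__main__":
    # Made-up metadata with no provenance information ("source").
    metadata = {"title": "Example survey", "creator": "A. Researcher",
                "collection_method": "questionnaire",
                "collection_date": "2023-05-01"}
    files = ["household_survey_2023-05-01.csv", "Survey Final (2).csv"]
    print(pre_deposit_report(metadata, files))
```

A check like this catches missing provenance and inconsistent file names while the researcher can still fix them, rather than after the data has been deposited.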

In the Malawian context, where research infrastructure is still developing, building awareness around responsible data collection and trusted repository use is not merely an academic exercise; it is a practical imperative. As institutions invest in digital transformation, grounding that investment in sound curation principles will determine whether the data generated today remains useful tomorrow.


References

Borgman, C. L. (2015). Big data, little data, no data: Scholarship in the networked world. MIT Press.

Corti, L., Van den Eynden, V., Bishop, L., & Woollard, M. (2019). Managing and sharing research data: A guide to good practice (2nd ed.). SAGE Publications.

Whyte, A., & Tedds, J. (2011). Making the case for research data management. DCC Briefing Papers. Digital Curation Centre. https://www.dcc.ac.uk/resources/briefing-papers

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., ... Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, Article 160018. https://doi.org/10.1038/sdata.2016.18

Comments

  1. This is a great work, clearly demonstrating what a repository is, and its roles

  2. Good write up, I like the idea of the Malawian context, area for further research

  3. Your work offers a clear understanding of data collection as part of digital curation as far as repositories are concerned. Good job.

  4. Well articulated. I have learned something from your post. Well done.

  5. Loved the metaphor on Data... Great work

