Data Collection and Repositories
Data is the lifeblood of modern knowledge systems, yet its value is only realised when it is systematically collected, organised, and stored in environments that support long-term access and reuse. Data collection and repositories sit at the very heart of data curation, forming the infrastructure upon which research, policy, and institutional memory depend.
Data collection refers to the deliberate, structured process of gathering raw information from sources such as surveys, sensors, experiments, administrative records, or observations (Corti et al., 2019). However, collecting data without a curation strategy is insufficient. Borgman (2015) argues that data does not speak for itself; it requires context, documentation, and governance to become meaningful and reusable across different communities. This insight shifts the responsibility of data professionals beyond mere collection toward stewardship: an active, ongoing engagement with the data lifecycle.
https://youtube.com/@research-methods-class?si=GTCtIQxCIA1rBmYs
Repositories are the institutional response to this challenge. A data repository is a managed digital environment designed to ingest, preserve, describe, and provide access to datasets over time (Whyte & Tedds, 2011). Repositories range from discipline-specific archives, such as GenBank for genomic data, to institutional repositories hosted by universities and research councils. What distinguishes a well-functioning repository from simple file storage is its adherence to metadata standards, access controls, persistent identifiers, and preservation policies (Wilkinson et al., 2016). These features collectively ensure that deposited data remains findable, accessible, interoperable, and reusable: the FAIR principles that now guide international data management practice.
https://youtu.be/veOwUe_J3vA?si=mQWP2PdJvdLcieuM
What is particularly important to appreciate is that the quality of a repository is inseparable from the quality of data collection practices that precede it. Poorly documented data, inconsistent naming conventions, or missing provenance information undermine even the most technically sophisticated repository. Data curators must therefore engage upstream, advising researchers on metadata schemas and collection standards before data is ever deposited (Corti et al., 2019). This upstream engagement transforms the curator from a passive archivist into an active partner in the research process.
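To make the idea of upstream engagement concrete, here is a minimal sketch of a pre-deposit check a curator might run with a researcher before data reaches a repository. Everything in it is an assumption for illustration: the required fields, the file naming convention, the function name check_deposit, and the sample record are all hypothetical and are not taken from any particular repository or metadata standard.

```python
import re

# Hypothetical minimum set of descriptive fields; a real repository
# defines its own required metadata schema.
REQUIRED_FIELDS = (
    "title", "creator", "date_collected", "description", "source", "licence",
)

# Hypothetical naming convention: lowercase words joined by underscores,
# ending in a date (YYYY-MM-DD) and a file extension.
FILENAME_PATTERN = re.compile(
    r"^[a-z0-9]+(_[a-z0-9]+)*_\d{4}-\d{2}-\d{2}\.[a-z0-9]+$"
)


def check_deposit(metadata, filenames):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Flag fields that are absent or blank (missing documentation or provenance).
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing metadata field: {field}")
    # Flag file names that do not follow the agreed convention.
    for name in filenames:
        if not FILENAME_PATTERN.fullmatch(name):
            problems.append(f"file name breaks naming convention: {name}")
    return problems


if __name__ == "__main__":
    # Made-up record: the description is blank and the source is absent.
    record = {
        "title": "Household water use survey",
        "creator": "Example Research Team",
        "date_collected": "2024-03-01",
        "description": "",
        "licence": "CC-BY-4.0",
    }
    files = ["water_survey_2024-03-01.csv", "Final Data (2).xlsx"]
    for problem in check_deposit(record, files):
        print(problem)
```

A check like this only catches gaps in form. Whether a description is actually adequate, or the provenance accurate, still needs a conversation between curator and researcher.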
In the Malawian context, where research infrastructure is still developing, building awareness around responsible data collection and trusted repository use is not merely an academic exercise; it is a practical imperative. As institutions invest in digital transformation, grounding that investment in sound curation principles will determine whether the data generated today remains useful tomorrow.
References
Borgman, C. L. (2015). Big data, little data, no data: Scholarship in the networked world. MIT Press.
Corti, L., Van den Eynden, V., Bishop, L., & Woollard, M. (2019). Managing and sharing research data: A guide to good practice (2nd ed.). SAGE Publications.
Whyte, A., & Tedds, J. (2011). Making the case for research data management. DCC Briefing Papers. Digital Curation Centre. https://www.dcc.ac.uk/resources/briefing-papers
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., ... Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, Article 160018. https://doi.org/10.1038/sdata.2016.18
Great stuff, Naomie
This is great work, clearly demonstrating what a repository is and its roles
Well presented
Well done
Well articulated Nao
Well done Nao
Good write-up. I like the idea of the Malawian context, an area for further research
Your work offers a clear understanding of data collection as part of digital curation as far as the repository is concerned. Good job
Well articulated, I have learned something from your post. Well done
Loved the metaphor on Data... Great work