Data curation preservation issues (facilities, digital repository systems, and high-performance computing)
Every breakthrough in digital technology promises better ways of creating, storing, and sharing information. Ironically, these same advancements have made preserving digital information more challenging than ever before. Digital resources are no longer threatened primarily by physical deterioration but by rapidly evolving technologies, increasingly complex computing environments, cybersecurity risks, and infrastructure limitations. Consequently, successful digital curation cannot be achieved simply by adopting sophisticated technologies. It requires resilient facilities, trustworthy digital repository systems, and high-performance computing (HPC) environments that collectively support the long-term accessibility, authenticity, and reuse of digital resources. While technological innovation has strengthened preservation capabilities, I argue that the greatest challenge facing digital curation today lies in ensuring that these interconnected infrastructures remain sustainable, secure, and responsive to continuous technological change.
The effectiveness of digital curation begins with preservation facilities because they provide the physical and virtual infrastructure upon which every preservation activity depends. Secure data centres, reliable power supplies, disaster recovery systems, environmental controls, and high-speed networks collectively determine whether digital resources remain accessible over time. Although cloud-based facilities have transformed preservation by offering elastic storage, scalability, and on-demand computing resources, they also introduce persistent challenges relating to bandwidth constraints, service continuity, legal jurisdiction, and long-term sustainability (Aitken et al., 2012). These concerns remain evident today despite significant advances in cloud technologies. The 2023 cyberattack on the British Library demonstrated how infrastructure failures can severely disrupt access to valuable digital collections, even within internationally respected memory institutions. Likewise, increasing cyber threats and climate-related disruptions continue to expose vulnerabilities in digital preservation infrastructure (Corrado & Moulaison Sandy, 2024). Therefore, preservation facilities should be evaluated not simply by their storage capacity but by their ability to ensure resilience, business continuity, and trust throughout the preservation lifecycle. In my view, institutions that prioritise technological expansion over infrastructural resilience risk investing in preservation systems that cannot withstand future uncertainties.
Resilient infrastructure alone, however, does not guarantee meaningful preservation. Digital repository systems provide the governance framework through which digital resources are organised, described, preserved, and made available for long-term discovery and reuse. Repository platforms have evolved considerably to support persistent identifiers, metadata interoperability, and the FAIR principles. Nevertheless, their effectiveness extends beyond software functionality. Aitken et al. (2012) demonstrated through cloud-based initiatives such as DataBank and DataFlow that repositories are most valuable when they integrate secure storage, metadata management, preservation actions, and collaborative access within a sustainable environment. Recent research similarly concludes that trustworthy repositories depend more on governance, metadata quality, institutional commitment, and professional stewardship than on technology alone (Bamgbose et al., 2024). I therefore contend that repository systems should be viewed as governance platforms rather than storage platforms. Without consistent metadata practices, preservation policies, and skilled personnel, even technologically advanced repositories cannot guarantee the discoverability, authenticity, and long-term reuse of digital assets.
The growing scale and complexity of research data further demonstrate why preservation requires more than reliable storage and repository management. High-performance computing has become indispensable for analysing massive datasets generated through artificial intelligence, genomics, climate science, and other data-intensive disciplines. According to Aitken et al. (2012), cloud-based HPC provides significant computational advantages for intensive processing tasks but remains constrained by bandwidth limitations, extensive data transfer requirements, and the cost of communication between parallel computing processes. Although advances in computing have reduced some of these limitations, preservation challenges have shifted towards safeguarding software environments, algorithms, machine-learning models, and computational workflows alongside the datasets themselves. Contemporary scholarship argues that preserving computational context has become essential for ensuring research reproducibility and long-term scientific transparency (Pasqui, 2024). Consequently, preserving digital objects in isolation is no longer sufficient. Sustainable digital curation increasingly depends on preserving the complete research ecosystem that enables future researchers to understand, verify, and reuse digital evidence.
Ultimately, facilities, digital repository systems, and high-performance computing should not be regarded as independent technological investments but as complementary pillars of a resilient preservation ecosystem. Weakness in any one component can compromise the accessibility, authenticity, and long-term value of digital resources. The literature consistently demonstrates that technological innovation alone cannot guarantee sustainable digital preservation; equally important are governance, institutional commitment, and strategic investment in resilient infrastructure. As digital information continues to expand in both volume and complexity, organisations must shift their attention from acquiring increasingly sophisticated technologies to strengthening the ecosystems that sustain them. The future of digital curation will therefore be determined not by how successfully institutions create digital information, but by how effectively they preserve the infrastructures that enable knowledge itself to endure.
Aitken, B., McCann, P., McHugh, A., & Miller, K. (2012). Digital curation and the cloud. Digital Curation Centre.
Bamgbose, A. A., Mohd, M., Tengku Wook, T. S. M., & Mohamed, H. (2024). Organisational and technological factors affecting efficient service delivery in trustworthy digital repositories: A qualitative approach. Information Development.
Corrado, E. M., & Moulaison Sandy, H. (2024). Digital preservation for libraries, archives, and museums (3rd ed.). Rowman & Littlefield.
Pasqui, V. (2024). Digital curation and long-term digital preservation in libraries. JLIS.it, 15(1), 109–125.
Recker, J., Kleemola, M., & L'Hours, H. (2024). Closing gaps: A model of cumulative curation and preservation levels for trustworthy digital repositories. International Journal of Digital Curation, 19(1).
Rhee, H. L. (2022). A new lifecycle model enabling optimal digital curation. Journal of Librarianship and Information Science, 56(1), 78–92.