Tabula: The Digital Preservation System
Background
The Libraries have been carrying out digital preservation activities for over 20 years. During this time, much of this work was done using manual and semi-automated processes. To make these practices more efficient and sustainable, the Libraries issued a call for proposals for a digital preservation system. At the end of this process, a tool was chosen and installed, and testing of this product continued for two years. In the end, we decided that the tool was not doing what we had expected or needed it to do.
We went back to our requirements and reconsidered what we needed and what had not worked in the system we had just tried. We did another landscape review and explored other outside options, but in the end decided that building our own digital preservation system based on microservices would be the best option moving forward.
Tabula, the preservation system
Tabula was developed in-house by members of the Digital Preservation and Repository Technology group. The system is built on MySQL databases and Python scripts. The web interface uses Python with Flask and Flask-Session to provide a dynamic, responsive design.
The system is built on the concept of microservices: independent components, each running an application or service that focuses on a specialized task. This makes it easier to change or develop specific functional areas without affecting the entire application.
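The microservice idea can be illustrated with a minimal, single-process Python sketch. This is an illustration under stated assumptions, not Tabula's code: in an actual microservice deployment each step would run as its own service and communicate over a network or a queue, whereas here each step is simply a function registered under a name and sharing one contract (a package dictionary in, an updated dictionary out). The step names, `REGISTRY`, `service`, `run_pipeline`, and the placeholder logic are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical contract: every step takes a package description and returns an updated copy.
Package = Dict[str, object]
Step = Callable[[Package], Package]

# Hypothetical registry mapping a step name to its current implementation.
REGISTRY: Dict[str, Step] = {}


def service(name: str) -> Callable[[Step], Step]:
    """Decorator that registers a function as the implementation of a named step."""
    def register(func: Step) -> Step:
        REGISTRY[name] = func
        return func
    return register


@service("virus_scan")
def virus_scan(package: Package) -> Package:
    # Placeholder only: a real step might call an antivirus scanner such as ClamAV.
    return {**package, "virus_scan": "not run (placeholder)"}


@service("identify_format")
def identify_format(package: Package) -> Package:
    # Placeholder only: guesses from the file extension; a real step might use DROID.
    name = str(package["filename"])
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else "unknown"
    return {**package, "format_guess": ext}


def run_pipeline(package: Package, steps: List[str]) -> Package:
    """Run the named steps in order; steps share nothing except the package dict."""
    for name in steps:
        package = REGISTRY[name](package)
    return package


if __name__ == "__main__":
    print(run_pipeline({"filename": "scan_0001.TIF"}, ["virus_scan", "identify_format"]))
```

Because `run_pipeline` only looks steps up by name, assigning a new function to `REGISTRY["identify_format"]` changes that one functional area without touching the virus-scan step or the pipeline itself.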
Details about the system can be found below.
Content being preserved
The preservation repository will contain content for which the Libraries are responsible, including:
- Strategic digitization projects
- Minnesota Digital Library (MDL) content
- Content ingested into the University Digital Conservancy (UDC)
- Content ingested into the Data Repository (DRUM)
Other collection areas include:
- Digitization for preservation
- Digital video delivery
- Ag Econ
- Aerial maps collection
- Files documenting the final products of grants, with prior approval
- Born-digital materials from the Archives and Special Collections units
- Special projects as determined via contracts or agreements
Functionality
The main functions of the system include:
- Ingest
- Searching
- Data management
- Exporting files
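Searching presumably works over the metadata held in the system's MySQL databases. As a rough sketch, the example below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a database server; the `objects` table, its columns, the `search` function, and the sample rows are hypothetical and do not reflect Tabula's actual schema.

```python
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the metadata database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per preserved object, tagged with its collection.
    conn.execute(
        """CREATE TABLE objects (
               id INTEGER PRIMARY KEY,
               collection TEXT NOT NULL,
               title TEXT,
               mime_type TEXT
           )"""
    )
    # Illustrative sample rows only.
    conn.executemany(
        "INSERT INTO objects (collection, title, mime_type) VALUES (?, ?, ?)",
        [
            ("MDL", "Aerial map, sheet 12", "image/tiff"),
            ("UDC", "Annual report, 1990", "application/pdf"),
            ("DRUM", "Survey responses", "text/csv"),
            ("MDL", "Aerial map, sheet 13", "image/tiff"),
        ],
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, collection: str, title_term: str) -> List[Tuple[int, str]]:
    """Return (id, title) for objects in a collection whose title contains a term."""
    # Placeholders keep user input out of the SQL text.
    cur = conn.execute(
        "SELECT id, title FROM objects "
        "WHERE collection = ? AND title LIKE ? "
        "ORDER BY id",
        (collection, f"%{title_term}%"),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    print(search(db, "MDL", "Aerial map"))
```

With a MySQL driver such as mysql-connector-python the query would be the same idea, but the placeholder would be `%s` rather than `?`.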
Areas requiring additional work or development include:
- Reporting
- Workbench
  - for editing metadata
  - for troubleshooting ingests
- Preservation planning activities
  - file format migrations
Metadata
Metadata is the main tool that allows us to find specific files as well as to perform preservation activities. Descriptive metadata is provided by the content creators or by the access repository where the objects can be found. Administrative metadata is provided by the contributor as well as created within the system. Technical metadata is captured upon ingest into the system.
The following provides a list of the current metadata fields that the system tracks or captures. Required fields are marked with an asterisk (*).
| Descriptive metadata | Technical metadata | Administrative metadata |
|---|---|---|
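Since technical metadata is captured at ingest, the following minimal sketch (not Tabula's actual code) shows what recording a few basic technical properties, and later re-checking fixity, could look like using only the Python standard library. The functions `sha256_of`, `capture_technical_metadata`, and `verify_fixity` and the field names (`size_bytes`, `sha256`, `mime_type_guess`, `captured_at`) are hypothetical; the extension-based MIME guess is a simple stand-in for the format identification and validation that tools such as DROID and JHOVE perform.

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_technical_metadata(path: str) -> dict:
    """Collect a few basic technical properties of one file (hypothetical field names)."""
    mime_type, _encoding = mimetypes.guess_type(path)  # extension-based guess only
    return {
        "filename": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha256": sha256_of(path),
        "mime_type_guess": mime_type,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_fixity(path: str, expected_sha256: str) -> bool:
    """Recompute the checksum and compare it with the value recorded at ingest."""
    return sha256_of(path) == expected_sha256


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
        tmp.write(b"hello")
    record = capture_technical_metadata(tmp.name)
    print(record)
    print(verify_fixity(tmp.name, record["sha256"]))
    os.remove(tmp.name)
```

Storing the checksum at ingest is what makes the later comparison meaningful: any change to the file's bytes after ingest produces a different digest, so `verify_fixity` returns `False`.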
Hardware/Software Environment
Environment
- Red Hat Enterprise Linux
- MySQL 8
- Spring Boot 2.7 - Java 8
- Python 3.6
- Flask 1.1
- Flask-Session 0.0.4
Tools
- ClamAV
- DROID
- JHOVE
- ImageMagick
- FFmpeg
Servers
- Web server
  - 8 CPU
  - 8 GB RAM
  - 2 TB disk
- Application servers (2)
  - 16 CPU
  - 132 GB RAM
  - 4 TB disk
  - 40 Gbps NIC
Storage Environment
- Working storage
  - 5 TB NFS
  - NetApp FAS8200
- Permanent storage
  - 1.5 PB NFS
  - Qumulo / HPE Apollo (x2)
- Tape storage
  - Shared Spectra Logic Spectra TFinity ExaScale tape library with 3 LTO-9 drives and 60 slots
  - 2.8 PB of storage