Research Guides: Digital preservation program at the Libraries: Preservation repository

Tabula: The Digital Preservation System

Note

As of September 2023, work on the preservation repository has been put on hold while we focus on migrating our preservation content to a new storage infrastructure.

Background

The Libraries have been carrying out digital preservation activities for over 20 years. For much of that time, the work relied on manual and semi-automated processes. To make these practices more efficient and sustainable, the Libraries issued a call for proposals for a digital preservation system. At the end of that process, a tool was chosen and installed, and testing continued for two years. In the end, we decided that the tool was not doing what we had expected or needed it to do.

We went back to our requirements and considered what we needed and what had not worked in the system we had just tried. We surveyed the landscape again and explored other outside options, but in the end decided that building our own digital preservation system based on microservices would be the best option moving forward.

Tabula, the preservation system

Tabula was developed in house by members of the Digital Preservation and Repository Technology group. The system is built on MySQL databases and Python scripts. The web interface uses Python with Flask and Flask-Session to provide a dynamic, responsive design.

The system uses the concept of microservices: independent components run applications or services that each focus on a specialized task. This makes it easier to change or develop specific functional areas without affecting the entire application.
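To illustrate the idea of independent, single-purpose components, here is a minimal sketch in Python. It is not Tabula's code: the service names, the package dictionary, and the in-process registry are all hypothetical, and real microservices would normally run as separate processes rather than as functions in one script.

```python
from typing import Callable, Dict, List

# A "service" here is any callable that takes a package dict and returns it.
# All names below are hypothetical; they are not taken from Tabula.
Service = Callable[[dict], dict]


def scan_for_viruses(package: dict) -> dict:
    # Placeholder: a real service might call a scanner such as ClamAV.
    package["virus_scan"] = "not run (placeholder)"
    return package


def identify_formats(package: dict) -> dict:
    # Placeholder: a real service might call a format identifier such as DROID.
    package["formats"] = {name: "unknown" for name in package["files"]}
    return package


# Each task is registered under its own name, so one task can be replaced
# or extended without touching the others.
SERVICES: Dict[str, Service] = {
    "virus_scan": scan_for_viruses,
    "format_id": identify_formats,
}


def run_pipeline(package: dict, steps: List[str]) -> dict:
    """Run each named service in order on the same package."""
    for step in steps:
        package = SERVICES[step](package)
    return package


result = run_pipeline({"files": ["a.tif", "b.pdf"]}, ["virus_scan", "format_id"])
```

Because each step depends only on the shared package structure, swapping the entry for "format_id" with a different implementation would leave the virus scan step unchanged.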

Details about the system can be found below.

Content being preserved

The preservation repository will contain content the Libraries are responsible for, including:

Strategic digitization projects
Minnesota Digital Library content (MDL)
Content ingested into the University Digital Conservancy (UDC)
Content ingested into the Data Repository (DRUM)

Other collection areas:

Digitization for preservation
Digital video delivery
Ag Econ
Aerial maps collection
Files documenting the final product of grants upon prior approval
Born-digital materials from the Archives and Special Collections units
Special projects as determined via contracts or agreements

Functionality

The main functions of the system include:

Ingest
Searching
Data management
Exporting files

Areas requiring additional development include:

Reporting
Workbench
- for editing metadata
- for troubleshooting ingests
Preservation planning activities
- file format migrations

Metadata

Metadata is the main tool that allows us to find specific files and to perform preservation activities. Descriptive metadata is provided by the content creators or by the access repository in which the objects can be found. Administrative metadata is provided by the contributor and is also created within the system. Technical metadata is captured upon ingest into the system.
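As a rough illustration of capturing technical metadata at ingest, the sketch below gathers a file's name, extension, size, and checksum using only the Python standard library. The function name, the dictionary keys, and the choice of SHA-256 are assumptions made for the example; this guide does not state which checksum algorithm Tabula uses. The PUID and file format are left out because they require a format identification tool such as DROID.

```python
import hashlib
from pathlib import Path


def capture_technical_metadata(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return basic technical metadata for one file (hypothetical helper)."""
    file_path = Path(path)
    digest = hashlib.new(algorithm)
    with file_path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "file_name": file_path.name,
        "file_extension": file_path.suffix.lstrip(".").lower(),
        "file_size": file_path.stat().st_size,  # size in bytes
        "checksum_algorithm": algorithm,  # assumed; not stated in this guide
        "checksum_value": digest.hexdigest(),
    }
```

Recomputing the checksum later and comparing it with the stored value is a common way to confirm that a preserved file has not changed.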

The following lists the metadata fields that the system currently tracks or captures. Required fields are marked with an asterisk (*).

Descriptive metadata
- Local identifier
- Title*
- Alternative title
- Description
- Creator
- Contributor
- Publisher
- Abstract
- Date created
- Type*
- Format (physical)
- Extent
- Subjects
- Language
- Spatial
- IsPartOf (collection)

Technical metadata
- File name*
- File extension
- PUID (PRONOM Persistent Unique Identifier)
- File size
- File format
- Checksum value

Administrative metadata
- Contributing organization
- Contact information
- Access repository
- Project affiliation
- UUID*
- Archive location
- Derivative type
- Derivative location
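The required fields marked with an asterisk (Title, Type, File name, and UUID) lend themselves to a simple completeness check. The sketch below is one possible way to do that in Python; the snake_case keys, the function name, and the sample record are invented for illustration and do not reflect Tabula's actual schema.

```python
import uuid

# Required fields, as marked with an asterisk in the field list.
# The snake_case key names are hypothetical.
REQUIRED_FIELDS = ("title", "type", "file_name", "uuid")


def missing_required_fields(record):
    """Return the required fields that are absent or blank in a record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# Made-up sample record; uuid4 is used only to produce a plausible value.
record = {
    "title": "Sample aerial photograph",
    "type": "Image",
    "file_name": "sample_photo.tif",
    "uuid": str(uuid.uuid4()),
    "creator": "",  # optional field, so a blank value is acceptable
}

problems = missing_required_fields(record)
```

An empty list means the record carries every required field; otherwise the list names what still needs to be supplied before ingest.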

Hardware/Software Environment

Environment

Red Hat Enterprise Linux
MySQL 8
Spring Boot 2.7 - Java 8
Python 3.6
- Flask 1.1
- Flask-Session 0.0.4

Tools

ClamAV
DROID
JHOVE
ImageMagick
FFmpeg

Servers

Web server
- 8 CPU
- 8 GB RAM
- 2 TB Disk
Application servers (2)
- 16 CPU
- 132 GB RAM
- 4 TB Disk
- 40 Gbps NIC

Storage Environment

Working storage
- 5 TB NFS
  - NetApp FAS8200
Permanent storage
- 1.5 PB NFS
  - Qumulo / HPE Apollo (x2)
Tape storage
- Shared Spectra Logic TFinity ExaScale tape library with 3 LTO-9 drives and 60 slots
- 2.8 PB of storage

Interface Progression

Screenshot of the command line version of Tabula. Image shows the Ingest screen with steps for ingest.

Early command line interface of Tabula

The first versions of Tabula used menus within the command line interface to walk through the process of ingesting materials into the preservation system.

An early graphical user interface mock up of Tabula

This mockup was created to show the general layout of what the graphical user interface for Tabula might look like, including a header for the Tabula name, a sidebar with menu options, and a center section for the page content.

Screenshot of Tabula's current graphical user interface

This interface builds in responsive design as well as user-driven menu options in the sidebar.

Last Updated: Nov 15, 2024 9:21 AM