Digital preservation program at the Libraries

Tabula: The Digital Preservation System

Note

As of September 2023, work on the preservation repository has been put on hold while we focus on migrating our preservation content to a new storage infrastructure.  

Background

The Libraries have carried out digital preservation activities for over 20 years.  During this time, much of this work was done using manual and semi-automated processes.  To improve the efficiency and sustainability of these practices, the Libraries put out a call for proposals for a digital preservation system.  At the end of this process, a tool was chosen and installed.  Testing of this product continued for 2 years.  In the end, it was decided that the tool was not doing what we had expected or needed it to do.

We went back to our requirements and considered what we needed and what had not worked in the system we had just tried.  We surveyed the landscape again and explored other outside options, but in the end decided that building our own digital preservation system based on microservices would be the best option moving forward.

Tabula, the preservation system

Tabula was developed in-house by members of the Digital Preservation and Repository Technology group.  The system is built using MySQL databases and Python scripting.  The web interface uses Python with Flask and Flask-Session to provide a dynamic, responsive design.

The system is built around the concept of microservices: independent components run applications or services that each focus on a specialized task. This makes it easier to change or develop specific functional areas without affecting the entire application.
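To illustrate the idea of independent components that each handle one specialized task, here is a minimal Python sketch. It is not Tabula's actual code: the `Job` structure, the service names, and the placeholder results are all hypothetical and chosen only to show how one step can be swapped or extended without touching the others.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    """A unit of work handed from one service to the next (hypothetical)."""
    file_name: str
    results: Dict[str, str] = field(default_factory=dict)


def virus_scan(job: Job) -> Job:
    # Placeholder only: a real service would call an actual scanner here.
    job.results["virus_scan"] = "clean"
    return job


def identify_format(job: Job) -> Job:
    # Placeholder only: derives a label from the file extension.
    ext = job.file_name.rsplit(".", 1)[-1].lower() if "." in job.file_name else ""
    job.results["format"] = ext or "unknown"
    return job


def run_pipeline(job: Job, services: List[Callable[[Job], Job]]) -> Job:
    """Run each independent service in order, passing the job along."""
    for service in services:
        job = service(job)
    return job


job = run_pipeline(Job("map_001.TIF"), [virus_scan, identify_format])
print(job.results)
```

Because each service only reads and writes the shared job record, replacing `identify_format` with a different implementation would not require any change to `virus_scan` or to `run_pipeline`.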

Details about the system can be found below. 

Content being preserved

The preservation repository will contain content the Libraries are responsible for, including:

  • Strategic digitization projects
  • Minnesota Digital Library content (MDL)
  • Content ingested into the University Digital Conservancy (UDC)
  • Content ingested into the Data Repository (DRUM)

Other collection areas:

  • Digitization for preservation
  • Digital video delivery
  • Ag Econ
  • Aerial maps collection
  • Files documenting the final product of grants upon prior approval
  • Born-digital materials from the Archives and Special Collections units
  • Special projects as determined via contracts or agreements

Functionality

The main functions of the system include:

  • Ingest
  • Searching
  • Data management
  • Exporting files

Areas requiring additional development include:

  • Reporting
  • Workbench
    • for editing metadata
    • for troubleshooting ingests 
  • Preservation planning activities
    • file format migrations

Metadata

Metadata is the main tool that allows us to find specific files and to perform preservation activities. Descriptive metadata is provided by the content creators or by the access repository in which the objects can be found.  Administrative metadata is provided by the contributor and is also created within the system.  Technical metadata is captured upon ingest into the system.
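As a rough illustration of capturing technical metadata at ingest, the sketch below collects the file name, extension, size, and a checksum for a single file. The function name and dictionary keys are hypothetical, and SHA-256 is an assumption, since the checksum algorithm used by the system is not stated here. Format identification (PUID and file format) is left out because it depends on an external identification tool.

```python
import hashlib
import tempfile
from pathlib import Path


def capture_technical_metadata(path: Path, chunk_size: int = 1024 * 1024) -> dict:
    """Collect basic technical metadata for one file (hypothetical helper)."""
    digest = hashlib.sha256()  # assumed algorithm, not confirmed by the source
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "file_name": path.name,
        "file_extension": path.suffix.lstrip(".").lower(),
        "file_size": path.stat().st_size,  # size in bytes
        "checksum_value": digest.hexdigest(),
    }


# Demonstration on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "report.PDF"
    sample.write_bytes(b"example content")
    record = capture_technical_metadata(sample)

print(record["file_name"], record["file_extension"], record["file_size"])
```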

The following lists the metadata fields that the system currently tracks or captures. Required fields are marked with an asterisk (*).

Descriptive metadata

  • Local identifier
  • Title*
  • Alternative title
  • Description
  • Creator
  • Contributor
  • Publisher
  • Abstract
  • Date created
  • Type*
  • Format (physical)
  • Extent
  • Subjects
  • Language
  • Spatial
  • IsPartOf (collection)

Technical metadata

  • File name*
  • File extension
  • PUID (PRONOM Persistent Unique Identifier)
  • File size
  • File format
  • Checksum value

Administrative metadata

  • Contributing organization
  • Contact information
  • Access repository
  • Project affiliation
  • UUID*
  • Archive location
  • Derivative type
  • Derivative location
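The required fields marked with an asterisk (Title, Type, File name, and UUID) lend themselves to a simple check before a record is accepted. The following is a minimal sketch under stated assumptions: the key names, the helper function, and the sample records are hypothetical, and generating the UUID with `uuid.uuid4()` is only one plausible way such an identifier might be created.

```python
import uuid

# Keys mirror the four fields marked as required; the spelling is an assumption.
REQUIRED_FIELDS = ("title", "type", "file_name", "uuid")


def missing_required_fields(record: dict) -> list:
    """Return the required field names that are absent or blank."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(record.get(name) or "").strip()
    ]


# Hypothetical sample records, for illustration only.
complete = {
    "title": "Aerial photo, 1957",
    "type": "Image",
    "file_name": "a-1957-001.tif",
    "uuid": str(uuid.uuid4()),
}
incomplete = {"title": "Untitled scan", "file_name": "scan.tif"}

print(missing_required_fields(complete))    # []
print(missing_required_fields(incomplete))  # ['type', 'uuid']
```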

Hardware/Software Environment

Environment

  • Red Hat Enterprise Linux
  • MySQL 8
  • Spring Boot 2.7 - Java 8
  • Python 3.6
    • Flask 1.1
    • Session 0.0.4

Tools 

  • ClamAV
  • DROID
  • JHOVE
  • ImageMagick
  • FFmpeg

Servers

  • Web server
    • 8 CPU
    • 8 GB RAM
    • 2 TB Disk
  • Application servers (2)
    • 16 CPU
    • 132 GB RAM
    • 4 TB Disk
    • 40 Gbps NIC

Storage Environment

  • Working storage
    • 5 TB NFS
      • NetApp FAS8200
  • Permanent storage
    • 1.5 PB NFS
      • Qumulo / HPE Apollo (x2)
  • Tape storage
    • Spectra Logic Spectra TFinity ExaScale tape library (shared), with 3 LTO-9 drives and 60 slots
    • 2.8 PB of storage

Interface Progression

Screenshot of the command line version of Tabula.  Image shows the Ingest screen with steps for ingest.

Early command line interface of Tabula

The first versions of Tabula used menus within the command line interface to walk through the process of ingesting materials into the preservation system.  

An early graphical user interface mockup of Tabula

This mockup was created to show the general layout of what the graphical user interface for Tabula might look like, including a header for the Tabula name, a sidebar with menu options, and a center section for the page content.

Screenshot of Tabula's current graphical user interface

This interface incorporates responsive design as well as user-driven menu options in the sidebar.

Last Updated: Nov 15, 2024 9:21 AM