Thursday, March 2, 2023

Some thoughts about a minimalistic Archival Information System, part 1

Many of those who are dealing with digital preservation for the first time and who work in small memory organizations feel helpless in the face of the vast range of functions and requirements of current archival information systems.
Students of library or archival science often appear to be similarly overwhelmed when they are supposed to learn what constitutes archival software.

This has motivated me to write down some thoughts on a minimalistic archival information system, because it really doesn't need much.

The basic terms

An archive essentially involves three roles: the submitter, called the producer; the user, also called the consumer; and the problem solver who maintains the archive, also called the technical analyst.


When digital objects are transferred to the archive, this is called the ingest process. When they are requested from the archive, this is called the access process.

The digital objects to be preserved are provided with all the necessary information for the archive ingest and are packaged in a predefined structure. This is called a Submission Information Package (SIP). You can actually imagine this just like in real life. For example, if you want to store a vase, you put it in a box, label it and put it on a shelf.

The archive checks whether (allegorically) the vase is in the box and intact, and whether there is a stamp and signature confirming that the content of the package is indeed a vase. A file number and a storage location are assigned, and the box goes onto the shelf, sealed and neatly labeled. The "box" is called an Archival Information Package (AIP). With the seal, the archive takes responsibility.
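The check at ingest can be sketched in a few lines of Python. This is only a minimal illustration, assuming that a SIP consists of files plus a manifest of SHA-256 checksums supplied by the producer; the names `AIP`, `ingest`, `sip_files` and `sip_manifest` are made up for this example and do not come from any particular archival software.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AIP:
    """The sealed 'box': an identifier plus the verified file checksums."""
    aip_id: str
    checksums: dict  # file name -> SHA-256 hex digest


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ingest(sip_files: dict, sip_manifest: dict) -> AIP:
    """Check a SIP against its manifest and turn it into an AIP.

    sip_files:    file name -> content bytes (the 'vase')
    sip_manifest: file name -> SHA-256 digest stated by the producer (the 'label')
    """
    # Is everything the label promises actually in the box, and nothing else?
    missing = set(sip_manifest) - set(sip_files)
    unexpected = set(sip_files) - set(sip_manifest)
    if missing or unexpected:
        raise ValueError(
            f"SIP incomplete: missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    # Is the content intact?
    for name, expected in sip_manifest.items():
        if sha256(sip_files[name]) != expected:
            raise ValueError(f"checksum mismatch for {name}")
    # Assign the 'file number' and seal the package.
    return AIP(aip_id=str(uuid.uuid4()), checksums=dict(sip_manifest))
```

A SIP whose files match the manifest comes back as an AIP with a fresh identifier; a damaged or incomplete SIP is rejected with a `ValueError` before the archive takes responsibility for it.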

At some point, when the user would like to see the vase from the archive again, the archive would process the request and send the vase and accompanying information to the user. This is then called a Dissemination Information Package (DIP).

In addition to this simple "I store something safely and retrieve it again at some point" approach, an archive fulfills another task that is not so obvious: it ensures that objects entrusted to it are kept usable. 

What does that mean in the digital world?
Even if it is possible in principle to store a digital object securely and bit-exactly over a very long period of time (bitstream preservation), it can still age, because the environment needed to use the object is no longer available.

There are essentially three concepts for keeping digital objects usable (content preservation): hardware museums, emulation, and format migration.

Hardware museums (e.g. a slot machine museum) try to keep old equipment running in a controlled environment. To do that, they have to build up a stock of equipment in good time, as well as the knowledge of how to maintain and repair these devices.

With emulation, I try to recreate the environment for the digital object so that it feels at home and doesn't notice any difference from the previous, real world. MAME is a very good example of an emulator, and there are various others that recreate retro computers such as the Amiga or C64 so that old programs from their time can run on them. Here, too, I need knowledge about what the environment to be emulated looks like and how I can recreate it with today's means.

With format migration, I try to find, in good time, a new form that retains the essential properties (significant properties) of a digital object, and to transfer its files to a newer data format.

From this point onwards, format migration is assumed to be the preferred way of maintaining usability.

It follows that an Archival Information System (AIS) must be able to support this process of format migration. The process (also called Preservation Planning and Action) results in a new version of the Archival Information Package being created. The AIS must be able to manage this.
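As a rough sketch of what "migration results in a new version" could look like: the following Python example assumes that formats are recognized by file extension and that the migration plan is a simple lookup table. Both are simplifications for illustration, and `MIGRATION_PLAN`, `AIPVersion`, `migrate` and the `convert` callback are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical migration plan: outdated format -> preferred target format.
MIGRATION_PLAN = {"doc": "pdf", "bmp": "tiff"}


@dataclass(frozen=True)
class AIPVersion:
    number: int
    files: tuple  # file names contained in this version


def extension(name: str) -> str:
    """Return the lower-cased file extension, or '' if there is none."""
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""


def migrate(current: AIPVersion, convert) -> AIPVersion:
    """Return a new AIP version in which outdated formats are replaced.

    convert(name, target_ext) is supplied by the caller, performs the
    actual conversion and returns the name of the converted file.
    The current version is not modified.
    """
    new_files = []
    for name in current.files:
        target = MIGRATION_PLAN.get(extension(name))
        new_files.append(convert(name, target) if target else name)
    return AIPVersion(number=current.number + 1, files=tuple(new_files))


def fake_convert(name: str, target_ext: str) -> str:
    """Stand-in for a real converter: only renames the file."""
    return name.rsplit(".", 1)[0] + "." + target_ext


v1 = AIPVersion(1, ("letter.doc", "scan.tiff"))
v2 = migrate(v1, fake_convert)
```

Here `v2` is version 2 and contains `letter.pdf` and the untouched `scan.tiff`, while `v1` still exists unchanged. A real system would identify formats more reliably than by extension and would check that the significant properties survived the conversion.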

That would basically be all there is to Archival Information Systems if it weren't for the librarians.

Unlike archivists, for whom a record is complete and closed, librarians are used to supplements and later metadata submissions. A page that had fallen out has turned up here, a letter has been discovered in an estate there, or it has dawned on someone that there is now money for costly in-depth indexing. Ergo, librarians expect us to think about how to handle updates to the metadata and data of existing AIPs (called metadata update and AIP update). This is not trivial, since some AIPs are very large and you want to avoid pointless copying. For such an update, we also need a good way for producers to tell the archive which AIP is to be supplemented or updated.

However, since AIPs are already versioned for format migration, the same mechanism can be used here as well. Any change to an AIP creates a new version of that AIP. And so that you can't accidentally break anything, you should always be able to go back to an old version. And since rolling back is itself error-prone, the result of the rollback process is simply another new version.
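The versioning idea can be sketched as an append-only history. The following Python example is only an illustration under assumptions: each version is recorded as a mapping from file name to checksum, so that an update only lists what changed and unchanged payload files would not need to be copied. The class `AIPHistory` and its methods are hypothetical names.

```python
class AIPHistory:
    """Append-only list of versions; each version maps file name -> checksum."""

    def __init__(self, initial_files: dict):
        self._versions = [dict(initial_files)]

    @property
    def latest(self) -> int:
        return len(self._versions)  # versions are numbered from 1

    def files(self, version: int) -> dict:
        """Return a copy of the file listing of the given version."""
        if not 1 <= version <= self.latest:
            raise ValueError(f"unknown version {version}")
        return dict(self._versions[version - 1])

    def update(self, changed: dict, removed=()) -> int:
        """Create a new version from the latest one plus a delta."""
        new = self.files(self.latest)
        new.update(changed)
        for name in removed:
            new.pop(name, None)
        self._versions.append(new)
        return self.latest

    def rollback(self, version: int) -> int:
        """Restore an old state by appending a copy of it as a new version."""
        self._versions.append(self.files(version))
        return self.latest


history = AIPHistory({"book.pdf": "aaa"})
history.update({"page_17.tif": "bbb"})  # the page that had fallen out
history.rollback(1)                     # undo it, as yet another version
```

After these calls the history has three versions: version 3 has the same content as version 1, and version 2 with the supplement is still there. Nothing is ever overwritten, which is what makes a rollback safe to undo again.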

That's it. There is nothing more to it. Easy, isn't it?

