Tuesday, March 7, 2023

Some thoughts about a minimalistic Archival Information System, part 2

In the last post I explained some basic terms. Now it's time for the real thing.

Choosing the right format

The first question is what the information packages (SIP, AIP, DIP) should look like. They should be easy to process, easy to understand and easy to extend. Fortunately, RFC 8493 has a ready-made solution for us: BagIt.

In (1) we store the metadata, in (2) there is space for our payload. BagIt is simple: it defines a directory structure and a few files that each serve a specific function. What makes it interesting for us is that the digital objects we want to archive can go straight into the BagIt payload. We can carry this area over completely unchanged when processing a SIP and creating the AIP. The same is possible later when creating DIPs from AIPs.

BagIt gives a lot of freedom. To limit ourselves, we choose UTF-8 for all metadata and text files, and we don't use fetch bags. Since BagIt is now standardized, we use version 1.0.
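To make the directory structure more concrete, here is a minimal sketch of writing such a bag with the Python standard library only. The function name `create_bag`, its parameters and the choice of SHA-512 for the payload manifest are my assumptions for illustration; it is not a complete RFC 8493 implementation (no tag manifest, no validation).

```python
import hashlib
from pathlib import Path


def create_bag(bag_dir: Path, payload: dict[str, bytes], bag_info: dict[str, str]) -> None:
    """Write a minimal BagIt 1.0 bag: UTF-8 tag files, no fetch.txt (hypothetical helper)."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        # Payload files live unchanged below data/
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # Manifest entries are "<checksum> <path relative to the bag root>"
        digest = hashlib.sha512(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")

    # Bag declaration: BagIt version and tag file encoding
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-sha512.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    # Free-form metadata as "Key: value" lines
    (bag_dir / "bag-info.txt").write_text(
        "".join(f"{key}: {value}\n" for key, value in bag_info.items()), encoding="utf-8"
    )
```

Because the payload sits in its own "data" directory, turning a SIP into an AIP can leave that directory untouched and only rewrite the tag files around it.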


Metadata and AIP update considerations

Many archival information systems are insufficiently prepared for metadata and AIP updates. In my experience, it is important to think about which data is updated, how it is updated, and what the consequences are. To enable the producer to submit supplements, these must be clearly assigned to an existing AIP. One option is to return an ID to the producer after the first submission. This is not a good choice, because it couples the process tightly and exposes internals to the outside world. In addition, if a producer wants to switch to a different AIS, IDs can collide. A better choice is to have the producer choose a unique ID for their data themselves and transmit it in their SIPs. Internally, we then use this ID to look up the matching AIPs. The ID is called "ExternalID" and is the basis for our internal MAIS-AIP-ID. More on that later.
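The lookup by "ExternalID" could be sketched roughly as follows. The class `AipIndex` and its methods are hypothetical, and the in-memory dictionary only stands in for whatever catalogue a real system would use; how the MAIS-AIP-ID is derived is deliberately left open here.

```python
from collections import defaultdict


class AipIndex:
    """Hypothetical in-memory index from a producer-chosen ExternalID to AIP IDs."""

    def __init__(self) -> None:
        self._by_external_id: dict[str, list[str]] = defaultdict(list)

    def register(self, external_id: str, aip_id: str) -> None:
        """Remember that an AIP was created for the given ExternalID."""
        self._by_external_id[external_id].append(aip_id)

    def find(self, external_id: str) -> list[str]:
        """Return all known AIP IDs for an ExternalID, oldest first."""
        return list(self._by_external_id.get(external_id, []))

    def is_supplement(self, external_id: str) -> bool:
        """A SIP is a supplement if its ExternalID has been seen before."""
        return external_id in self._by_external_id
```

The producer never sees the internal AIP IDs. A SIP only carries the ExternalID, and the AIS decides on its own whether it is a first submission or a supplement.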

In the last post I already mentioned that we have to think about versioning of AIPs. This matters not only for metadata or AIP updates, but also in the case of a PP&A, i.e. a format migration. A simple idea is to introduce linked lists.

This also makes it easy to roll back to an earlier AIP version.

A new AIP points to its predecessor: the new version receives a reference entry in its "bag-info.txt", using the following keys:


  • 'MAIS-previous-AIP' - contains the AIP-ID (MAIS-AIP-ID) of the predecessor AIP in the current AIS
  • 'MAIS-migrated-AIP' - contains the AIP-ID the AIP had in the previous AIS, if it was migrated from there
  • 'MAIS-origin-AIS' - contains the identifier of the previous AIS from which the AIP was migrated

The last two keys are optional. They are only required when an AIP-to-AIP transfer moves digital objects from one archival information system to another.
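Following the 'MAIS-previous-AIP' entries gives the version history of an AIP. Below is a minimal sketch, assuming each AIP is stored as a directory named after its AIP-ID inside a common store directory. That layout and the helper names `read_bag_info` and `version_history` are assumptions for illustration. The parser is simplified and ignores folded continuation lines and repeated keys.

```python
from pathlib import Path
from typing import Optional


def read_bag_info(bag_dir: Path) -> dict[str, str]:
    """Parse "Key: value" lines from bag-info.txt (simplified, see lead-in)."""
    info = {}
    for line in (bag_dir / "bag-info.txt").read_text(encoding="utf-8").splitlines():
        # Skip blank lines and indented continuation lines
        if ":" in line and not line[0].isspace():
            key, value = line.split(":", 1)
            info[key.strip()] = value.strip()
    return info


def version_history(store: Path, aip_id: str) -> list[str]:
    """Follow MAIS-previous-AIP links from the given AIP back to the first version."""
    history = []
    seen = set()  # guards against accidental cycles in the chain
    current: Optional[str] = aip_id
    while current and current not in seen:
        seen.add(current)
        history.append(current)
        current = read_bag_info(store / current).get("MAIS-previous-AIP")
    return history
```

The first element of the returned list is the version we started from, the second its predecessor, and so on. A rollback then amounts to making an earlier entry of that list the current version again.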
