Previous work
In an older article (see https://kulturreste.blogspot.com/2018/10/heres-tool-make-it-work.html) I already analyzed the PRONOM signatures. The module used for that analysis is now available on CPAN; see https://metacpan.org/pod/File::FormatIdentification::Pronom for details.
In addition to the statistics on PRONOM signatures, the Perl package comes with two more helper scripts that can make the work of a long-term archivist easier.
Format identification
Here is example output for a TIFF file that DROID wrongly identified as GeoTIFF:
perl -I lib bin/pronomidentify.pl -s DROID_SignatureFile_V96.xml -b /tmp/00000007.tif
/tmp/00000007.tif identified as Tagged Image File Format with PUID fmt/353 (regex quality 1)
/tmp/00000007.tif identified as Geographic Tagged Image File Format (GeoTIFF) with PUID fmt/155 (regex quality 2)
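The output mentions a "regex quality", which suggests that signatures are matched as regular expressions. The following minimal Python sketch (not the Perl module's code) illustrates the idea: it turns a simplified PRONOM-style hex sequence into a bytes regex and tests it at the beginning of a file. It handles only a subset of the PRONOM syntax (hex bytes, ??, {n}, {n-m} and *), and the helper names pronom_to_regex and identify are hypothetical. The second signature is made up: it is loosely modelled on the GeoTIFF GeoKeyDirectory tag (34735, i.e. 0x87AF, stored little-endian) and is not DROID's real fmt/155 signature. It does show how one file can hit both a generic and a more specific signature.

```python
from __future__ import annotations

import re

# One token of the simplified syntax: a hex byte, '??', '{n}' / '{n-m}', or '*'.
TOKEN = re.compile(r"([0-9A-Fa-f]{2})|(\?\?)|\{(\d+)(?:-(\d+))?\}|(\*)")


def pronom_to_regex(sequence: str) -> re.Pattern:
    """Translate a simplified PRONOM-style hex sequence into a compiled bytes regex.

    Only a subset of the real PRONOM syntax is handled (an assumption of this sketch):
    literal hex bytes, '??' (any byte), '{n}' and '{n-m}' (fixed or bounded gaps)
    and '*' (a gap of any length).
    """
    parts: list[bytes] = []
    pos = 0
    while pos < len(sequence):
        m = TOKEN.match(sequence, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {pos}: {sequence[pos:]!r}")
        hexbyte, anybyte, low, high, star = m.groups()
        if hexbyte:
            parts.append(re.escape(bytes.fromhex(hexbyte)))
        elif anybyte:
            parts.append(b".")
        elif low is not None:
            if high is None:
                parts.append(b".{%d}" % int(low))
            else:
                parts.append(b".{%d,%d}" % (int(low), int(high)))
        else:  # '*'
            parts.append(b".*")
        pos = m.end()
    # DOTALL lets '.' match every byte value, including 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


def identify(data: bytes, signatures: dict[str, str]) -> list[str]:
    """Return every label whose sequence matches at the beginning of the file (BOF)."""
    return [label for label, seq in signatures.items()
            if pronom_to_regex(seq).match(data)]


# Illustrative signatures: the first is the little-endian TIFF magic number,
# the second is a made-up, more specific pattern with a variable-length gap.
SIGNATURES = {
    "TIFF (little-endian)": "49492A00",
    "hypothetical GeoTIFF-like": "49492A00*AF87",
}

plain_tiff = b"II*\x00" + (8).to_bytes(4, "little") + b"\x00" * 8
tagged_tiff = plain_tiff + b"\xaf\x87\x01\x00"

print(identify(plain_tiff, SIGNATURES))   # ['TIFF (little-endian)']
print(identify(tagged_tiff, SIGNATURES))  # ['TIFF (little-endian)', 'hypothetical GeoTIFF-like']
```

A file that merely happens to contain the extra bytes is reported under both labels, which is why a human still has to judge which hit is plausible.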
Colorized display of possible signature hits in the hex editor wxHexEditor
On Linux, the hex editor wxHexEditor can be used to analyze files. It supports tag files, in which you can define byte ranges that are highlighted in color and annotated.
The script pronom2wxhexeditor.pl creates such a tag file. Below are the call and a screenshot.
perl -I lib bin/pronom2wxhexeditor.pl -s DROID_SignatureFile_V96.xml -b /tmp/00000007.tif
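Since a tag file marks byte ranges with a color and a note, the core of such a script is collecting the offsets of signature hits. The following Python sketch computes these ranges for literal byte patterns. The Tag class, the tag_hits function, the inclusive end offset and the color palette are all assumptions for illustration; the actual tag-file format that wxHexEditor reads (and that pronom2wxhexeditor.pl writes) is not reproduced here.

```python
from __future__ import annotations

import itertools
import re
from dataclasses import dataclass


@dataclass
class Tag:
    start: int   # offset of the first byte of the hit
    end: int     # offset of the last byte of the hit (inclusive, a choice of this sketch)
    note: str    # annotation text for the range
    colour: str  # highlight colour as an HTML-style hex string


def tag_hits(data: bytes, patterns: dict[str, bytes]) -> list[Tag]:
    """Collect one Tag per (non-overlapping) occurrence of each literal byte pattern.

    Each pattern label gets its own colour from a small, cycled palette.
    """
    palette = itertools.cycle(["#FF8080", "#80FF80", "#8080FF", "#FFFF80"])
    tags: list[Tag] = []
    for (label, needle), colour in zip(patterns.items(), palette):
        for m in re.finditer(re.escape(needle), data):
            tags.append(Tag(m.start(), m.end() - 1, label, colour))
    return sorted(tags, key=lambda t: t.start)


# Hypothetical sample: a TIFF header followed by the bytes AF 87 at offset 10.
sample = b"II*\x00" + b"\x08\x00\x00\x00" + b"\x00\x00" + b"\xaf\x87" + b"\x00"
PATTERNS = {"TIFF magic": b"II*\x00", "tag 0x87AF": b"\xaf\x87"}

for t in tag_hits(sample, PATTERNS):
    print(f"{t.start:#06x}-{t.end:#06x} {t.colour} {t.note}")
# 0x0000-0x0003 #FF8080 TIFF magic
# 0x000a-0x000b #80FF80 tag 0x87AF
```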
What next?
Well, it is up to us as a community to use the existing tools and their capabilities to improve our daily work. Anyone with ideas or suggestions for improvement is welcome to share them.
I would be especially happy if some helpful volunteers took the PRONOM statistics to heart and helped improve the PRONOM signatures.
A sensible place to start would be the orphaned signatures, followed by a second look at signatures that are used by multiple formats.
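As a starting point for such a review, here is a minimal Python sketch that lists both kinds of signatures in a DROID signature file. It assumes one possible reading of the two categories: an orphaned signature is an InternalSignature that no FileFormat references via InternalSignatureID, and a multiply used signature is one referenced by more than one FileFormat. The statistics in File::FormatIdentification::Pronom may define them differently. The embedded XML is a made-up miniature (the PUIDs test/1 and test/2 are fictitious); for a real file you would use ET.parse("DROID_SignatureFile_V96.xml").getroot() instead.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up miniature in the shape of a DROID signature file (byte sequences omitted).
SAMPLE = """
<FFSignatureFile xmlns="http://www.nationalarchives.gov.uk/pronom/SignatureFile">
  <InternalSignatureCollection>
    <InternalSignature ID="1"/>
    <InternalSignature ID="2"/>
    <InternalSignature ID="3"/>
  </InternalSignatureCollection>
  <FileFormatCollection>
    <FileFormat ID="10" Name="Format A" PUID="test/1">
      <InternalSignatureID>1</InternalSignatureID>
    </FileFormat>
    <FileFormat ID="11" Name="Format B" PUID="test/2">
      <InternalSignatureID>1</InternalSignatureID>
      <InternalSignatureID>2</InternalSignatureID>
    </FileFormat>
  </FileFormatCollection>
</FFSignatureFile>
"""


def local(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}FileFormat' down to 'FileFormat'."""
    return tag.rsplit("}", 1)[-1]


def signature_usage(root: ET.Element) -> tuple[list[str], dict[str, list[str]]]:
    """Return (orphaned signature IDs, signature IDs shared by several formats)."""
    all_ids = [el.get("ID") for el in root.iter() if local(el.tag) == "InternalSignature"]
    used_by: dict[str, list[str]] = defaultdict(list)
    for fmt in root.iter():
        if local(fmt.tag) != "FileFormat":
            continue
        for ref in fmt:
            if local(ref.tag) == "InternalSignatureID":
                used_by[ref.text.strip()].append(fmt.get("PUID"))
    # 'in' on a defaultdict does not create missing keys.
    orphaned = [sid for sid in all_ids if sid not in used_by]
    shared = {sid: puids for sid, puids in used_by.items() if len(puids) > 1}
    return orphaned, shared


orphaned, shared = signature_usage(ET.fromstring(SAMPLE))
print(orphaned)  # ['3']
print(shared)    # {'1': ['test/1', 'test/2']}
```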