About ICC profile validation
Over the last few days we extended our baseline TIFF conformance checking tool "checkit_tiff" with support for validating TIFFs that contain embedded ICC profiles.
There are two specifications for ICC profiles. The first and older one dates from 2001 and is available at http://www.color.org/ICC_Minor_Revision_for_Web.pdf. The second one (and current version) is http://www.color.org/specification/ICC1v43_2010-12.pdf.
The ICC profile standards are very complex and do not really fit into the design of the checkit_tiff configuration files, so we decided to check only the headers of the embedded ICC profiles for now.
Because the differences between the headers of the two standards look marginal at first sight, a header check seemed simple enough to implement directly in checkit_tiff.
For more detailed checks we would need a plugin system that delegates the validation of embedded standards (such as ICC, XMP and so on) to more specialized validators.
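To give an idea of what a header-only check involves, here is a minimal Python sketch. It is not the checkit_tiff implementation (which is not shown in this post), and the function name parse_icc_header is ours, chosen for illustration. It relies only on the fixed header layout that both ICC specifications share: a 128-byte header with big-endian fields, the profile size in bytes 0 to 3, the preferred CMM type in bytes 4 to 7, the version in bytes 8 to 11 and the signature 'acsp' in bytes 36 to 39.

```python
import struct

ICC_HEADER_SIZE = 128  # both ICC specifications use a fixed 128-byte header


def parse_icc_header(profile: bytes) -> dict:
    """Decode a few fixed-position fields of an ICC profile header.

    Hypothetical helper for illustration, not part of checkit_tiff.
    """
    if len(profile) < ICC_HEADER_SIZE:
        raise ValueError("ICC profile is shorter than the 128-byte header")
    # Header fields are stored big-endian: uint32 size, 4-byte CMM, 4-byte version.
    size, cmm, version = struct.unpack_from(">I4s4s", profile, 0)
    return {
        "profile_size": size,
        "preferred_cmm": cmm,
        "version_raw": version,
        "has_acsp_signature": profile[36:40] == b"acsp",
    }


# Synthetic header, built here only so the example runs without a real file.
header = bytearray(ICC_HEADER_SIZE)
struct.pack_into(">I", header, 0, ICC_HEADER_SIZE)
header[4:8] = b"Lino"
header[8:12] = bytes([0x04, 0x20, 0x00, 0x00])
header[36:40] = b"acsp"

fields = parse_icc_header(bytes(header))
print(fields)
```

A real check would read the profile bytes from TIFF tag 34675 instead of building them by hand, and would then compare each field against the values the chosen specification allows.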
Ooops, our tool finds more broken TIFFs again
To test the validation, we ran checkit_tiff against some old TIFFs from our digitization collection. As you can see in the following picture, checkit_tiff detects a discrepancy between the size of the embedded ICC profile as reported in TIFF tag 34675 and the size reported by the ICC profile itself:
After some analysis we made an interesting observation. The given TIFF reports a size of 13691 bytes (in hex: 0x357b), but the ICC profile itself (header field "profile size") reports a size of 669051 bytes (in hex: 0xa357b). We first suspected a bug in our TIFF tag or ICC header decoding, but other tools (exiftool and tiffdump) reported the same difference.
As we saw with previous TIFFs, some TIFF implementations are faulty, and often some information was missing because of off-by-one errors. In this case, it seems that the software "Omniscan 12.8 Build2476" does not fill TIFF tag 34675 for the ICC profile correctly: it leaves out the most significant non-zero byte of the size (here: 0x0a).
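The arithmetic behind this observation can be checked with a few lines of Python. The sketch below is illustrative only: the function name compare_icc_sizes and its return strings are ours, and the idea that the writing software truncated the size to 16 bits is an assumption that merely fits the two numbers, not something we have confirmed.

```python
def compare_icc_sizes(tag_size: int, header_size: int) -> str:
    """Compare the size from TIFF tag 34675 with the ICC 'profile size' field.

    Hypothetical helper for illustration, not part of checkit_tiff.
    """
    if tag_size == header_size:
        return "match"
    # Assumption: the writer may have kept only the lower 16 bits of the size.
    if tag_size == header_size & 0xFFFF:
        return "tag size equals header size truncated to 16 bits"
    return "mismatch"


tag_size = 0x357B      # 13691, as reported by TIFF tag 34675
header_size = 0xA357B  # 669051, as reported by the ICC header

print(tag_size, header_size, compare_icc_sizes(tag_size, header_size))
```

For the two values from our file, the lower 16 bits of the header size are exactly the size stored in the TIFF tag, which is consistent with the upper part having been dropped.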
We need your help
Writing a conformance checker is a tough job, and sometimes we introduce bugs into this kind of software as well. Therefore, please have a look at the code of checkit_tiff, test it and send us bug reports if you find any.
The code can be found at https://github.com/SLUB-digitalpreservation/checkit_tiff.
Here are some open questions about ICC profiles that we have encountered:
- We found some TIFFs which reported ICC version 4.2.0, but on http://www.color.org we only found versions 4.3.0 and 2.4.0. Could you help us to find out the differences in the headers for the different versions?
- The meaning of the field "preferred CMM type" is unclear. We assume that the list of allowed strings is part of the document http://www.color.org/registry/signature/TagRegistry-2016-05.pdf. But we have also found a TIFF where the string 'Lino' is set for this ICC header field. Are we misinterpreting the document, or has the field been set to an incorrect value?
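If you want to inspect these two header fields in your own files, the following Python sketch shows how they can be decoded. It is a sketch under stated assumptions, not checkit_tiff code, and the function names are ours. It assumes the encoding described in the ICC specifications: the version field occupies header bytes 8 to 11, with the major version in the first byte and the minor and bug-fix versions in the upper and lower half of the second byte; the preferred CMM type is a 4-byte signature in header bytes 4 to 7.

```python
def decode_icc_version(raw: bytes) -> str:
    """Decode the 4-byte ICC version field into 'major.minor.bugfix'.

    Hypothetical helper for illustration, not part of checkit_tiff.
    """
    if len(raw) != 4:
        raise ValueError("the ICC version field is exactly 4 bytes long")
    major = raw[0]
    minor = raw[1] >> 4     # upper half of the second byte
    bugfix = raw[1] & 0x0F  # lower half of the second byte
    return f"{major}.{minor}.{bugfix}"


def decode_signature(raw: bytes) -> str:
    """Render a 4-byte signature such as the preferred CMM type as text."""
    if len(raw) != 4:
        raise ValueError("an ICC signature is exactly 4 bytes long")
    # Non-ASCII bytes are replaced so that odd values stay printable.
    return raw.decode("ascii", errors="replace")


print(decode_icc_version(bytes([0x04, 0x20, 0x00, 0x00])))
print(decode_icc_version(bytes([0x02, 0x40, 0x00, 0x00])))
print(decode_signature(b"Lino"))
```

The sketch only decodes the fields. Whether a given version number or CMM signature is valid still has to be decided against the specification and the signature registry.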