Monday, 16 April 2018

How confusing! Defaults in TIFF

First thought: WTF?

Seriously? What is supposed to be so problematic about TIFF's defaults? After all, the specification says it all. The rules are:
  1. If a TIFF does not contain a tag that has a well-defined default value, then that default value is used.
  2. If a TIFF does contain a tag, then that tag's value is used.
  3. In all other cases, the value is undefined and hence nonexistent.
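
As a minimal sketch, the three rules above can be written as a small lookup. The function name `effective_value` and the table `TIFF_DEFAULTS` are hypothetical names chosen for illustration; the table holds only a few of the defaults that TIFF 6.0 defines and is not meant to be complete.

```python
from typing import Dict, Optional

# Illustrative subset of TIFF 6.0 default values, keyed by tag number.
# Tags such as ImageWidth (256) have no default and are absent on purpose.
TIFF_DEFAULTS: Dict[int, int] = {
    254: 0,  # NewSubfileType
    259: 1,  # Compression (uncompressed)
    263: 1,  # Threshholding (no dithering or halftoning)
    274: 1,  # Orientation
    277: 1,  # SamplesPerPixel
    284: 1,  # PlanarConfiguration
    296: 2,  # ResolutionUnit (inch)
}


def effective_value(explicit_tags: Dict[int, int], tag: int) -> Optional[int]:
    """Return the value a TIFF reader would assume for `tag`.

    `explicit_tags` maps tag numbers to the values stored in the file.
    """
    if tag in explicit_tags:
        # Rule 2: the tag is present, so its stored value is used.
        return explicit_tags[tag]
    # Rule 1: the tag is absent but has a default, so the default is used.
    # Rule 3: otherwise the value is undefined (None).
    return TIFF_DEFAULTS.get(tag)
```

For a file that stores only Photometric (262), `effective_value({262: 0}, 263)` yields the Threshholding default, while asking for ImageWidth (256) yields `None`.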

A second look

Unfortunately, the real world is a little more complicated. I was asked why jhove reports a value of "1" for the Threshholding tag (263) when validating the sample TIFFs shipped with checkit_tiff, as shown below:
$> jhove tiffs_should_pass/minimal_valid_baseline.tiff
Jhove (Rel. 1.6, 2011-01-04)
 Date: 2018-04-16 12:41:25 MESZ
 RepresentationInformation: tiffs_should_pass/minimal_valid_baseline.tiff
  ReportingModule: TIFF-hul, Rel. 1.5 (2007-10-02)
  LastModified: 2017-07-14 11:28:57 MESZ
  Size: 323
  Format: TIFF
  Version: 5.0
  Status: Well-Formed and valid
  SignatureMatches:
   TIFF-hul
  MIMEtype: image/tiff
  Profile: Baseline bilevel (Class B), TIFF/IT-BP (ISO 12639:1998), TIFF/IT-BP/P1 (ISO 12639:1998), TIFF/IT-BP/P2 (ISO 12639:1998), TIFF/IT-MP (ISO 12639:1998)
  TIFFMetadata:
   ByteOrder: little-endian
   IFDs:
    Number: 1
    IFD:
     Offset: 38
     Type: TIFF
     Entries:
      NisoImageMetadata:
       ByteOrder: little_endian
       CompressionScheme: uncompressed
       ImageWidth: 20
       ImageHeight: 10
       ColorSpace: white is zero
       Orientation: normal
       SamplingFrequencyUnit: inch
       XSamplingFrequency: 376,193
       YSamplingFrequency: 376,193
       BitsPerSample: 1
       BitsPerSampleUnit: integer
       SamplesPerPixel: 1
      NewSubfileType: 0
      SampleFormat: 1
      MinSampleValue: 0
      MaxSampleValue: 1
      Threshholding: 1
      TIFFITProperties:
       BackgroundColorIndicator: background not defined
       ImageColorIndicator: image not defined
       TransparencyIndicator: no transparency
       PixelIntensityRange: 0, 1
       RasterPadding: 1 byte
       BitsPerRunLength: 8
       BitsPerExtendedRunLength: 16
However, checkit_tiff does not throw an error while validating the same sample file, even though there's no whitelist rule for that tag in the config file:
$> checkit_tiff example_configs/cit_tiff6_baseline_SLUB.cfg tiffs_should_pass/minimal_valid_baseline.tiff
'./build/checkit_tiff' version: development_v0.4.0
    revision: 408
licensed under conditions of libtiff (see http://libtiff.maptools.org/misc.html)
cfg_file=example_configs/cit_tiff6_baseline_SLUB.cfg
tiff file/dir=tiffs_should_pass/minimal_valid_baseline.tiff
file: tiffs_should_pass/minimal_valid_baseline.tiff
(./)    general    --> TIFF should have just one IFD, (lineno: 12)
(./)    general    --> All tag offsets should be word aligned, (lineno: 14)
(./)    general    --> All offsets may only be used once, (lineno: 14)
(./)    general    --> All tag offsets should be greater than zero, (lineno: 14)
(./)    general    --> All IFDs should be word aligned, (lineno: 15)
(./)    general    --> Tags should be sorted in ascending order, (lineno: 15)
(./)    tag 256 (ImageWidth)    --> Tag should have a value in a range of (lineno: 23)
(./)    tag 257 (ImageLength)    --> Tag should have a value in a range of (lineno: 25)
(./)    tag 258 (BitsPerSample)    --> One or more conditions needs to be combined in a logical_or operation (open) (lineno: 30)
(./)    tag 259 (Compression)    --> Tag should have one exact value. (lineno: 36)
(./)    tag 262 (Photometric)    --> Tag should have a value in a range of (lineno: 40)
(./)    tag 273 (StripOffsets)    --> TIFF should contain this tag. (lineno: 45)
(./)    tag 277 (SamplesPerPixel)    --> Tag should have one exact value. (lineno: 52)
(./)    tag 278 (RowsPerStrip)    --> Tag should have a value in a range of (lineno: 55)
(./)    tag 279 (StripByteCounts)    --> TIFF should contain this tag. (lineno: 60)
(./)    tag 282 (XResolution)    --> Tag should have a value in a range of (lineno: 63)
(./)    tag 283 (YResolution)    --> Tag should have a value in a range of (lineno: 66)
(./)    tag 296 (ResolutionUnit)    --> Tag should have one exact value. (lineno: 69)
(./)    tag 254 (SubFileType)    --> One or more conditions needs to be combined in a logical_or operation (open) (lineno: 77)
(./)    tag 274 (Orientation)    --> Tag should have one exact value. (lineno: 113)
(./)    tag 284 (PlanarConfig)    --> Tag should have one exact value. (lineno: 122)
(./)
(./)Yes, the given tif is valid :)
I was shocked at first, since I was sure that checkit_tiff worked as expected and that I had checked everything carefully. To be on the safe side, I cross-checked checkit_tiff's output against the tiffdump tool from libtiff:
$> tiffdump tiffs_should_pass/minimal_valid_baseline.tiff
tiffs_should_pass/minimal_valid_baseline.tiff:
Magic: 0x4949 <little-endian> Version: 0x2a <ClassicTIFF>
Directory 0: offset 38 (0x26) next 0 (0)
SubFileType (254) LONG (4) 1<0>
ImageWidth (256) SHORT (3) 1<20>
ImageLength (257) SHORT (3) 1<10>
BitsPerSample (258) SHORT (3) 1<1>
Compression (259) SHORT (3) 1<1>
Photometric (262) SHORT (3) 1<0>
StripOffsets (273) LONG (4) 1<8>
Orientation (274) SHORT (3) 1<1>
SamplesPerPixel (277) SHORT (3) 1<1>
RowsPerStrip (278) SHORT (3) 1<64>
StripByteCounts (279) LONG (4) 1<30>
XResolution (282) RATIONAL (5) 1<376.193>
YResolution (283) RATIONAL (5) 1<376.193>
PlanarConfig (284) SHORT (3) 1<1>
ResolutionUnit (296) SHORT (3) 1<2>
Good, tiffdump was on my side. So what is the reason for the discrepancy? First, let's have a look at the TIFF 6.0 specification. On page 41, it states:
For black and white TIFF files that represent shades of gray, the technique used to
convert from gray to black and white pixels.
Tag = 263 (107.H)
Type = SHORT
N = 1
1 = No dithering or halftoning has been applied to the image data.
2 = An ordered dither or halftone technique has been applied to the image data.
3 = A randomized process such as error diffusion has been applied to the image data.
Default is Threshholding = 1. See also CellWidth, CellLength.
Okay. The sample TIFF used above is indeed black and white and does not contain tag 263. Hence the default value of 1 is assumed.

Apparently, jhove presents the metadata of TIFF files the way a TIFF reader would interpret them. The tools checkit_tiff and tiffdump, however, show which TIFF tags are actually explicitly encoded in the file and what values they have.
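
To make "explicitly encoded" concrete, here is a rough sketch of reading only what is stored in the first IFD of a classic TIFF. It is not how checkit_tiff or tiffdump are implemented; the function name `explicit_tags` and the in-memory `SAMPLE` file are assumptions made for illustration, and only single SHORT and LONG values are decoded.

```python
import struct
from typing import Dict, Optional


def explicit_tags(data: bytes) -> Dict[int, Optional[int]]:
    """List the tags stored in the first IFD of a classic TIFF.

    Values are decoded only for a single SHORT (type 3) or LONG (type 4);
    every other entry is reported with the value None.
    """
    order = {b"II": "<", b"MM": ">"}[data[:2]]  # byte order mark
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags: Dict[int, Optional[int]] = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        tag, ftype, n = struct.unpack(order + "HHI", data[start:start + 8])
        value: Optional[int] = None
        if n == 1 and ftype == 3:    # SHORT, left-justified in the value field
            (value,) = struct.unpack(order + "H", data[start + 8:start + 10])
        elif n == 1 and ftype == 4:  # LONG fills the whole value field
            (value,) = struct.unpack(order + "I", data[start + 8:start + 12])
        tags[tag] = value
    return tags


# Hypothetical two-entry file: ImageWidth = 20, Photometric = 0, no tag 263.
SAMPLE = (
    struct.pack("<2sHI", b"II", 42, 8)        # header, first IFD at offset 8
    + struct.pack("<H", 2)                    # number of IFD entries
    + struct.pack("<HHIHH", 256, 3, 1, 20, 0)
    + struct.pack("<HHIHH", 262, 3, 1, 0, 0)
    + struct.pack("<I", 0)                    # no further IFD
)
```

Calling `explicit_tags(SAMPLE)` lists tags 256 and 262 only. Tag 263 does not appear, because nothing in the file encodes it; any value reported for it comes from the reader, not from the file.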

Wrap-up

Know your tools! Instead of silently interpreting default values, tools should mark such assumptions explicitly. Otherwise, the average user cannot see how the results came about.
I have learned my lesson and will add this question to the checkit_tiff FAQ.
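
One way a tool could mark such assumptions is to label every reported value with its origin. The sketch below is only an illustration of that idea; the function `report` and its output format are hypothetical and do not reflect the output of jhove, tiffdump or checkit_tiff.

```python
from typing import Dict, List


def report(explicit: Dict[int, int], defaults: Dict[int, int]) -> List[str]:
    """Describe each tag value and say whether it is stored or assumed.

    `explicit` holds the values found in the file, `defaults` the values a
    reader would assume for absent tags.
    """
    lines: List[str] = []
    for tag in sorted(set(explicit) | set(defaults)):
        if tag in explicit:
            lines.append(f"tag {tag}: {explicit[tag]} (explicit)")
        else:
            lines.append(f"tag {tag}: {defaults[tag]} (default, not stored in file)")
    return lines
```

With `report({262: 0}, {263: 1})`, the Threshholding value is still shown, but the reader of the report can tell that it was assumed rather than read from the file.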