Introduction
TIFF is, in general, a very simple file format.
It starts with a fixed header, which indicates that the file is
a TIFF and how it is encoded (byte order).
The header contains an offset that
points to the first image file directory (IFD). Each IFD begins with a field
that counts its tags, followed by an array of
these tags and an offset to the next IFD, or zero if
there is no further IFD.
Each tag in the array is 12 bytes long. The
first 2 bytes hold the tag number itself, the next 2 bytes declare the
value type, followed by 4 bytes counting the values. The last 4 bytes
are either an offset or, if the values fit into 4 bytes, hold the values themselves.
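The layout described above can be sketched in a few lines of Python. This is only an illustration for classic (non-BigTIFF) files, following the TIFF 6.0 entry layout of 2-byte tag, 2-byte type, 4-byte count and 4-byte value or offset. The function name `read_ifd0` and the tiny sample file are made up for this example and are not part of checkit_tiff.

```python
import struct

def read_ifd0(data: bytes):
    """Return (byte_order, entries, next_ifd_offset) for the first IFD."""
    # Bytes 0-1: "II" means little-endian, "MM" means big-endian.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF: bad byte-order mark")
    # Bytes 2-3: magic number 42; bytes 4-7: offset of the first IFD.
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF: bad magic number")
    # The IFD starts with a 2-byte count of its entries.
    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    entries = []
    for i in range(count):
        # Each entry: tag (2), type (2), count (4), value-or-offset (4).
        pos = ifd_offset + 2 + 12 * i
        entries.append(struct.unpack_from(order + "HHI4s", data, pos))
    # After the last entry: 4-byte offset of the next IFD (0 = none).
    (next_ifd,) = struct.unpack_from(order + "I", data, ifd_offset + 2 + 12 * count)
    return order, entries, next_ifd

# Hypothetical sample: header plus one IFD holding a single
# ImageWidth entry (tag 256, type 3 = SHORT, count 1, value 16).
tiff = (
    b"II" + struct.pack("<HI", 42, 8)
    + struct.pack("<H", 1)
    + struct.pack("<HHI4s", 256, 3, 1, struct.pack("<H", 16) + b"\x00\x00")
    + struct.pack("<I", 0)
)
print(read_ifd0(tiff))
```

The value field is returned as raw bytes on purpose, because whether it is a value or an offset depends on the type and count of the entry.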
What makes a TIFF robust?
The TIFF specification contains some rules that help us to repair broken TIFFs.
The first rule is that all offsets must be even, that is, they must begin on a word boundary.
The second important rule is that the tags in an IFD must be sorted in ascending order.
Finally, the TIFF spec divides the tag range
into different areas (tags 32768 and higher, for example, are private). This guarantees that the important values are
well defined.
If we can be sure that the file was a valid TIFF when it was stored, there is a good chance of detecting and repairing damage using these three rules.
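A minimal sketch of how these rules could be turned into checks is shown below. The names `check_hints` and `tag_area` are hypothetical, the tag-range check is reduced to the single boundary at 32768 where private tags begin, and treating a repeated tag as a violation is an assumption of this sketch.

```python
PRIVATE_TAG_START = 32768  # TIFF 6.0: tags from here upward are private

def check_hints(tags, offsets):
    """Return a list of rule violations (empty if none were found)."""
    problems = []
    # Rule 1: every offset must be even (word-aligned).
    for off in offsets:
        if off % 2 != 0:
            problems.append(f"odd offset {off}")
    # Rule 2: tags must ascend; a repeated tag is flagged as well.
    for prev, cur in zip(tags, tags[1:]):
        if cur <= prev:
            problems.append(f"tag {cur} follows tag {prev}")
    return problems

def tag_area(tag):
    """Rule 3, simplified to two areas: private vs. everything else."""
    return "private" if tag >= PRIVATE_TAG_START else "standard"

print(check_hints([256, 257, 273], [8, 134]))
print(check_hints([257, 256], [9]))
```

A violation found this way only shows that something is damaged. Deciding which byte was flipped still needs the search described in the next section.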
What are the caveats of TIFF?
As a proof of concept, this repository also provides the tool
"checkit_tiff_risk". With this tool, users
can analyze the layout of any baseline TIFF file.
The riskiest memory ranges are the offsets.
If a bit flip occurs in one of them, the user must in principle search the complete 4 GB range that a 32-bit offset can address.
In practice, TIFF files are smaller, so the file size bounds the
search space for offsets.
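To give a feeling for the size of this search space, here is a small sketch. It assumes that exactly one bit of a 32-bit offset was flipped and uses the even-offset rule and the file size to narrow down the candidates. The function name and the example numbers are hypothetical.

```python
def single_bitflip_candidates(offset, file_size):
    """Values one bit flip away from `offset` that could be valid offsets."""
    candidates = []
    for bit in range(32):
        value = offset ^ (1 << bit)  # undo a flip of this bit
        # A plausible offset is even and lies behind the 8-byte header
        # but still inside the file.
        if value % 2 == 0 and 8 <= value < file_size:
            candidates.append(value)
    return candidates

file_size = 323
broken = 8 ^ (1 << 20)  # offset 8 with bit 20 flipped

print(single_bitflip_candidates(broken, file_size))  # [8]
# Without the single-flip assumption, every even position in the
# file behind the header is a candidate:
print(len(range(8, file_size, 2)))  # 158
```

If the flipped value still lies inside the file, several candidates survive and each one has to be checked for a plausible IFD or data block.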
The riskiest offsets are the indirect ones, that is, the IFD0 offset and the StripOffsets tag (code 273).
Here is an example of a possibly complex StripOffsets encoding:
The problem in this example is that the pixel-data stream itself carries
no marker that says how many bytes belong to it. The only record of each
strip's stored length is the StripByteCounts tag (code 279), which holds
the number of bytes per strip after compression.
This makes the StripOffsets tag very fragile.
If a bit flip changes the offset stored in the StripOffsets tag, the whole pixel
information might be lost.
Likewise, if a bit flip occurs in the offset array
that the StripOffsets tag points to, the pixel data of the
affected strip is lost.
If compression is used, the risk of losing the
whole picture is even higher, because not every compression method
writes an end symbol (PackBits, for example, has none). Instead, the buffer sizes stored in the
StripByteCounts tag are used.
Therefore, a bit error in the Compression tag, the StripOffsets tag, the
StripByteCounts tag, or in the memory area that StripOffsets points to
could destroy the picture information.
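Some of this damage can at least be detected by a plausibility check of the strip tables against the file size, as in the following sketch. The function name `check_strips` is hypothetical, and the check only catches flips that move a strip out of the file. A flipped offset that still lands inside the file goes unnoticed.

```python
def check_strips(strip_offsets, strip_byte_counts, file_size):
    """Return a list of implausible strip entries (empty if none)."""
    problems = []
    # Both tags must describe the same number of strips.
    if len(strip_offsets) != len(strip_byte_counts):
        problems.append("StripOffsets and StripByteCounts differ in length")
    for i, (off, n) in enumerate(zip(strip_offsets, strip_byte_counts)):
        # A strip cannot start inside the 8-byte header
        # and cannot end behind the end of the file.
        if off < 8 or off + n > file_size:
            problems.append(f"strip {i}: bytes {off}..{off + n} fall outside the file")
    return problems

print(check_strips([8, 18], [10, 10], 323))
print(check_strips([8, 18 ^ (1 << 20)], [10, 10], 323))
```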
Upcoming next…
In upcoming versions of checkit_tiff, we will provide a tool to analyze the distribution of risky offsets in given TIFF files. This should put the discussion about robust file formats vs. compression on a more objective footing.
Here is a short preview:
$> ./checkit_tiff_risk ../tiffs_should_pass/minimal_valid.tiff
This reports statistics like these:
[00], type= unused/unknown, bytes= 0, ratio=0.00000
[01], type= constant, bytes= 4, ratio=0.01238
[02], type= ifd, bytes= 130, ratio=0.40248
[03], type= offset_to_ifd0, bytes= 4, ratio=0.01238
[04], type= offset_to_ifd, bytes= 4, ratio=0.01238
[05], type= ifd_embedded_standardized_value, bytes= 52, ratio=0.16099
[06], type= ifd_embedded_registered_value, bytes= 0, ratio=0.00000
[07], type= ifd_embedded_private_value, bytes= 0, ratio=0.00000
[08], type=ifd_offset_to_standardized_value, bytes= 12, ratio=0.03715
[09], type= ifd_offset_to_registered_value, bytes= 0, ratio=0.00000
[10], type= ifd_offset_to_private_value, bytes= 0, ratio=0.00000
[11], type= ifd_offset_to_stripoffsets, bytes= 0, ratio=0.00000
[12], type= stripoffset_value, bytes= 30, ratio=0.09288
[13], type= standardized_value, bytes= 87, ratio=0.26935
[14], type= registered_value, bytes= 0, ratio=0.00000
[15], type= private_value, bytes= 0, ratio=0.00000
counted: 323 bytes, size: 323 bytes
In this example the StripOffsets value is encoded directly (there is only one strip). The problematic bytes are the offset addresses (20 of the 323 bytes are affected).
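The figure of 20 bytes can be reproduced from the report above. The following sketch parses the report lines and sums the byte counts of all categories whose name contains "offset_to". Selecting the risky categories by that name pattern is an assumption made for this example, not something the tool itself defines.

```python
import re

# Report lines copied from the output above.
REPORT = """
[00], type= unused/unknown, bytes= 0, ratio=0.00000
[01], type= constant, bytes= 4, ratio=0.01238
[02], type= ifd, bytes= 130, ratio=0.40248
[03], type= offset_to_ifd0, bytes= 4, ratio=0.01238
[04], type= offset_to_ifd, bytes= 4, ratio=0.01238
[05], type= ifd_embedded_standardized_value, bytes= 52, ratio=0.16099
[06], type= ifd_embedded_registered_value, bytes= 0, ratio=0.00000
[07], type= ifd_embedded_private_value, bytes= 0, ratio=0.00000
[08], type=ifd_offset_to_standardized_value, bytes= 12, ratio=0.03715
[09], type= ifd_offset_to_registered_value, bytes= 0, ratio=0.00000
[10], type= ifd_offset_to_private_value, bytes= 0, ratio=0.00000
[11], type= ifd_offset_to_stripoffsets, bytes= 0, ratio=0.00000
[12], type= stripoffset_value, bytes= 30, ratio=0.09288
[13], type= standardized_value, bytes= 87, ratio=0.26935
[14], type= registered_value, bytes= 0, ratio=0.00000
[15], type= private_value, bytes= 0, ratio=0.00000
"""

LINE = re.compile(r"\[(\d+)\], type=\s*([^,\s]+), bytes=\s*(\d+)")

# Map each category name to its byte count.
sizes = {m.group(2): int(m.group(3)) for m in LINE.finditer(REPORT)}

total = sum(sizes.values())
risky = sum(n for name, n in sizes.items() if "offset_to" in name)

print(f"{risky} of {total} bytes hold offsets ({risky / total:.1%})")
```

Here the sum comes from categories 03, 04 and 08 (4 + 4 + 12 bytes). The category `stripoffset_value` is deliberately not counted, because it holds the pixel data the offsets point to and not an offset itself.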
In contrast to this example, here is a special file using multiple strips:
$> ./checkit_tiff_risk ../tiffs_should_pass/minimal_valid_multiple_stripoffsets.tiff
This reports statistics like these:
[00], type= unused/unknown, bytes= 0, ratio=0.00000
[01], type= constant, bytes= 4, ratio=0.01250
[02], type= ifd, bytes= 122, ratio=0.38125
[03], type= offset_to_ifd0, bytes= 4, ratio=0.01250
[04], type= offset_to_ifd, bytes= 4, ratio=0.01250
[05], type= ifd_embedded_standardized_value, bytes= 44, ratio=0.13750
[06], type= ifd_embedded_registered_value, bytes= 0, ratio=0.00000
[07], type= ifd_embedded_private_value, bytes= 0, ratio=0.00000
[08], type=ifd_offset_to_standardized_value, bytes= 16, ratio=0.05000
[09], type= ifd_offset_to_registered_value, bytes= 0, ratio=0.00000
[10], type= ifd_offset_to_private_value, bytes= 0, ratio=0.00000
[11], type= ifd_offset_to_stripoffsets, bytes= 40, ratio=0.12500
[12], type= stripoffset_value, bytes= 30, ratio=0.09375
[13], type= standardized_value, bytes= 56, ratio=0.17500
[14], type= registered_value, bytes= 0, ratio=0.00000
[15], type= private_value, bytes= 0, ratio=0.00000
counted: 320 bytes, size: 320 bytes
Here you can see type 11, where StripOffsets points to an array of offset addresses at which the pixel data can be found. This is similar to the diagram above. In this case, 40 bytes carry a high bit-flip risk.