Friday, 3 December 2021

Detectorist - Part two "A crumb of knowledge"

A crumb of knowledge


In the first part I described how I managed to read the floppy disks (using a KryoFlux). Now I would like to give an interim report on the floppy disk format of the Panasonic typewriter, in the quiet hope that someone can uncover the last secret.


I found the most important clue while researching a successor model, the Panasonic KX-W1000. I stumbled across the following old blog post: https://surrey.lug.org.uk/panasonic-kx-w1000.

My findings

Even if it didn't lead to a full success, there were some interesting insights. The floppy image is strongly related to FAT12.

Here is my summary.

The filesystem is based on FAT12 with proprietary extensions. 

Header / MBR

The first bytes are: 0x00 00 00 4B 58 2D 57 31 35 31 30 20 31 2E 30 30 20. After three zero bytes, the string "KX-W1510 1.00" starts at offset 3, i.e. with the fourth byte.

The first 256 bytes are very similar to an MBR (boot sector) of old DOS floppies:

0000:0000 | 00 00 00 4B  58 2D 57 31  35 31 30 20  31 2E 30 30 | ...KX-W1510 1.00
0000:0010 | 20 F9 00 00  00 00 00 00  00 00 00 00  00 00 00 00 |  ù..............
0000:0020 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:0030 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:0040 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:0050 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:0060 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:0070 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:0080 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:0090 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:00A0 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:00B0 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:00C0 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:00D0 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:00E0 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................
0000:00F0 | 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00 | ................

FATs

There are two identical blocks, which probably represent the FATs. The first is at address 0x200:

0000:0200 | F9 FF FF 03  40 00 05 B0  00 07 80 00  09 A0 00 FF | ùÿÿ.@..°..... .ÿ
0000:0210 | FF FF 0D E0  00 0F 00 01  FF 8F 01 13  40 01 15 60 | ÿÿ.à....ÿ...@..`
0000:0220 | 01 17 F0 FF  19 90 02 1B  10 02 1D E0  01 1F 00 02 | ..ðÿ.......à....
0000:0230 | FF 2F 02 23  F0 FF 25 60  02 2D 80 02  2C A0 02 2B | ÿ/.#ðÿ%`.-.., .+
0000:0240 | F0 FF FF EF  02 35 00 03  31 20 03 33  F0 FF 36 F0 | ðÿÿï.5..1 .3ðÿ6ð
0000:0250 | FF 37 80 03  39 F0 FF 3B  C0 03 3D E0  03 FF 0F 04 | ÿ7..9ðÿ;À.=à.ÿ..
0000:0260 | 41 20 04 43  F0 FF 45 60  04 47 80 04  49 F0 FF 4B | A .CðÿE`.G..IðÿK
0000:0270 | C0 04 4D E0  04 4F F0 FF  51 20 05 53  40 05 55 F0 | À.Mà.OðÿQ .S@.Uð
0000:0280 | FF 57 80 05  59 A0 05 5B  F0 FF 5D E0  05 69 B0 07 | ÿW..Y .[ðÿ]à.i°.
0000:0290 | 7F 20 06 63  40 06 65 F0  FF 6E 80 06  6B A0 06 FF | . .c@.eðÿn..k .ÿ
0000:02A0 | CF 06 6D F0  FF 7E 00 07  71 20 07 73  40 07 FF 6F | Ï.mðÿ~..q .s@.ÿo
0000:02B0 | 07 77 80 07  79 A0 07 FF  CF 07 7D 60  08 80 30 08 | .w..y .ÿÏ.}`..0.
0000:02C0 | 81 20 08 FF  4F 08 85 80  08 87 F0 FF  FF AF 08 8B | . .ÿO.....ðÿÿ¯..
0000:02D0 | F0 08 8D E0  08 90 20 09  91 40 09 93  F0 FF FF 6F | ð..à.. ..@..ðÿÿo
0000:02E0 | 09 A2 F0 09  99 A0 09 9B  C0 09 9D F0  FF A1 00 0A | .¢ð.. ..À..ðÿ¡..
0000:02F0 | B2 60 0A A3  40 0A A5 F0  FF AC 80 0A  A9 A0 0A AB | ²`.£@.¥ðÿ¬..© .«
0000:0300 | F0 FF AD E0  0A AF 00 0B  B1 B0 0B BA  40 0B B5 C0 | ðÿ.à.¯..±°.º@.µÀ
0000:0310 | 0C B7 80 0B  B9 60 0C C3  C0 0B BD E0  0B BF 00 0C | .·..¹`.ÃÀ.½à.¿..
0000:0320 | C1 20 0C CA  40 0C C5 80  0C C7 90 0C  E8 00 0D CB | Á .Ê@.Å..Ç..è..Ë
0000:0330 | F0 FF CD E0  0C CF B0 0D  D1 50 0D D3  40 0D DE F0 | ðÿÍà.Ï°.ÑP.Ó@.Þð
0000:0340 | FF D7 E0 0E  D9 70 0E E2  C0 0D DD F0  FF DF 00 0E | ÿ×à.Ùp.âÀ.Ýðÿß..
0000:0350 | E1 50 0E E3  40 0E E6 90  0E FB C0 0E  FF AF 0E EB | áP.ã@.æ..ûÀ.ÿ¯.ë
0000:0360 | F0 FF ED 60  0F EF 00 0F  F1 20 0F F3  40 0F F5 F0 | ðÿí`.ï..ñ .ó@.õð
0000:0370 | FF F7 80 0F  F9 A0 0F FF  CF 0F FD E0  0F 08 01 00 | ÿ÷..ù .ÿÏ.ýà....

The second is at 0x800:

0000:0800 | F9 FF FF 03  40 00 05 B0  00 07 80 00  09 A0 00 FF | ùÿÿ.@..°..... .ÿ
0000:0810 | FF FF 0D E0  00 0F 00 01  FF 8F 01 13  40 01 15 60 | ÿÿ.à....ÿ...@..`
0000:0820 | 01 17 F0 FF  19 90 02 1B  10 02 1D E0  01 1F 00 02 | ..ðÿ.......à....
0000:0830 | FF 2F 02 23  F0 FF 25 60  02 2D 80 02  2C A0 02 2B | ÿ/.#ðÿ%`.-.., .+
0000:0840 | F0 FF FF EF  02 35 00 03  31 20 03 33  F0 FF 36 F0 | ðÿÿï.5..1 .3ðÿ6ð
0000:0850 | FF 37 80 03  39 F0 FF 3B  C0 03 3D E0  03 FF 0F 04 | ÿ7..9ðÿ;À.=à.ÿ..
0000:0860 | 41 20 04 43  F0 FF 45 60  04 47 80 04  49 F0 FF 4B | A .CðÿE`.G..IðÿK
0000:0870 | C0 04 4D E0  04 4F F0 FF  51 20 05 53  40 05 55 F0 | À.Mà.OðÿQ .S@.Uð
0000:0880 | FF 57 80 05  59 A0 05 5B  F0 FF 5D E0  05 69 B0 07 | ÿW..Y .[ðÿ]à.i°.
0000:0890 | 7F 20 06 63  40 06 65 F0  FF 6E 80 06  6B A0 06 FF | . .c@.eðÿn..k .ÿ
0000:08A0 | CF 06 6D F0  FF 7E 00 07  71 20 07 73  40 07 FF 6F | Ï.mðÿ~..q .s@.ÿo
0000:08B0 | 07 77 80 07  79 A0 07 FF  CF 07 7D 60  08 80 30 08 | .w..y .ÿÏ.}`..0.
0000:08C0 | 81 20 08 FF  4F 08 85 80  08 87 F0 FF  FF AF 08 8B | . .ÿO.....ðÿÿ¯..
0000:08D0 | F0 08 8D E0  08 90 20 09  91 40 09 93  F0 FF FF 6F | ð..à.. ..@..ðÿÿo
0000:08E0 | 09 A2 F0 09  99 A0 09 9B  C0 09 9D F0  FF A1 00 0A | .¢ð.. ..À..ðÿ¡..
0000:08F0 | B2 60 0A A3  40 0A A5 F0  FF AC 80 0A  A9 A0 0A AB | ²`.£@.¥ðÿ¬..© .«
0000:0900 | F0 FF AD E0  0A AF 00 0B  B1 B0 0B BA  40 0B B5 C0 | ðÿ.à.¯..±°.º@.µÀ
0000:0910 | 0C B7 80 0B  B9 60 0C C3  C0 0B BD E0  0B BF 00 0C | .·..¹`.ÃÀ.½à.¿..
0000:0920 | C1 20 0C CA  40 0C C5 80  0C C7 90 0C  E8 00 0D CB | Á .Ê@.Å..Ç..è..Ë
0000:0930 | F0 FF CD E0  0C CF B0 0D  D1 50 0D D3  40 0D DE F0 | ðÿÍà.Ï°.ÑP.Ó@.Þð
0000:0940 | FF D7 E0 0E  D9 70 0E E2  C0 0D DD F0  FF DF 00 0E | ÿ×à.Ùp.âÀ.Ýðÿß..
0000:0950 | E1 50 0E E3  40 0E E6 90  0E FB C0 0E  FF AF 0E EB | áP.ã@.æ..ûÀ.ÿ¯.ë
0000:0960 | F0 FF ED 60  0F EF 00 0F  F1 20 0F F3  40 0F F5 F0 | ðÿí`.ï..ñ .ó@.õð
0000:0970 | FF F7 80 0F  F9 A0 0F FF  CF 0F FD E0  0F 08 01 00 | ÿ÷..ù .ÿÏ.ýà....
0000:0980 | 00 00 00 00  00 00 00 00  00 00 00 00  FF 0F 00 00 | ............ÿ... 

Directory

The main directory always starts from address 0xe00:

0000:0E00 | 20 20 20 20  20 20 44 49  5B 54 20 FF  00 00 00 00 |       DI[T ÿ....
0000:0E10 | 00 00 00 00  00 00 06 00  21 00 02 00  F5 13 00 00 | ........!...õ...
0000:0E20 | 20 20 20 20  20 20 41 46  46 45 20 FF  00 00 00 00 |       AFFE ÿ....
0000:0E30 | 00 00 00 00  00 00 06 00  21 00 06 00  F6 13 00 00 | ........!...ö...
0000:0E40 | 20 20 20 54  52 5D 46 46  45 4C 20 FF  00 00 00 00 |    TR]FFEL ÿ....
0000:0E50 | 00 00 00 00  00 00 06 00  21 00 0C 00  C4 13 00 00 | ........!...Ä...
0000:0E60 | 20 20 45 52  42 50 52 49  4E 5A 20 FF  00 00 00 00 |   ERBPRINZ ÿ....
0000:0E70 | 00 00 00 00  00 00 06 00  21 00 11 00  17 14 00 00 | ........!.......
0000:0E80 | 20 20 20 20  42 49 53 54  52 4F 20 FF  00 00 00 00 |     BISTRO ÿ....
0000:0E90 | 00 00 00 00  00 00 06 00  21 00 12 00  61 14 00 00 | ........!...a...
0000:0EA0 | 20 20 20 48  55 48 4E 20  49 49 20 FF  00 00 00 00 |    HUHN II ÿ....
0000:0EB0 | 00 00 00 00  00 00 06 00  21 00 1C 00  CC 13 00 00 | ........!...Ì...
0000:0EC0 | 20 20 20 20  57 41 43 48  41 55 20 FF  00 00 00 00 |     WACHAU ÿ....
0000:0ED0 | 00 00 00 00  00 00 06 00  21 00 1A 00  C0 13 00 00 | ........!...À...
0000:0EE0 | 20 20 20 20  20 4B 41 4B  41 4F 20 FF  00 00 00 00 |      KAKAO ÿ....
0000:0EF0 | 00 00 00 00  00 00 06 00  21 00 24 00  1D 14 00 00 | ........!.$.....
0000:0F00 | 20 20 20 20  20 20 4D 5D  4C 4C 20 FF  00 00 00 00 |       M]LL ÿ....
0000:0F10 | 00 00 00 00  00 00 06 00  21 00 27 00  22 0B 00 00 | ........!.'."...
0000:0F20 | 20 46 52 41  55 20 4D 4F  44 45 20 FF  00 00 00 00 |  FRAU MODE ÿ....
0000:0F30 | 00 00 00 00  00 00 06 00  21 00 2F 00  AC 13 00 00 | ........!./.¬...
0000:0F40 | 20 20 20 20  53 55 50 50  45 4E 20 FF  00 00 00 00 |     SUPPEN ÿ....
0000:0F50 | 00 00 00 00  00 00 06 00  21 00 34 00  C7 13 00 00 | ........!.4.Ç...
0000:0F60 | 55 4E 53 45  52 20 42 52  4F 54 20 FF  00 00 00 00 | UNSER BROT ÿ....
0000:0F70 | 00 00 00 00  00 00 06 00  21 00 3A 00  B8 13 00 00 | ........!.:.¸...
0000:0F80 | 20 20 20 20  20 20 31 39  39 34 20 FF  00 00 00 00 |       1994 ÿ....
0000:0F90 | 00 00 00 00  00 00 06 00  21 00 3F 00  AA 13 00 00 | ........!.?.ª...
0000:0FA0 | 20 20 20 20  20 4B 5D 43  48 45 20 FF  00 00 00 00 |      K]CHE ÿ....
0000:0FB0 | 00 00 00 00  00 00 06 00  21 00 44 00  3D 14 00 00 | ........!.D.=...
0000:0FC0 | 20 55 43 4B  45 52 4D 41  52 4B 20 FF  00 00 00 00 |  UCKERMARK ÿ....
0000:0FD0 | 00 00 00 00  00 00 06 00  21 00 4A 00  2B 14 00 00 | ........!.J.+...
0000:0FE0 | 20 20 52 49  45 53 4C 49  4E 47 20 FF  00 00 00 00 |   RIESLING ÿ....
0000:0FF0 | 00 00 00 00  00 00 06 00  21 00 50 00  31 14 00 00 | ........!.P.1...
0000:1000 | 43 48 49 4E  41 54 52 5D  46 46 20 FF  00 00 00 00 | CHINATR]FF ÿ....
0000:1010 | 00 00 00 00  00 00 06 00  21 00 56 00  25 14 00 00 | ........!.V.%...
0000:1020 | 20 4B 5B 53  45 52 45 53  54 45 20 FF  00 00 00 00 |  K[SERESTE ÿ....
0000:1030 | 00 00 00 00  00 00 06 00  21 00 5C 00  E3 12 00 00 | ........!.\.ã...
0000:1040 | 4B 41 54 5A  45 4E 46 55  54 54 20 FF  00 00 00 00 | KATZENFUTT ÿ....
0000:1050 | 00 00 00 00  00 00 06 00  21 00 61 00  CC 12 00 00 | ........!.a.Ì...
0000:1060 | 20 20 52 4F  42 55 43 48  4F 4E 20 FF  00 00 00 00 |   ROBUCHON ÿ....
0000:1070 | 00 00 00 00  00 00 06 00  21 00 5F 00  55 14 00 00 | ........!._.U...
0000:1080 | 20 20 20 4D  41 4E 41 47  45 52 20 FF  00 00 00 00 |    MANAGER ÿ....
0000:1090 | 00 00 00 00  00 00 06 00  21 00 67 00  FC 13 00 00 | ........!.g.ü...
0000:10A0 | 20 20 4D 49  43 48 45 4C  49 4E 20 FF  00 00 00 00 |   MICHELIN ÿ....
0000:10B0 | 00 00 00 00  00 00 06 00  21 00 6F 00  8C 14 00 00 | ........!.o.....
0000:10C0 | 20 20 50 49  4D 45 4E 54  4F 53 20 FF  00 00 00 00 |   PIMENTOS ÿ....
0000:10D0 | 00 00 00 00  00 00 06 00  21 00 75 00  14 14 00 00 | ........!.u.....
0000:10E0 | 54 48 4F 4D  41 53 4D 41  4E 4E 20 FF  00 00 00 00 | THOMASMANN ÿ....
0000:10F0 | 00 00 00 00  00 00 06 00  21 00 66 00  20 14 00 00 | ........!.f. ...
0000:1100 | 20 20 38 2D  4D 41 49 2D  34 35 20 FF  00 00 00 00 |   8-MAI-45 ÿ....
0000:1110 | 00 00 00 00  00 00 06 00  21 00 60 00  2A 14 00 00 | ........!.`.*...
0000:1120 | 20 20 43 4F  51 41 55 56  49 4E 20 FF  00 00 00 00 |   COQAUVIN ÿ....
0000:1130 | 00 00 00 00  00 00 06 00  21 00 89 00  0B 14 00 00 | ........!.......
0000:1140 | 20 47 55 44  45 20 53 54  55 42 20 FF  00 00 00 00 |  GUDE STUB ÿ....
0000:1150 | 00 00 00 00  00 00 06 00  21 00 8C 00  A0 14 00 00 | ........!... ...
0000:1160 | 20 20 4D 4F  4E 54 43 41  55 44 20 FF  00 00 00 00 |   MONTCAUD ÿ....
0000:1170 | 00 00 00 00  00 00 06 00  21 00 95 00  63 15 00 00 | ........!...c...
0000:1180 | 20 53 50 41  52 47 45 4C  45 49 20 FF  00 00 00 00 |  SPARGELEI ÿ....
0000:1190 | 00 00 00 00  00 00 06 00  21 00 98 00  BD 14 00 00 | ........!...½...
0000:11A0 | 53 45 4D 49  42 45 4C 47  49 45 20 FF  00 00 00 00 | SEMIBELGIE ÿ....
0000:11B0 | 00 00 00 00  00 00 06 00  21 00 97 00  B3 25 00 00 | ........!...³%..
0000:11C0 | 20 53 45 4D  49 4E 41 52  39 35 20 FF  00 00 00 00 |  SEMINAR95 ÿ....
0000:11D0 | 00 00 00 00  00 00 06 00  21 00 9E 00  C9 4B 00 00 | ........!...ÉK..
0000:11E0 | 20 20 54 41  4E 54 41 4C  55 53 20 FF  00 00 00 00 |   TANTALUS ÿ....
0000:11F0 | 00 00 00 00  00 00 06 00  21 00 A7 00  DC 12 00 00 | ........!.§.Ü...

In contrast to FAT12, each directory entry uses 10 bytes for the file name, left-padded with spaces. Umlauts in file names are possible (see below). There is no file name suffix. This matches the description in the typewriter manual.

Sometimes there is a special directory at offset 0x100, which could hold the address lists or dictionaries:

0000:0100 | 20 20 20 20  57 41 53 53  45 52 20 FF  00 00 00 00 |     WASSER ÿ....
0000:0110 | 00 00 00 00  00 00 06 00  21 00 48 00  36 0A 00 00 | ........!.H.6...
0000:0120 | 20 20 20 20  20 4B 5D 43  48 45 20 FF  00 00 00 00 |      K]CHE ÿ....
0000:0130 | 00 00 00 00  00 00 06 00  21 00 49 00  3D 14 00 00 | ........!.I.=...
0000:0140 | 20 20 20 41  55 53 54 45  52 4E 20 FF  00 00 00 00 |    AUSTERN ÿ....
0000:0150 | 00 00 00 00  00 00 06 00  21 00 4E 00  D2 0A 00 00 | ........!.N.Ò...
0000:0160 | 20 20 20 20  54 52 5B 55  4D 45 20 FF  00 00 00 00 |     TR[UME ÿ....
0000:0170 | 00 00 00 00  00 00 06 00  21 00 50 00  59 25 00 00 | ........!.P.Y%..
0000:0180 | 20 20 52 45  43 48 4E 55  4E 47 20 FF  00 00 00 00 |   RECHNUNG ÿ....
0000:0190 | 00 00 00 00  00 00 06 00  21 00 54 00  C3 08 00 00 | ........!.T.Ã...
0000:01A0 | 20 20 20 20  20 48 45 4E  52 59 20 FF  00 00 00 00 |      HENRY ÿ....
0000:01B0 | 00 00 00 00  00 00 06 00  21 00 59 00  66 11 00 00 | ........!.Y.f...
0000:01C0 | 53 43 48 57  41 52 5A 41  44 4C 20 FF  00 00 00 00 | SCHWARZADL ÿ....
0000:01D0 | 00 00 00 00  00 00 06 00  21 00 57 00  94 0A 00 00 | ........!.W.....
0000:01E0 | 20 50 4C 41  43 48 55 54  54 41 20 FF  00 00 00 00 |  PLACHUTTA ÿ....
0000:01F0 | 00 00 00 00  00 00 06 00  21 00 5C 00  49 09 00 00 | ........!.\.I...

But sometimes there are text fragments at that offset instead (example from another floppy):

0000:0100 | 64 20 73 63  68 E9 64 6C  69 63 68 21  C9 20 20 20 | d schédlich!É   
0000:0110 | 20 20 20 20  20 20 20 20  20 20 20 20  20 20 20 20 |                 
0000:0120 | 20 20 20 20  20 20 20 20  20 20 20 20  20 20 20 20 |                 
0000:0130 | 20 20 20 20  20 20 20 20  20 20 20 20  20 20 20 20 |                 
0000:0140 | 55 64 6F 20  50 6F 6C 6C  6D 65 72 2C  20 65 69 6E | Udo Pollmer, ein
0000:0150 | 20 4C 65 62  65 6E 73 6D  69 74 74 65  6C 63 68 65 |  Lebensmittelche
0000:0160 | 6D 69 6B 65  72 20 75 6E  64 20 65 72  66 6F 6C 67 | miker und erfolg
0000:0170 | 72 65 69 63  68 65 72 20  20 20 20 20  20 20 20 20 | reicher         
0000:0180 | 20 20 20 20  20 20 20 20  20 20 20 20  20 20 20 20 |                 
0000:0190 | 46 61 63 68  62 75 63 68  61 75 74 6F  72 20 68 61 | Fachbuchautor ha
0000:01A0 | 74 20 69 6E  20 65 69 6E  65 6D 20 5A  65 69 74 75 | t in einem Zeitu
0000:01B0 | 6E 67 73 69  6E 74 65 72  76 69 65 77  20 65 72 6B | ngsinterview erk
0000:01C0 | 6C E9 72 74  3A 20 20 20  20 20 20 20  20 20 20 20 | lért:           
0000:01D0 | 20 20 20 20  20 20 20 20  20 20 20 20  20 20 20 20 |                 
0000:01E0 | 22 44 69 E9  74 65 6E 20  6D 61 63 68  65 6E 20 64 | "Diéten machen d
0000:01F0 | 69 63 6B 22  2E 20 57 65  69 6C 20 64  65 72 20 4B | ick". Weil der K

Umlauts and special characters

Umlauts and special characters are mapped as follows:

ä → 0x7b
ö → 0x7c
ü → 0x7d
Ä → 0x5b
Ö → 0x5c
Ü → 0x5d
ß → 0x85
hyphen → 0xbc
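The mapping can be applied mechanically when decoding file names. The following Python sketch is only an illustration: the table is copied from the list above, the helper name decode_name is made up, and treating every other printable byte as plain ASCII is an assumption.

```python
# Byte-to-character table copied from the list above.
SPECIAL_CHARS = {
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü",
    0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x85: "ß", 0xBC: "-",
}


def decode_name(raw: bytes) -> str:
    """Decode a file name field (decode_name is a hypothetical helper)."""
    chars = []
    for byte in raw:
        if byte in SPECIAL_CHARS:
            chars.append(SPECIAL_CHARS[byte])
        elif 0x20 <= byte < 0x7F:
            # Assumption: all other printable bytes are plain ASCII.
            chars.append(chr(byte))
        else:
            chars.append("?")  # unknown byte, keep a visible marker
    # File names are left-padded with spaces in the directory dump.
    return "".join(chars).lstrip(" ")


# Name field of the directory entry at 0xE40 in the dump.
print(decode_name(bytes.fromhex("20 20 20 54 52 5D 46 46 45 4C")))
```

For the entry at 0xE40 this yields TRÜFFEL instead of TR]FFEL.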

Open Questions

What is still completely unclear is how the FATs are constructed. They do look like FAT12 entries: the first bytes 0xF9 0xFF 0xFF 0x03... and the frequently occurring 0xFF suggest this. Yet there seems to be no connection between the addresses of the text fragments in the image and the FAT byte sequences.
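One thing that may be worth checking is what the blocks look like when unpacked as ordinary FAT12, where every three bytes hold two 12-bit entries. The Python sketch below does only that. It assumes standard FAT12 packing and the usual end-of-chain markers (0xFF8 and above); neither is confirmed for this format, and the function names are made up.

```python
def fat12_entries(fat: bytes) -> list[int]:
    """Unpack 12-bit entries, assuming standard FAT12 packing."""
    entries = []
    for i in range(0, len(fat) - 2, 3):
        b0, b1, b2 = fat[i], fat[i + 1], fat[i + 2]
        entries.append(b0 | ((b1 & 0x0F) << 8))  # even-numbered entry
        entries.append((b1 >> 4) | (b2 << 4))    # odd-numbered entry
    return entries


def chain(entries: list[int], start: int) -> list[int]:
    """Follow a cluster chain until an end marker, a gap or a loop."""
    seen = []
    cluster = start
    while 2 <= cluster < len(entries) and cluster not in seen:
        seen.append(cluster)
        cluster = entries[cluster]
        if cluster >= 0xFF8:  # assumed end-of-chain marker
            break
    return seen


# First 18 bytes of the block at 0x200, copied from the dump.
fat = bytes.fromhex("F9 FF FF 03 40 00 05 B0 00 07 80 00 09 A0 00 FF FF FF")
entries = fat12_entries(fat)
print([hex(e) for e in entries])
print(chain(entries, 2))
print(chain(entries, 6))
```

Under these assumptions, the chain starting at cluster 2 runs 2, 3, 4, 5, 11 and the chain starting at cluster 6 runs 6, 7, 8, 9, 10. The first two directory entries name exactly these start values, with sizes of 0x13F5 and 0x13F6 bytes. Five clusters each would fit a cluster size of 1,024 bytes, but that is a guess from a handful of entries and says nothing yet about where the data area begins.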


In the directory entries, everything suggests that byte 26 holds the start cluster and bytes 28-29 the file size. I have not yet been able to decipher the connection between these values, the FAT and the actual offset (or cluster) of the data.
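To make this reading of the entries concrete, here is a small Python sketch that pulls the three fields out of one 32-byte entry. It is a sketch under assumptions: the values are read as little-endian 16-bit integers (so the start cluster spans bytes 26-27), the name is decoded as plain ASCII, and parse_entry is a made-up name.

```python
import struct


def parse_entry(entry: bytes) -> dict:
    """Extract name, start cluster and size from a 32-byte entry."""
    name = entry[0:10].decode("ascii", errors="replace").lstrip(" ")
    # "<H" = little-endian unsigned 16-bit; the byte order is an assumption.
    (start,) = struct.unpack_from("<H", entry, 26)
    (size,) = struct.unpack_from("<H", entry, 28)
    return {"name": name, "start": start, "size": size}


# Directory entry at 0xE20, copied from the dump.
entry = bytes.fromhex(
    "20 20 20 20 20 20 41 46 46 45 20 FF 00 00 00 00 "
    "00 00 00 00 00 00 06 00 21 00 06 00 F6 13 00 00"
)
print(parse_entry(entry))
```

For the AFFE entry this gives a start value of 6 and a size of 5110 (0x13F6).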

The meaning of offset 0x100 is unclear. 

If you have any ideas how to read the FATs, how to interpret bytes 26 and 28-29 of the directory entries, or what the cluster size should be, feel free to write to me.

If you are the owner of such an old typewriter, it would be helpful to have a clean-room floppy copy, i.e. a freshly formatted floppy with a small test text, so that I can reverse engineer the data format even better.

Just contact me at art1piratatgoogledotcom 

 

Supportive Links 

https://archive.org/details/MSXTechnicalDataBook/page/n269/mode/2up

https://github.com/Konamiman/MSX2-Technical-Handbook/blob/master/md/Chapter3.md#3--structure-of-disk-files

https://manualsbrain.com/ja/products/panasonic-kx-w1510/

Thanks

my thanks goes to 

Sunday, 28 November 2021

Detectorist - Part one "First indications"

First indications

In an estate there were several CD-ROMs, DVDs and, above all, floppy disks. We were able to read most of them with Linux, including the floppy disks. Only the last 8 floppy disks gave us a hard time.

On one of the floppy disks a small inscription peeked out, referring to a Panasonic electronic typewriter. There was none in the estate, no other information was available. 

Eight floppy disks, 3.5", double density, not readable.

Time passed, constantly haunted by the voice in my mind: "There's something on the disks, only what?" 

We managed to purchase a Kryoflux controller (see https://kryoflux.com/, there are also free opensource alternatives). This is a special disk controller that allows you to record the magnetic flux as the read heads move over the disk.

 

After the first attempts, I was able to create an image file with the following command:

./dtc -fIMAGEFILE -dd1 -g2 -i4

The options mean:

  • "-dd1" - double density
  • "-g2" - double sided
  • "-i4" - MFM sector image 40/80+ tracks

 

A look at the image in a hex editor confirmed my hunch. After the first three zero bytes, the string "KX-W1510 1.00" followed (and, to my happy surprise, a lot of readable text fragments).

Yep, there is exactly one electronic typewriter series from Panasonic.

Disillusionment

I was able to find a manual at https://manualsbrain.com/en/manuals/1814281/. And yes, the machine used 3.5" floppy disks, double sided, double density with a capacity of 713,000 characters, but unfortunately without an exact description of the disk format and the file system.

I then contacted Panasonic support - no success. I started researching patent databases in Japan, the USA and Germany - nothing. I wrote to the Panasonic museum in Japan, but unfortunately they could not help me.


A proprietary disk format, which was forgotten after 30 years.


In the next part I will report what I could find out about the disk system of the Panasonic typewriter KX-W1510, and where I am (still) failing...


Thursday, 1 April 2021

Backup is digital long-term preservation!

Exponential growth

https://www.statista.com/chart/17727/global-data-creation-forecasts/

An important observation is that the number of files produced each year continues to increase worldwide (see https://en.wikipedia.org/wiki/Information_explosion). And with it the number of digital objects increases in the same measure, for which we must decide: Keep or throw away? 

The truth is, the discard scenario becomes the more likely one with each passing year.

Magnificent diversity

Another observation is that about 90 new file formats are added every year.
And that figure already accounts for the file formats that drop out of use.

 

The truth is, no one can build up format knowledge at that rate.

 

A fuzzy concept

When talking to colleagues, the topic of validation does not play a role. For one thing, no one is clear about what "valid" means. Valid against a specification? Valid against a profile? Valid because it can be opened by programs? On the other hand, nothing happens after that. If a file is broken, it is still archived. If it is not broken, fine. 

The truth is, validation is useless.

 

Success factors

Do you know how the success of digital preservation is measured? I'll tell you, in terabytes per year. If the numbers go up, that's a good thing to sell to politicians. Whether it was difficult to prepare digital objects for long-term availability doesn't matter. Whether born-digitals are more at risk, never mind. 

Is that the truth?

Overrated

It used to be said that long-term digital archiving could only be handled by organizations with a minimum of resources. Look around and you'll find dozens of one-man orchestras and part-time archives. And do you think that as the amount of data increases, so do the human resources? Oh, come on! 

 You know the truth!

That's too exhausting

If you've ever heard of format migration as a principle of long-term preservation, you've read in textbooks phrases like 

To ensure format migration, the significant properties of groups of objects that must be preserved must be determined. 

Have you ever seen an archive that has actually determined and documented significant properties?

The truth is, significant properties are determined after the fact from technical metadata.

Summary

So what is digital long-term preservation? Only an expensive backup.

Wednesday, 27 January 2021

Impossible - or how I learned to read data storage media at the speed of light and what it's good for


When I receive data carriers from an inheritance, I want to get a quick overview of what is on the floppy disk, the CDROM, the USB stick or the hard disk drive so that I can look at the interesting things first.

But I only know what is there when I read the media, right? A typical chicken and egg problem. 

https://openclipart.org/detail/212857/sci-fi-scanner-device
I discovered the crucial clue to the solution in a 2014 talk by Simson Garfinkel, "Digital Forensics Innovation: Searching A Terabyte of Data in 10 minutes" (http://simson.net/ref/2014/2014-02-21_RPI_Forensics_Innovation.pdf).

What is Random Sampling?

Random sampling is nothing more than looking at only a small, randomly chosen part of a total set and inferring the big picture from it.

To find out what is on a medium, it would be sufficient to look at random blocks and determine for them, based on their byte structure, whether they fall into the categories "empty", "random", "text", "video" or "undef".

Exactly this approach is implemented in the Perl module File::FormatIdentification::RandomSampling, which can be found on CPAN under https://metacpan.org/pod/File::FormatIdentification::RandomSampling.

The category "empty" is dominated by sequences of zero bytes, in the category "random" the byte values are almost equally distributed, in the category "text" values for the characters "a-z" from the ASCII character set appear frequently, "video" contains frequent byte sequences resulting from the basic structure of MPEG. And under "undef" everything else is subsumed.
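The idea can be sketched in a few lines of Python. This is not the algorithm of the Perl module: the thresholds (90% zero bytes, 50% lowercase letters, 7 bits of entropy per byte) are invented for illustration, the "video" category is left out, and the function names are made up.

```python
import math
import random
from collections import Counter


def classify_block(block: bytes) -> str:
    """Put one block into a rough category based on its byte histogram."""
    if not block:
        return "undef"
    counts = Counter(block)
    n = len(block)
    if counts[0] / n > 0.9:            # dominated by zero bytes
        return "empty"
    letters = sum(counts[b] for b in range(ord("a"), ord("z") + 1))
    if letters / n > 0.5:              # mostly ASCII letters a-z
        return "text"
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    if entropy > 7.0:                  # byte values almost equally distributed
        return "random"
    return "undef"


def sample_image(path: str, samples: int, block_size: int = 512) -> Counter:
    """Classify randomly chosen blocks of an image file or device."""
    result = Counter()
    with open(path, "rb") as image:
        image.seek(0, 2)               # jump to the end to learn the size
        blocks = image.tell() // block_size
        if blocks == 0:
            return result
        for _ in range(samples):
            image.seek(random.randrange(blocks) * block_size)
            result[classify_block(image.read(block_size))] += 1
    return result
```

Dividing each count in the result by the number of samples gives percentages of the kind shown in the example output further down.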

Example

The above Perl module contains the program crazy_fast_image_scan.pl. The following simple call:

perl -I lib bin/crazy_fast_image_scan.pl --percent=0.000001 --image=/dev/mapper/laptop--vg-home

provides the following output:

Scanning Image /dev/mapper/laptop--vg-home with size 728982618112, checking 1423 sectors
scanning [...]   
Estimate, that the image '/dev/mapper/laptop--vg-home'
has percent of following data types:
    44.6% random/encrypted/compressed
    35.6% undef
    11.0% empty
     5.4% video/audio
     3.5% text

The complete output is even more extensive. It is important to note that the examined partition was about 679 GiB (729 GB) in size and was scanned in just 15 s.

Limits

Importantly, the output provides only a rough estimate of what might be on the media. The choice of the sample size (here: via the --percent parameter) determines the informative value of the estimate, as well as the time it takes to deliver a result.
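As a rough guide to that informative value, the textbook margin of error for an estimated proportion can be used. The Python sketch below assumes that the sampled sectors are independent random draws and uses the normal approximation with z = 1.96 for a 95% interval; the module itself does not report such a figure, and margin_of_error is a made-up name.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a proportion p from n samples."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Values from the example output above: 44.6% "random" in 1423 sectors.
moe = margin_of_error(0.446, 1423)
print(f"44.6% +/- {moe * 100:.1f} percentage points")
```

With the 1423 sectors from the example, the 44.6% estimate comes with a margin of about 2.6 percentage points. Halving that margin requires four times as many samples.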

More ideas

In the above module, I have implemented an experimental output of the MIME-Types potentially present on the media. This is not very stable yet and needs more work, but it can help to estimate even better whether the files on a disk are interesting enough to prioritize it. Here is an example output:

The next mimetype estimation is experimental and needs further work:
    87.9% unknown
     3.5% application/pdf
     1.1% video/quicktime
     0.8% image/gif
     0.8% text/java
     0.7% application/msword
     0.6% text/markdown
     0.6% application/vnd.openxmlformats-officedocument.wordprocessingml.document
     0.6% application/xml
     0.4% application/msaccess
     0.4% application/navimap
     0.4% application/rtf
     0.3% image/png
     0.2% application/arj
     0.1% application/vnd.ms-powerpoint
     0.1% text/html

The approach is to determine the MIME-Type of the files for a test corpus using other tools, determine typical bytegram values and pass the whole thing to a decision tree learner. If you are interested, you are welcome to contribute to the module. 

Happy scanning!

Monday, 10 August 2020

It is nonsense to consider significant properties only at file level

It appears that most archives record significant properties at the file level. (By the way, they often mean technical properties, which is not the same thing, but that is a topic for another blog post.) This is insufficient, and I will give two examples.

Example 1 - Retro-digitised material

If monographs are scanned, as we do in-house, in order to preserve the originals and make them accessible to users, images are created. If you look at these image files, you can determine the following significant properties:


  • readable
  • accessible for OCR analysis
  • reproducible
  • maybe even true to color

 

These properties can then be used to define technical parameters that can be found in certain requirement profiles and can lead, for example, to the recommendation of the TIFF file format.

The list above is missing the significant property "the order of the scans should correspond to the original" (pagination). This property could be implemented by combining all scan pages into one file format, e.g. as BigTIFF or PDF/A. However, there may be good reasons not to include all pages in one file. What next? The remaining option is to add a file describing the structure of the digitized material in addition to the TIFF files. This can be a METS XML file, for example. METS is a good choice because it was created for this very purpose. Hmmm, is METS not a metadata format? And doesn't metadata belong outside of the payload? And isn't METS used by several archive information systems to map the AIPs? So can I not pack the structuring data into it?

Stop!

It is true, METS is a metadata format. And it is true that METS is often used to describe container structures in SIPs or AIPs. But we have to distinguish between metadata describing the IE (i.e. the payload) and metadata inherently belonging to the payload. This is not easy, but here the significant properties help us: If the METS is used, as in our example, to represent the significant property "pagination", then the METS is part of the IE, otherwise it is not.

Now you might be tempted to get sloppy and just put the "pagination" into the METS of the AIP. Is that a good idea? No. Because IE should be kept available and usable. The AIP should only contain the metadata necessary to ensure availability. But when a user later accesses the payload via DIP, he should have everything together, i.e.: an intellectual unit as it was actually intended. This is the principle of independence.

I admit that sounds abstract and difficult. But let us try an analogy. If I have loose pages where the order is important, then the order is important whether the pages are archived or not. So I bind them into a book, for example, or use other techniques. This is my intellectual unit that I want to archive. I put the whole thing in a box and write on it what is in it and what happened to the box or the content during archiving. This is then my AIP. If I want to hand over the contents of this box to someone later, they don't necessarily have to be interested in what happened to the box; they can take the contents and work with them and know exactly in which order the pages follow each other.

Example 2 - Web page


I would like to present a second example to illustrate another aspect. Let us assume that we are to archive a very specific web page, which for the sake of simplicity consists of an HTML document, CSV files and graphic files. If you look at the web page, there is always a link in the text between one of the CSV files and one graphic file. The assignment could be the visualization of an experiment. It is only important to the department that the values, the textual content and the assignment to the graphic are not lost. Together with the department we determined the significant properties and after a lot of effort we transferred the website (IE) into the long-term archive. After some time we found out that the graphic files were subject to format obsolescence and had to be migrated to a new format. We decide on the new image archive format PNG/A and migrate the old files.

But is this sufficient? No. The HTML document still contains the file name of the old format. Should we change the file name or leave it as it is? The principle of least surprise speaks for "change". But if we change the file names during the migration, we inevitably have to change the file names in the HTML document as well.

Let's summarize

  1. Significant properties must be recorded at the level of the IE. They are not file dependent.
  2. Metadata that is essential to represent the relationship of objects within an IE is a mandatory part of that IE.
  3. Format migrations can result in changes to other parts of the IE, even if those parts are not migrated themselves.
  4. Metadata and data inside an IE must never refer to data or metadata outside it.
  5. Metadata outside of an IE, however, may well reference metadata and data of an IE.


Whew, that was a lot of thinking, but I hope it was worth thinking about it.

Wednesday, 22 July 2020

Format recognition, new analysis options?

Previous work


In an older article (see https://kulturreste.blogspot.com/2018/10/heres-tool-make-it-work.html) I already analysed the PRONOM signatures. As of today, the module for this is available on CPAN, see https://metacpan.org/pod/File::FormatIdentification::Pronom for details.

In addition to the statistics on PRONOM signatures, the Perl package comes with two more helper scripts that can make the work of a long-term archivist easier.

Format identification


First, there is classic format identification. The script reports all hits, and the output indicates the quality of the regex. This value does not say how well the PRONOM signature matches the file, but how specific the signature is.

Here is an example output for a TIFF file that was wrongly recognized as GeoTIFF by DROID:

perl -I lib bin/pronomidentify.pl -s DROID_SignatureFile_V96.xml -b /tmp/00000007.tif
/tmp/00000007.tif identified as Tagged Image File Format with PUID fmt/353 (regex quality 1)
/tmp/00000007.tif identified as Geographic Tagged Image File Format (GeoTIFF) with PUID fmt/155 (regex quality 2)


Colorized output of possible signature hits in the hexeditor wxHexEditor


Under Linux you can use the editor wxHexEditor to analyze files. It allows you to create tag-files, in which you can define sections that are marked with colors and annotated.

The script pronom2wxhexeditor creates such a file. In the following you can see the call and a screenshot.

perl -I lib bin/pronom2wxhexeditor.pl -s DROID_SignatureFile_V96.xml -b /tmp/00000007.tif


What next?


Well, it's up to us as a community to use the existing tools and use their possibilities to improve our daily work. Anyone who has suggestions for improvement or ideas is welcome to share them with us.

I would be especially happy if some helpful souls took the PRONOM statistics to heart and helped improve the PRONOM signatures.

It makes sense to start with the orphaned signatures and to recheck signatures that are used more than once.

Monday, 13 July 2020

Why it is a stupid idea to consider CSV as a valid long-term preservation file format

Take CSV!

It's so nice and quick and easy to say. Take CSV!

For simple cases that may be true. CSV files look so simple, so innocent, so sweet. Yet by their very nature they are insidious and vicious, and working with them resembles a bloody walk into the deepest dungeons of classic role-playing games.

Let us begin our journey.

Innocent simplicity

You take a separator, e.g. the comma, use it to separate your values. Pour both into readable form. Done.

Okay. We need a second separator to show us the next line. But then, done! It's a CSV.

Hmm. There was something. Line separator. Now, is that line feed, carriage return, or carriage return and line feed? It depends, for example, on which operating system you're running.

The monster is growing

It is not a bad idea to separate values of a list by commas. Especially for Americans, this feels quite natural.

In other parts of the world, the decimal places of fractional numbers are separated by commas. Good, then we'll give the spreadsheets the opportunity to define the separator freely. Problem solved.

Well, not quite. In other contexts the separator could somehow appear inside the individual values of a list. Good, then we'll introduce quoting. We define a character that allows us to recognize whether a separator is a separator or just a text component of a list value. Quotation marks would fit. That was easy, wasn't it?

Short break

So, to sum up. CSV files are easy. You need a separator, which can be a comma or anything else. We have a second separator that separates the lines. Usually there are three variations. We need quoting to see that a value cannot be confused with a separator.

Yeah, it may have been a little more complex than it looked at first. But what is there to make it worse?

Little toothy pegs!

Hmm, what if I want to store a text like this as a value after the raw value 1:

And he said "Oh, no!"

In the text, we have a comma, which would be protected by quoting. But we also have quotation marks, which we need for our quoting. No problem, then we double the quotation mark at that point to indicate that the text is not finished. So in the CSV it looks like this now:

1,"And he said ""Oh, no!"""

I got it.

But, wait, what happens if my text consists of a single quotation mark?

1,""""

You're lucky. It seems to be working.

Wait, so what if I have a lot of quotation marks? As in

""""""
This is translated to
1,""""""""""""""

It works, too.
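The doubling rule is easy to check with a CSV library. The following sketch uses Python's standard csv module with its default dialect (comma as separator, double quote as quoting character); the helper name to_csv_line is made up for this illustration.

```python
import csv
import io


def to_csv_line(row: list) -> str:
    """Serialize one row with the csv module's default dialect."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(row)
    return buffer.getvalue().rstrip("\n")


print(to_csv_line([1, 'And he said "Oh, no!"']))
print(to_csv_line([1, '"']))
print(to_csv_line([1, '"' * 6]))
```

The writer wraps each of these values in quotation marks and doubles every quotation mark inside, so six quotation marks become fourteen characters.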

The problem is in the details

Now, a nasty little devil might get the idea to construct a text as value that contains line breaks, for example this one:

Evil Text
",
",

That would then become:

1,"Evil Text
"",
"","

Oops! If I now stubbornly read this in line by line, I would have read strange lines.
Good thing there is real software out there that reads and parses CSV files cleanly from the beginning. Not that anyone here still uses 'grep' and co.
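The difference between reading line by line and parsing properly can be shown with Python's standard csv module. The sample data is the evil record from above, written with the module's default dialect, plus one harmless second record that is added here purely for illustration.

```python
import csv
import io

# The evil record, followed by a harmless one (added for illustration).
data = '1,"Evil Text\n"",\n"","\n2,harmless\n'

# Naive approach: every line break is treated as a record boundary.
naive = data.splitlines()

# CSV-aware approach: line breaks inside quotes stay part of the value.
parsed = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive), "lines when split naively")
print(len(parsed), "records when parsed as CSV")
print(repr(parsed[0][1]))
```

The naive split sees four lines, while the parser returns two records and restores the original three-line value.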

The Abyss

Have we actually talked about character encoding yet? ASCII, Latin-1, UTF-32? UTF-8? With or without byte-order mark? No. Let's turn back. We still have a chance.

Later, at the pub.

I admit it was a terrible trip. Now, over a cold beer, we can laugh about it. But our hearts were in our mouths. We had no idea what to expect.

If only there had been a sign that said what character encoding, what line end encoding, what separators for lines and columns we could expect, yes, then we would have been able to understand CSV and we would have been spared the horror. But the horror comes from the darkness, from the premonitions of the unknown.

Therefore, be warned!

Don't use CSV, it could get you!