The DOS 3.3 SYS.COM Bug Hunt!

SYS.COM corrupted a NetDrive image. But why?

Posted: 2025-02-22
Tags: DOS, NetDrive, ForgotToCheckReturnCode


2025-02-24: Somebody over at Hacker News pointed me at the source code. It confirms the bug and also shows how things like ambiguous media descriptor bytes are handled. Hat tip to mmastrac; I'll post an update after I digest everything.


In ye olden days, to make a diskette bootable you had to format it using the /s option of the FORMAT command. That works fine for blank disks, but software vendors had a small problem: they would sell you a disk with their software on it, but they couldn't include the DOS files needed to make it bootable because they were not selling you DOS. To get around this they would leave space available on the disk and have you use the SYS command, which copied the magic boot loader code from your DOS disk onto their disk. There were some restrictions on where the free space was located and how much was required, but it generally worked to allow you to make a diskette bootable without having to format it.

Last year somebody reported a problem with the DOS 3.3 SYS.COM command when used with NetDrive. They started with a valid FAT12 image, ran SYS.COM to make it bootable, and then they were not able to mount the image using NetDrive again. Running SYS.COM against the image had broken something.

Besides copying the operating system's hidden files to the target drive letter, SYS.COM also copies some boot code into the first sector of the disk. In general it does not make sense to run it against a NetDrive image because you already had to boot DOS to mount the image, but it should not hurt anything. So I decided to have a look at what was going on.

The first step was to recreate the problem. I created a 10MB FAT12 disk image using NetDrive. The first few bytes of the first sector (the volume boot record) are shown below:

The FAT12 NetDrive image when first created:

00000000   EB 3C 90 4E  45 54 44 52  49 56 45 00  02 08 01 00  .<.NETDRIVE.....
00000010   02 00 02 00  50 F8 60 00  00 00 00 00  00 00 00 00  ....P.`.........
00000020   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................

<... snip ...>

000001E0   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
000001F0   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................

00000200   F8 FF FF 00  00 00 00 00  00 00 00 00  00 00 00 00  ................

Dissecting that we find:

Offset  Bytes                    Description
0x00    EB 3C 90                 Jump to executable code
0x03    4E 45 54 44 52 49 56 45  OEM ID ("NETDRIVE")
0x0B    00 02                    Bytes per sector (512)
0x0D    08                       Sectors per cluster (8)
0x0E    01 00                    Reserved sectors (1)
0x10    02                       Number of File Allocation Tables (2)
0x11    00 02                    Root directory entries (512)
0x13    00 50                    Sectors (20480)
0x15    F8                       Media Descriptor (hard drive)
0x16    60 00                    Sectors per FAT (96)

That is a minimal BIOS Parameter Block (BPB) as defined by DOS 2.0 but also recognizable to later versions of DOS. Later versions of DOS have extended it a few times.
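The field-by-field reading above can be reproduced with a few lines of Python. This is only a sketch: the `Bpb` type and `parse_bpb` name are made up for illustration, and it covers just the DOS 2.0 fields listed in the table, all read as little-endian.

```python
import struct
from typing import NamedTuple


class Bpb(NamedTuple):
    """The DOS 2.0 BPB fields, in the order they appear from offset 0x0B."""
    bytes_per_sector: int
    sectors_per_cluster: int
    reserved_sectors: int
    fat_count: int
    root_entries: int
    total_sectors: int
    media_descriptor: int
    sectors_per_fat: int


def parse_bpb(boot_sector: bytes) -> Bpb:
    # "<" means little-endian with no padding; H is 16 bits, B is 8 bits.
    return Bpb(*struct.unpack_from("<HBHBHHBH", boot_sector, 0x0B))


# First 24 bytes of the freshly created NetDrive image, copied from the dump.
fresh = bytes.fromhex(
    "EB 3C 90 4E 45 54 44 52 49 56 45 00 02 08 01 00 "
    "02 00 02 00 50 F8 60 00"
)

for name, value in parse_bpb(fresh)._asdict().items():
    print(f"{name:20} {value} (0x{value:X})")
```

Two-byte fields are stored low byte first, so `00 50` is 0x5000, or 20480 sectors.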

At offset 0x200 you see the start of the first File Allocation Table (FAT). The first byte 0xF8 is the media descriptor byte, which should be the same as the one in the BPB. This is FAT12 so entries are 12 bits in size and packed two to every three bytes; the first entry is actually 0xFF8 (the 0xF8 byte plus the low four bits of the next byte) and the second entry is 0xFFF. Ignoring the media descriptor entry and the second entry which is reserved, this FAT is completely empty.
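The 12-bit packing is easier to see in code. Here is a minimal sketch of the usual FAT12 unpacking rule (the function name is illustrative): entry n starts at byte offset n + n/2, even entries take the low 12 bits of the 16-bit little-endian value found there, and odd entries take the high 12 bits.

```python
def fat12_entries(fat, count):
    """Unpack the first `count` 12-bit entries from a FAT12 table."""
    entries = []
    for n in range(count):
        i = n + n // 2                      # same as (n * 3) // 2
        pair = fat[i] | (fat[i + 1] << 8)   # 16-bit little-endian value
        entries.append(pair >> 4 if n & 1 else pair & 0x0FFF)
    return entries


# Start of the first FAT in the freshly created image (offset 0x200).
fresh_fat = bytes.fromhex("F8 FF FF 00 00 00 00 00")

print([f"{e:03X}" for e in fat12_entries(fresh_fat, 4)])
```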

I mounted the new image under IBM PC DOS 3.3 and ran SYS.COM against it. That looked normal. However, when I disconnected the image and tried to mount it again NetDrive complained that it was a bad image:

Well, the BPB started off correctly but now it seems bad. Let's look at the first sector now that SYS.COM has altered it:

00000000   EB 34 90 49  42 4D 20 20  33 2E 33 00  02 3B C1 75  .4.IBM  3.3..;.u
00000010   1A 8B 16 DC  09 8B 0E DE  09 2B CA 74  D4 8B 1E 00  .........+.t....
00000020   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 12  ................
00000030   00 00 00 00  01 00 FA 33  C0 8E D0 BC  00 7C 16 07  .......3.....|..
00000040   BB 78 00 36  C5 37 1E 56  16 53 BF 2B  7C B9 0B 00  .x.6.7.V.S.+|...

<... snip ...>

000001C0   0D 0A 44 69  73 6B 20 42  6F 6F 74 20  66 61 69 6C  ..Disk Boot fail
000001D0   75 72 65 0D  0A 00 49 42  4D 42 49 4F  20 20 43 4F  ure...IBMBIO  CO
000001E0   4D 49 42 4D  44 4F 53 20  20 43 4F 4D  00 00 00 00  MIBMDOS  COM....
000001F0   00 00 00 00  00 00 00 00  00 00 00 00  00 00 55 AA  ..............U.

00000200   F8 FF FF 03  40 00 05 60  00 07 F0 FF  09 A0 00 0B  ....@..`........
00000210   C0 00 0D E0  00 0F F0 FF  00 00 00 00  00 00 00 00  ................

Offset  Bytes                    Description
0x00    EB 34 90                 Jump to executable code
0x03    49 42 4D 20 20 33 2E 33  OEM ID ("IBM  3.3")
0x0B    00 02                    Bytes per sector (512)
0x0D    3B                       Sectors per cluster (59)
0x0E    C1 75                    Reserved sectors (30145)
0x10    1A                       Number of File Allocation Tables (26)
0x11    8B 16                    Root directory entries (5771)
0x13    DC 09                    Sectors (2524)
0x15    8B                       Media Descriptor (unknown)
0x16    0E DE                    Sectors per FAT (56846)

It makes sense for the jump instruction and OEM ID to change. And the bytes per sector field is correct. But the rest of the BPB is garbage. Something corrupted it.
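To make "garbage" a bit more concrete, here is a sketch of the kind of plausibility checks a program could run on a BPB. These particular rules are my own assumptions for illustration; they are not the checks NetDrive or DOS actually performs.

```python
import struct


def bpb_problems(boot_sector):
    """Return a list of reasons the DOS 2.0 BPB at offset 0x0B looks implausible."""
    (bytes_per_sector, sectors_per_cluster, reserved, fat_count,
     root_entries, total_sectors, media, sectors_per_fat) = struct.unpack_from(
        "<HBHBHHBH", boot_sector, 0x0B)
    problems = []
    if bytes_per_sector != 512:
        problems.append("unexpected sector size")
    if sectors_per_cluster == 0 or sectors_per_cluster & (sectors_per_cluster - 1):
        problems.append("sectors per cluster is not a power of two")
    if fat_count not in (1, 2):
        problems.append("unusual number of FATs")
    if media != 0xF0 and media < 0xF8:
        problems.append("unknown media descriptor")
    if total_sectors and reserved + fat_count * sectors_per_fat >= total_sectors:
        problems.append("reserved area and FATs do not fit on the disk")
    return problems


fresh = bytes.fromhex(
    "EB 3C 90 4E 45 54 44 52 49 56 45 00 02 08 01 00 02 00 02 00 50 F8 60 00")
after_sys = bytes.fromhex(
    "EB 34 90 49 42 4D 20 20 33 2E 33 00 02 3B C1 75 1A 8B 16 DC 09 8B 0E DE")

print(bpb_problems(fresh))
print(bpb_problems(after_sys))
```

The original image passes every check; the image after SYS.COM fails all of them except the sector size.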

Looking at the rest of the sector, the boot code starts at offset 0x36 (the EB 34 jump at offset 0 skips 0x34 bytes past its own two bytes) and that looks reasonable. There is also the boot sector signature (0xAA55) at offset 0x1FE, and the FAT shows some additional entries for the two hidden files that were copied over.
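The two files can be picked out of the FAT by following the cluster chains. A small sketch (helper names are illustrative), using the 24 FAT bytes from the dump above:

```python
def fat12_entries(fat, count):
    """Unpack the first `count` 12-bit entries from a FAT12 table."""
    entries = []
    for n in range(count):
        i = n + n // 2
        pair = fat[i] | (fat[i + 1] << 8)
        entries.append(pair >> 4 if n & 1 else pair & 0x0FFF)
    return entries


def follow_chain(entries, start):
    """Follow a cluster chain until an end-of-chain marker (0xFF8 to 0xFFF)."""
    chain, cluster = [], start
    while 2 <= cluster < len(entries) and cluster not in chain:
        chain.append(cluster)
        cluster = entries[cluster]
    return chain


# The first FAT after running SYS.COM (offset 0x200 in the dump).
fat_after_sys = bytes.fromhex(
    "F8 FF FF 03 40 00 05 60 00 07 F0 FF 09 A0 00 0B "
    "C0 00 0D E0 00 0F F0 FF")

entries = fat12_entries(fat_after_sys, 16)
print(follow_chain(entries, 2))
print(follow_chain(entries, 8))
```

One chain covers clusters 2 through 7 and the other covers clusters 8 through 15. At 8 sectors per cluster that is 24KB and 32KB of allocated space.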

I tried it again, this time with a diskette image mounted using NetDrive, and it did everything perfectly. Which implies that the problem is not in NetDrive, but in the difference between hard drive images and floppy disk images.

So the DOS 3.3 SYS command added the boot code and updated the FAT correctly, but it clobbered the BPB. But only on the NetDrive hard drive image. Why?

DOS 3.2 added a function called "Generic IOCTL" which allows DOS to query a device to get its geometry, write a track, read a track, format a track, etc. It also added code to handle these additional calls for the devices supported by the BIOS. For example, here is a call to "Get Device Parameters" (Generic IOCTL, sub function 0x60) for drive C:

Note that at the first breakpoint (after the IOCTL call) the Carry Flag (NC) is not set. This means the call was successful and the data returned is reliable.

The documentation says that the DEVICEPARAMS data structure has a BPB starting at offset 0x06, which here shows:

Offset  Bytes  Description
0x06    00 02  Bytes per sector (512)
0x08    04     Sectors per cluster (4)
0x09    01 00  Reserved sectors (1)
0x0B    02     Number of File Allocation Tables (2)
0x0C    00 02  Root directory entries (512)
0x0E    B1 FF  Sectors (65457)
0x10    F8     Media Descriptor (hard drive)
0x11    40 00  Sectors per FAT (64)

That makes sense for a 32MB C: drive.
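A quick arithmetic check of those numbers, as a sketch. It reads `04` as 4 sectors per cluster and `40 00` as 64 sectors per FAT (little-endian), and it assumes a FAT16 volume, which is what a cluster count this far above the usual FAT12 limit implies.

```python
import math

SECTOR_BYTES = 512
DIR_ENTRY_BYTES = 32


def layout(total_sectors, sectors_per_cluster, reserved, fat_count,
           root_entries, sectors_per_fat):
    """Return (size in MiB, data clusters, sectors one FAT16 copy needs)."""
    root_sectors = math.ceil(root_entries * DIR_ENTRY_BYTES / SECTOR_BYTES)
    data_sectors = (total_sectors - reserved
                    - fat_count * sectors_per_fat - root_sectors)
    clusters = data_sectors // sectors_per_cluster
    size_mib = total_sectors * SECTOR_BYTES / 2**20
    # FAT16 assumption: 2 bytes per entry, plus the two reserved entries.
    fat16_sectors_needed = math.ceil((clusters + 2) * 2 / SECTOR_BYTES)
    return size_mib, clusters, fat16_sectors_needed


size_mib, clusters, fat_needed = layout(65457, 4, 1, 2, 512, 64)
print(f"{size_mib:.2f} MiB, {clusters} clusters, FAT16 needs {fat_needed} sectors")
```

The drive comes out just under 32 MiB, and the FAT size the cluster count requires matches the sectors-per-FAT field.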

Let's run that code again against the NetDrive drive image, with the boot sector returned to what it was before it was corrupted:

I changed one instruction to change to the NetDrive drive number and ran the code again, but this time at the first breakpoint the Carry Flag (CY) is set. This means there was an error, and AX holds the error code. Value 0x0001 means "ERROR_INVALID_FUNCTION" which makes sense because the NetDrive device driver doesn't support this function. (It is not required to be supported.)

If we dig around inside of SYS.COM we can see a call to Generic IOCTL:

Here we see it getting the drive number from storage, setting AX to 0x440D, setting CH to 0x08 (a block device) and CL to 0x60 (get device parameters). DS:DX will be the pointer to the parameter block to fill in.

Let's run the code!

At the breakpoint after the Generic IOCTL we see that the Carry Flag (CY) is set and the error code is set to ERROR_INVALID_FUNCTION, just as it was above. And here is the bug ... nothing is checking the Carry Flag to see if there was an error after the Generic IOCTL call.
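Here is a toy model of that failure mode in Python. It is not SYS.COM's logic: `fake_get_device_params` is a hypothetical stand-in for the INT 21h call (AX=0x440D, CX=0x0860), and the copy is simplified to a straight 25 byte move from offset 0x06 of the parameter buffer to offset 0x0B of the boot sector. The stale buffer contents are the 25 bytes shown a little further down.

```python
ERROR_INVALID_FUNCTION = 0x0001
PARAMS_BPB_OFFSET = 0x06   # where the BPB sits inside DEVICEPARAMS
BOOT_BPB_OFFSET = 0x0B     # where the BPB sits inside the boot sector
BPB_LEN = 25


def fake_get_device_params(buffer, driver_supports_it):
    """Hypothetical stand-in for the Generic IOCTL. Returns (carry_flag, ax)."""
    if not driver_supports_it:
        return True, ERROR_INVALID_FUNCTION       # buffer is left untouched
    good_bpb = bytes.fromhex("00 02 08 01 00 02 00 02 00 50 F8 60 00") + bytes(12)
    buffer[PARAMS_BPB_OFFSET:PARAMS_BPB_OFFSET + BPB_LEN] = good_bpb
    return False, 0


def copy_bpb(boot, buffer):
    boot[BOOT_BPB_OFFSET:BOOT_BPB_OFFSET + BPB_LEN] = \
        buffer[PARAMS_BPB_OFFSET:PARAMS_BPB_OFFSET + BPB_LEN]


def write_bpb_unchecked(boot, buffer, driver_supports_it):
    fake_get_device_params(buffer, driver_supports_it)   # result ignored: the bug
    copy_bpb(boot, buffer)


def write_bpb_checked(boot, buffer, driver_supports_it):
    carry, ax = fake_get_device_params(buffer, driver_supports_it)
    if carry:
        raise OSError(f"Get Device Parameters failed, AX={ax:#06x}")
    copy_bpb(boot, buffer)


stale = bytes.fromhex(
    "72 E4 3B C1 75 1A 8B 16 DC 09 8B 0E DE 09 2B CA 74 D4 8B 1E 19 09 B4 40 CD")
buffer = bytearray(PARAMS_BPB_OFFSET) + bytearray(stale)
boot = bytearray(512)

write_bpb_unchecked(boot, buffer, driver_supports_it=False)
print(boot[BOOT_BPB_OFFSET:BOOT_BPB_OFFSET + BPB_LEN].hex(" ").upper())
```

With the check in place the failed call stops the update before anything reaches the boot sector.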

The Generic IOCTL writes the DEVICEPARAMS structure at DS:DX, assuming the call did not fail. If the call fails, as it does here, we'll just see whatever was already in that storage. As before, the BPB structure will be at offset 0x06. Here is what we got back in that data structure:

(Note that the segment registers changed causing the instruction pointer to shift ... we are still in the same code though, just using aliased memory locations. Thanks segmented x86!)

IOCTL BPB:      72 E4 3B C1 75 1A 8B 16 DC 09 8B 0E DE 09 2B CA 74 D4 8B 1E 19 09 B4 40 CD
Corrupted BPB:  00 02 3B C1 75 1A 8B 16 DC 09 8B 0E DE 09 2B CA 74 D4 8B 1E 00 00 00 00 00

Except for the first two bytes (the sector size) and the last five bytes (which include dpHugeSectors), the two line up perfectly. So the failure to check the Carry Flag wound up causing bad BPB data to be written to the volume boot record.
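Comparing the two byte strings by hand is error prone, so here is a small sketch that lists the offsets where they differ. The variable names are just for illustration.

```python
ioctl_bpb = bytes.fromhex(
    "72 E4 3B C1 75 1A 8B 16 DC 09 8B 0E DE 09 2B CA 74 D4 8B 1E 19 09 B4 40 CD")
written_bpb = bytes.fromhex(
    "00 02 3B C1 75 1A 8B 16 DC 09 8B 0E DE 09 2B CA 74 D4 8B 1E 00 00 00 00 00")

# Offsets (relative to the start of the BPB) where the two buffers disagree.
differing = [i for i, (a, b) in enumerate(zip(ioctl_bpb, written_bpb)) if a != b]

print(len(ioctl_bpb), len(written_bpb))
print(differing)
```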

So what are those bytes? It looks like code to me, and we can confirm that by just disassembling it:
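If you do not have DEBUG.COM handy, a very small decoder is enough to check that these bytes parse as 8086 instructions. This sketch only knows the handful of opcodes that occur in the 25 bytes (short conditional jumps, register/memory forms of MOV, CMP and SUB, MOV r8 with an immediate, and INT); it is nowhere near a general disassembler. Jump targets are printed relative to the start of the buffer because the load address is not known here.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG_RM = {0x2B: "SUB", 0x3B: "CMP", 0x8B: "MOV"}   # "op r16, r/m16" forms
JCC = {0x72: "JB", 0x74: "JZ", 0x75: "JNZ"}        # short conditional jumps


def decode_one(code, i):
    """Return (text, length) for the instruction at code[i].

    Raises IndexError if the instruction runs past the end of the buffer.
    """
    op = code[i]
    if op in JCC:
        rel = code[i + 1] - 256 if code[i + 1] >= 0x80 else code[i + 1]
        return f"{JCC[op]} buf{i + 2 + rel:+d}", 2
    if op in REG_RM:
        modrm = code[i + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if mod == 3:                                 # register to register
            return f"{REG_RM[op]} {REG16[reg]},{REG16[rm]}", 2
        if mod == 0 and rm == 6:                     # direct 16-bit address
            addr = code[i + 2] | (code[i + 3] << 8)
            return f"{REG_RM[op]} {REG16[reg]},[{addr:04X}]", 4
    if 0xB0 <= op <= 0xB7:                           # MOV r8, imm8
        return f"MOV {REG8[op - 0xB0]},{code[i + 1]:02X}", 2
    if op == 0xCD:                                   # INT imm8
        return f"INT {code[i + 1]:02X}", 2
    return f"DB {op:02X}", 1                         # anything else: raw byte


def disassemble(code):
    lines, i = [], 0
    while i < len(code):
        try:
            text, length = decode_one(code, i)
        except IndexError:
            text, length = f"DB {code[i]:02X} (cut off)", 1
        lines.append(f"{i:02X}: {text}")
        i += length
    return lines


stale = bytes.fromhex(
    "72 E4 3B C1 75 1A 8B 16 DC 09 8B 0E DE 09 2B CA 74 D4 8B 1E 19 09 B4 40 CD")
print("\n".join(disassemble(stale)))
```

Every byte is consumed by a sensible instruction, and the two backward jumps land on the same address, which is what you would expect from real code rather than data. The final `CD` is the first byte of an INT instruction whose operand lies past the end of the 25 bytes.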

Great, where did it come from? Using the search feature of DEBUG.COM we can find those bytes, and they appear right before the suspect code:

I am a little bit freaked out by that because the pointer to the buffer is set before the IOCTL call; the code knowingly sets a pointer to a buffer into what looks like its code area. Let's hope they knew they were done with that part of the code, or it's just another interesting bug to dissect.

So SYS.COM clearly doesn't work on hard drive images mounted with NetDrive, but it did work on a floppy image. What is the difference and why did it work?

The answer requires us to look inside of the BPB again. The BPB has a field called the "media descriptor byte" which is used to describe the layout of the image. This single byte has a limited range of valid values:

Value  Description
F0     3.5 inch, 2 sides, 18 sectors per track, 80 tracks, 1440KB or
       3.5 inch, 2 sides, 36 sectors per track, 80 tracks, 2880KB or
       5.25 inch, 2 sides, 15 sectors per track, 80 tracks, 1200KB
F8     Hard disk, any geometry
F9     3.5 inch, 2 sides, 9 sectors per track, 80 tracks, 720KB or
       5.25 inch, 2 sides, 15 sectors per track, 80 tracks, 1200KB
FA     5.25 inch, 1 side, 8 sectors per track, 40 tracks, 160KB
FB     3.5 inch, 2 sides, 8 sectors per track, 80 tracks, 640KB
FC     5.25 inch, 1 side, 9 sectors per track, 40 tracks, 180KB
FD     5.25 inch, 2 sides, 9 sectors per track, 40 tracks, 360KB or
       8 inch, 2 sides, single density, 500KB
FE     5.25 inch, 1 side, 8 sectors per track, 40 tracks, 160KB or
       8 inch, 1 side, single density, 250KB or
       8 inch, 2 sides, double density, 1220KB
FF     5.25 inch, 2 sides, 8 sectors per track, 40 tracks, 320KB

You can tell this wasn't well thought out. The media descriptor byte is often not enough to tell you what you are working with; you need to combine it with knowledge of the physical drive type too.
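A sketch of what "combine it with knowledge of the physical drive type" looks like in practice. The table below covers only some of the rows above (the 8 inch entries are left out because no sector counts are given for them), capacities are computed from the geometry with 512 byte sectors, and `capacities_kb` is a name made up for this example.

```python
# (form factor, sides, sectors per track, tracks) for a subset of the table.
FORMATS = {
    0xF0: [("3.5", 2, 18, 80), ("3.5", 2, 36, 80), ("5.25", 2, 15, 80)],
    0xF9: [("3.5", 2, 9, 80), ("5.25", 2, 15, 80)],
    0xFC: [("5.25", 1, 9, 40)],
    0xFD: [("5.25", 2, 9, 40)],
    0xFE: [("5.25", 1, 8, 40)],
    0xFF: [("5.25", 2, 8, 40)],
}


def capacities_kb(media, form_factor=None):
    """Capacities (in KB) that a media descriptor could mean.

    Passing the drive's form factor narrows the list, but not always to one.
    """
    return [sides * sectors * tracks * 512 // 1024
            for drive, sides, sectors, tracks in FORMATS.get(media, [])
            if form_factor in (None, drive)]


print(capacities_kb(0xF9))           # two candidates
print(capacities_kb(0xF9, "5.25"))   # the drive type settles it
print(capacities_kb(0xF0, "3.5"))    # still two candidates
```

Knowing the drive is enough for F9, but F0 on a 3.5 inch drive can still mean either of two formats.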

When I run SYS.COM against a floppy image mounted using NetDrive the breakpoint after the Generic IOCTL call does not even get hit:

I am pretty certain that it used the media descriptor byte from the BPB and did not bother making the Generic IOCTL call. I used a 360KB disk image with a media descriptor byte of FD, which is very common. And no IBM PC ever shipped from the factory with an 8 inch drive so it is not ambiguous. So as an experiment I used a 2880KB disk image which has a media descriptor byte of F0, which is also shared with 1440KB diskettes. Sure enough SYS.COM tried to make the Generic IOCTL call on that image, failed, and corrupted that BPB.

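So in short: when the media descriptor byte does not pin down the geometry (F8 and F0 in my tests), SYS.COM makes a Generic IOCTL call to get the device parameters. NetDrive does not support that call, SYS.COM does not check the Carry Flag afterward, and whatever was already in the buffer gets written to the boot sector as the BPB.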

The bug was probably introduced in DOS 3.2. I'm pretty sure that it is still present in DOS 4.0, as the code is still not checking the Carry Flag after the Generic IOCTL call.


Created February 22nd, 2025
(C)opyright Michael Brutman, mbbrutman at gmail dot com