Friday, October 13, 2017

Bitten by cdrkit bug

At work, I recently ran into a problem that I eventually tracked down to a bug in cdrkit.  This glosses over a few steps, but should capture the essentials.

A computer that is used for, among other things, burning DVDs recently moved from RHEL (Red Hat Enterprise Linux) version 5 to version 7.  We realized shortly thereafter that it was failing to burn DVDs successfully.  After some testing, we found that the same problem occurred on another box with the exact same model of DVD drive.

After confirming that the model of DVD drive would still write using RHEL5, I took the cdrecord binary from RHEL5, ran it under RHEL7, and it worked.  Clearly the difference was between cdrecord (from cdrtools, used on RHEL5) and wodim (from cdrkit, which is aliased to cdrecord on RHEL7).

I ran both through strace, and found that for fixating (finalizing) the disc, cdrecord was using a timeout of 1000 seconds, while wodim was using 200 seconds.  On the drive in question, fixating the disc takes about 240 seconds.  So with wodim, the timeout is hit, some diagnostic codes are dumped, and wodim reports that the fixate process failed.

Digging into source code, I found that one of Red Hat's patches for cdrtools was injecting a function (fixate_mdvd) that extends the timeout to 1000 seconds, calls the original cdrecord fixate function (fixate_mmc), and restores the 200-second timeout.
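
The wrapper pattern that the patch uses can be sketched in a few lines. This is only a Python model of the idea, not the cdrkit source (which is C): FakeDrive and set_timeout are hypothetical stand-ins I made up for illustration, and only the names fixate_mdvd and fixate_mmc and the 200/1000-second values come from the description above.

```python
class FakeDrive:
    """Hypothetical stand-in for the SCSI transport; not a cdrkit API."""

    def __init__(self, timeout_s=200):
        self.timeout_s = timeout_s   # default command timeout, in seconds
        self.seen = []               # timeouts observed while fixating

    def set_timeout(self, seconds):
        """Install a new timeout and return the previous one."""
        old = self.timeout_s
        self.timeout_s = seconds
        return old


def fixate_mmc(drive):
    # Stand-in for the original fixate routine: it only records which
    # timeout was in effect while it ran.
    drive.seen.append(drive.timeout_s)
    return 0


def fixate_mdvd(drive):
    # Extend the timeout, call the original routine, then restore the
    # old value even if the call raises.
    old = drive.set_timeout(1000)
    try:
        return fixate_mmc(drive)
    finally:
        drive.set_timeout(old)


drive = FakeDrive()
status = fixate_mdvd(drive)
print(status, drive.seen, drive.timeout_s)
```

The point of the model is that the long timeout is only in effect for the duration of the wrapped call, so anything slow that runs outside of it still sees the 200-second limit.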

That Red Hat patch became part of the cdrkit fork, but additional changes [1] made in 1.1.6 interfere with it.  In cdrtools, the time-consuming part of the operation happens inside scsi_flush_cache, which is called by fixate_mmc.  The change in 1.1.6 makes an additional call to scsi_flush_cache from fixate_mdvd, before it extends the timeout.  The slow flush therefore runs under the original 200-second timeout, which essentially negates the effect of the timeout change for Disc-At-Once sessions.  This 1.1.6 change appears to have been made to solve a Debian bug [2], but I don't see evidence that it helped anything.  Even if it did appear to help, it may have just been hiding the real cause.
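
To make the ordering problem concrete, here is a rough Python simulation of the two call orders. It is a toy model under my own assumptions, not cdrkit code: FakeDrive, CommandTimeout, run, and the function names ending in _patched and _116 are invented for illustration, the 240-second flush is the figure measured on our drive, and the model ignores whatever condition restricts the extra call to Disc-At-Once sessions.

```python
class CommandTimeout(Exception):
    """Raised when a simulated command needs longer than the timeout."""


class FakeDrive:
    """Hypothetical drive model; not a cdrkit API."""

    def __init__(self, flush_seconds=240, timeout_s=200):
        self.flush_seconds = flush_seconds  # time the first flush needs
        self.timeout_s = timeout_s          # current command timeout
        self.pending = True                 # slow work still outstanding

    def flush_cache(self):
        # Only the first flush is slow; later ones return immediately.
        needed = self.flush_seconds if self.pending else 0
        if needed > self.timeout_s:
            raise CommandTimeout(f"{needed}s > {self.timeout_s}s")
        self.pending = False


def fixate_mmc(drive):
    drive.flush_cache()
    return "ok"


def fixate_mdvd_patched(drive):
    # Order described for the Red Hat patch: extend, fixate, restore.
    old, drive.timeout_s = drive.timeout_s, 1000
    try:
        return fixate_mmc(drive)
    finally:
        drive.timeout_s = old


def fixate_mdvd_116(drive):
    # Order described for 1.1.6: an extra flush runs first, while the
    # timeout is still 200 seconds.
    drive.flush_cache()
    return fixate_mdvd_patched(drive)


def run(fixate):
    try:
        return fixate(FakeDrive())
    except CommandTimeout:
        return "timeout"


print(run(fixate_mdvd_patched), run(fixate_mdvd_116))
```

In this model the extended timeout never gets a chance to matter in the 1.1.6 order, because the only slow step has already failed before the timeout is raised.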

I believe the 1.1.6 change to fixate_mdvd should be reverted to fix the present bug.  If the bug in the Debian report is encountered again, more rigor needs to go into determining the root cause.


[1] It's hard to find code history for cdrkit, but the change can be seen here: http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/gutsy/cdrkit/gutsy/revision/8/wodim/drv_mmc.c
[2] Debian Bug #411362: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=411362