Sunday, October 22, 2017

Numbers Games

A while back, somebody asked me whether it was possible to do a task, but without describing that task very clearly. I said, Yes, probably, let me know. Eventually it turned out that the task was to produce and transmit a list of checks recently written. This seemed entirely possible.

After some back and forth between our Accounting Department and our bank, I learned that the file to be sent required a fixed-length record. The amount of the check was to occupy ten spaces, zero-filled on the left, and with no decimal point between the dollars and the cents. (The absent decimal point I take to be a COBOL idiom. It must have made sense in the dawn of data processing: if the average transaction was in the hundreds of dollars, omitting the decimal point gave one roughly a 16% savings in space; with the expensive storage of the day, that was considerable.)

Anyway, I generated sample files, which looked OK. I generated and sent a file of checks not yet cleared. Then I looked closely at a smaller file. It was pretty good, but a transaction of 5218.94 showed up as 0000521893. An examination at the file of checks not cleared showed that one out of forty was a cent low. Now, I am not one who will quibble over one cent in a five thousand dollar transaction. But accountants will, and I am glad that they will: I want the people who watch the money to be serious and precise. And I would not care to explain to some we paid why his check was held up by the bank for a discrepancy of one cent.

Such discrepancies arise because modern computers do their calculations with binary arithmetic, and powers of 2 map only so well on to powers of 10. Eventually, I was able to come up with a minimal demonstration of the problem. Running a Perl script with the text

#!/usr/bin/perl -w-

use strict;

my $original = '5218.94';
my $x100 = $original * 100;
my $x100_sprintfd = sprintf '%010d', $original * 100;
my $sprintfd_no_x100 = sprintf '%010d', 521894;

print <<EOF;
Original -> $original
Times 100 -> $x100
Times 100, sprintfd -> $x100_sprintfd
Sprintfd, not Times 100 -> $sprintfd_no_x100
EOF


Yields
Original -> 5218.94
Times 100 -> 521894
Times 100, sprintfd -> 0000521893
Sprintfd, not Times 100 -> 0000521894
This happens with Perl 5.10, 5.16,  and 5.22, I find. Oddly enough, the format specifier '010.0f' does not round down. No doubt the mechanics of it are there to find in the source code, which is available on the internet. But I don't know that I could track this down, and I do know that I haven't time to try. The workaround, which I arrived at before coming up with this script, is to do the multiplication within the database query, rather than outside.

William Kahan put a lot of thought into the many edge cases of floating-point arithmetic, and certainly deserved his Turing Award. After Wednesday afternoon, I better understand why he called his test suite for floating-point implementations "paranoia".

No comments:

Post a Comment