Monday, May 11, 2015

Noise and Signal

We requested some data from another organization. Eventually a CD arrived that was said to have the data we wanted. We found a T-SQL script, T-SQL being the dialect supported by Microsoft SQL Server. The file was large, about 55 megabytes. However, it appeared to have only the data definition language (DDL) to create the database.

The other organization said otherwise: the data was in there. I doubted this. The script had no INSERT statements, for one thing. For another, the script had a suggested database size around 500 gigabytes, four orders of magnitude larger than the script. But  I went ahead and ran the script. It created more than 5000 tables, all empty. I pulled a list of them to be sent back to the other organization

The next day I had another look.The name of the database sounded like that of a well-known Enterprise Resource Planning (ERP) system and some of the tables seemed, according to Google, to belong to that system. That satisfactorily explained the number of tables. A look at a printout we received with the disk provided more information, enough to make some educated guesses. It appears to me that

  1. The organization found a contractor to build the new system..
  2. The contractor's data analyst looked about for a system that might suggest what he should do.
  3. His eye fell on this ERP system, which had tables matching a component or two of the requirements. (It has tables for almost any data a company might need to record, from payroll to cash-register transactions to customer complaints and loading dock traffic.)
  4. He selected and perhaps adapted about twenty-five tables and created about twenty-five more.
  5. He or the database administrator created such intermediate tables as were needed to load up and modify the data from the previous system.
  6. At no point did anyone drop any of the load tables,which usually have multiple versions, with a suffix indicating creator or date. In fact,
  7. At no point did anyone drop the 5000+ tables set up for the use of loading dock, kitchen maintenance, payroll, etc., etc., and of no use to the new system. The noise to signal ratio therefore is about 110 to 1.
  8. Somebody in management told yet another employee to extract the whole database.
  9. That employee found instructions for dumping the DDL and followed them.
This sort of thing happens. Programmers have very good reasons for reusing code, though reusing 1% and dragging the other 99% along seems careless. But it would be good if one could find that one person who understands the system at all, or at least multiple, levels and could provide what we asked for.

No comments:

Post a Comment