Nonstandard Database Architectures

Not all modern databases have a relational, object or object-relational design. There are several reasons why this is so:

Special Methods for Fast Data Retrieval: Hashing

In most cases, the data which is accessed is read-only; that is, it does not need to be (or it cannot be) updated. (If the data is updated at all, such changes are made infrequently, in batch mode.) In such a case, one can use data structures that allow extremely fast retrieval.

One such retrieval method is called hashing. In hashing, the key value that is to be searched for is transformed algorithmically into a number, called the hash value. The data is organized such that the hash value happens to be the record number where the key value is stored. (In practice, more than one key may be hashed to the same hash value, a phenomenon called hash collision. Collisions are resolved by a slight modification: rather than reserve spaces for records, we actually reserve space for slots (buckets), which record a list of keys which hash to a common value. (With a good hashing scheme, the list should not be more than two or three keys long.)

For large data sets, hashing is faster than searching on an index, because the data is retrieved in much fewer disk accesses. (In most cases, only a single disk access is required.)

One common recipe for hashing goes:

Hashing as a means of data organization and retrieval is also available in some relational databases, notably Oracle. Hashing is fully compatible with standard indexing methods; depending on our purpose, we can use either hashed access or indexed access. Note that here can only be one hash key for a given table (because there can only be one physical order of records), but there can be any number of indexes.

Examples of databases that use hashing are the built-in (embedded) databases that accompany most word processors: dictionaries, spelling checkers and thesauri. Databases used for identity verification (e.g., based on magnetic stripe readers that are used to automatically unlock doors) also use hashing, because the response time must be very fast.

Architectures that allow Fast Boolean Comparisons: Bitmapped Architectures

The only well-known example of such a system is QMR, for Quick Medical Reference, an knowledge-based system to assist diagnosis in the domain of internal medicine. (Computer scientists prefer nowadays to use the term "knowledge-based system" instead of the earlier, somewhat more pretentious term "expert system", because such a system is used typically to assist in diagnosis, when employed by reasonably knowledgeable users, rather than pretending to act as a complete replacement for a human expert.) QMR was built by Fred Masarie and Randolph Miller at the University of Pittsburgh in the early '80s.

A large part of QMR's internal database consists of a two-dimensional Boolean matrix. One axis of the matrix is the list of disease conditions about which QMR has knowledge. The other axis is the list of findings in the history, physical examination, or lab values/investigations about which QMR knows. Each cell of the matrix is TRUE if a given finding is associated with a given disease, or FALSE if not. (For laboratory values to be treated as Booleans, they must be expressed relative to normal values rather than as absolute values. Thus, this part of QMR only cares whether the serum sodium is high, low or normal - it is not concerned with the actual values.)

The nice thing about TRUE and FALSE is that they can be represented as the bits 1 and 0 respectively. Since there are 8 bits in a byte, it means that 8 Boolean facts can be stored in the space required to store a single character. (In fact, if the majority of the data consists of zeros, one can use very simple data compression techniques to achieve much higher information densities.) Another feature of Booleans is that they can be combined using logical operators. Logical operations on bits are primitives that are directly supported by computer hardware, and can therefore run very fast.

The only drawback associated with bitmapped databases is the maintenance overhead. Bitmaps are computer-understandable, not human-understandable. Therefore, the knowledge of diseases and findings must be maintained in another form, and converted (by program, of course) into bitmapped form with each new update of the database.

Mere presence of a particular finding in a particular disease is not useful enough for diagnosis. QMR also maintains a knowledge base where the positive findings associated with a given disease are given a weight, from 1-5 depending on how frequently they are seen in that disease. Likewise, there is an inverse matrix that records, again on a 1-5 scale, how likely a particular disease is, given the presence of a finding. (The necessary for two matrixes can be explained with the following example: 100% of obstructed labor occur in women - sex = female - but only a very, very small proportion of women have obstructed labor).

QMR was the next generation of a earlier system called INTERNIST (subsequently renamed CADUCEUS, because the name INTERNIST turned out to have been a previously registered trademark)  designed in the 70s at Pittsburgh by Harry Pople and Jack Myers. Dr Myers served as mentor and alpha-tester for QMR. It should be emphasized that QMR was designed to provide fast responses on ordinary IBM PCs (it was originally written in Turbo Pascal) at a time when CPU power, memory and disk space were puny by today's standards. (Then, a fancy PC ran at 8 MHz, and had 640 KB memory and 10 MB disk space.) Still, QMR was able to provide responses in seconds that minicomputers running CADUCEUS took minutes to provide.

With microcomputers getting ever more powerful, it is doubtful whether the overhead of transforming raw data into bitmapped form each time the knowledge base altered would be justified in today's environment.

Scaled-Down Systems

In some cases, the full functionality of a relational database is not necessary. Only a subset is adequate, and in such a case, when the database must be widely distributed or otherwise made publicly accessible, it may have made sense to use a few of the basic building blocks that are used for relational database engines, and build your own mini-database engine from scratch. (For example, there are several commercial packages that allow creation of B-tree-based indexes and indexed retrieval of data through code written in C or Basic. Such packages allow royalty-free runtime distribution.)

The second argument, however, is becoming increasingly less valid, as the cost of powerful, full-functioned databases on microcomputers (such as MS-Access) has plummeted. However, MS-Access is available only on Windows platforms. If you are using a UNIX machine for your database server, programs like Oracle, Informix and  Sybase can make a pretty big hole in your bank account. In such case,  a scaled-down solution might make sense. However, over the last few years, low-cost engines such as the shareware mSQL have been developed for UNIX. While not offering the full functionality of its pricier brethren, mSQL is pretty adequate for small or medium-sized applications.