The Access database engine has been more or less unchanged since 1997 (enhancements since then can be described on half a sheet of paper). This engine is not very scalable: not more than 5 users can work on a shared Access database at a time without responsiveness becoming a major issue. Also, the database size is limited to 2 Gigabytes, a number which turns out to be surprisingly modest.
However, for all its engine limitations, Access is still the best environment around for creating working prototypes of database applications, because of its great user-interface-building toolset.
The tools that let you “throw the switch” and upsize a database are not as good as they might have been – some of them are quite buggy. Things you have to watch out for
Databases are traditionally concerned with storing data, fetching it and manipulating it for useful purposes; they provide various means of helping to ensure that the data is valid and consistent. High-end databases such as Oracle or Microsoft SQL Server still restrict themselves to this goal. The Access engine is additionally concerned with how the data is presented: thus,
Once you move an Access table to a high-end database, the caption and lookup information is lost, so always keep a backup copy of the original database because you might need it. (Forms previously defined based upon the captions/lookup will, however, continue to work without any problems, as when the form is created, a copy of the table’s original definition is used. For new columns defined directly in the high-end database – this often happens because a new requirement may arise long after the application is already in production use – you will have to do manually what Access would normally do automatically, since linked tables cannot have captions/lookup properties. Another good reason for moving to a high-end as late as you can possibly get away with.
Descriptions, captions, etc. are problematic for tables with numerous columns, e.g., 100+. All of this information is stored as part of the table’s “properties”, and there is a ridiculously low limit (32K) on the total amount of property space. If you hit this limit, ANY modification to the table fails and may even cause the table to get corrupted – of course, Access crashes. (One member of our group lost an hour’s worth of work this way) This glitch is unfortunately not documented anywhere in the Microsoft materials.
The Need to Compact: High-end database engines periodically recycle unused space (or freed-up space) in the database automatically. With Access, you have to periodically “compact and repair” manually, or the database size will keep growing continually- deleting all the records in a large table will not shrink the database’s footprint on disk one bit. (Access 2003 offers the option of compacting a database automatically each time you close it, but this is not the default.)
Don’t use the “OLE object” column type. (This can be used to store information such as Word documents, spreadsheets, images or sound files). Use the hyperlink column type instead: these let you do everything that OLE objects do, and more, at the cost of an extra mouse click. With OLE objects columns, the database size balloons, and you will reach the 2 GB limit very quickly indeed. Further, many image formats, such as JPG, are stored in highly compressed format on disk which are expanded in RAM when loaded and displayed, whereas, when stored as OLE objects in the database, they are stored as uncompressed data- essentially a snapshot of the RAM. As a result, the disk requirements can often be 20-50 times as much.
Hyperlinks are also more versatile, because they can refer to your local machine, a network drive, or any site on the Web.
Hyperlinks are identified by a protocol prefix, followed by a locator. The prefix indicates the type of resource embodied in the hyperlink. Types of prefixes are:
http:// - a regular Web page
https:// - “secure” http- a Web page that is immune to electronic eavesdropping, because its contents are scrambled on the way to and from your machine. Pages that display/accept credit card information, or confidential patient data, use https.
Https uses a pair of 128-bit prime numbers (a “key-pair”) to encrypt/decrypt transmission. Cracking the keys with today’s hardware would take more time than has elapsed since the Big Bang. (The key-pair technology utilizes the Rivest, Shamir and Adelman (RSA) algorithm, which relies on the fact that there is currently no known computational shortcut to factoring a large number that is known to be the product of two primes- the problem of prime-factoring gets exponentially harder with linear increase in key size.)
mailto: (notice there is no double slash at the end). E.g., mailto:john.smith@aol.com. This activates your default mail sending/reading program.
ftp:// - takes you to an ftp site. (There is also a secure ftp, ftps://)
ftp, a protocol that is much older than http, lets you download (and, if the site is set up appropriately) upload files to and from an ftp site. When you employ software such as Internet Information Services (IIS) on your machine, you can set up a default ftp site - essentially a folder with optional sub-folders; all of these contain one or more files.
You may or may not choose to allow anonymous access to the site - if you don't, individual users will need to have accounts on that machine, and log in with a user-name/password. It is generally considered desirable to place a text file named README that briefly describes the purpose and contents of the folders and the files in each - essentially a table of contents.
If you want to support only downloads from your machine, ftp is a no-frills alternative to providing a Web page with links to individual files that can be downloaded, and it is a matter of personal preference whether you choose to use ftp at all. Ftp is more useful when you want to permit selected users to upload files: here, it is advisable to designate a specific folder for this purpose. (This decision also involves extra maintenance, such as the necessity to screen newly uploaded files for viruses.)
news: Takes you to an Internet newsgroup.
Indexes achieve fast search by using a structure called a B-tree (B=Bayer, the computer scientist who devised them). Search is essentially logarithmic in performance, so a table 1000 times as large as another in terms of number of records takes only log2 (1000) or 10 times as long to find a particular value. Indexing technology is what makes databases so much more scalable than spreadsheets.
A Primary key on a table is implemented (in any database engine, not just Access) through a unique key on the primary key column/s.
You consider any column for indexing if you anticipate commonly searching on it or sorting by it. (Examples: Last name of a patient, zip code in a mailing list).
You don’t indiscriminately index every column in a table because, apart from indexes taking up extra space (disk space is cheap these days), the real problem is that with every new record added to a table, or whenever the contents of a column change, the indexes that store these values need to be rebuilt. While B-trees are rebuilt quite quickly, the delays can be significant if you have, say, 20-plus indexes on a table. The circumstances where you need this many indexes on a single table are so rare that, in general, don't hesitate to create an index if you need to search or sort by it even occasionally- the failure to index columns in a table is the commonest cause of poor performance with databases (Access or high-end systems).
Compound Indexes (indexes on more than one column) can be used to ensure that a combination of values in multiple columns is unique – example, the combination of first, last and middle names. You do this by clicking on the “lightning” icon in design mode, specifying an index name, and then, vertically, pulling down, in separate tows, each column that you want as part of the index. You also specify here whether the index is to be unique or not, and also whether any of the columns can be empty (null) or not.
Try to avoid compound indexes if you are not intending to verify uniqueness; use an artificial single-column primary key instead (see autonumbers) and reference it elsewhere. For example, an index on a combination of first name, last name and middle name will take about 20-40 bytes per record, as opposed to an index on SSN, which will take only 9 bytes. The less the space taken up by the value/s to be indexed, the more compact the index becomes: highly compact indexes can be loaded completely by the database engine into RAM/cache to improve lookup and sorting performance. The bulkier the index, the less it is able to benefit from caching.
Warning: Access is not very helpful when the user accidentally enters data that violates EITHER a primary key constraint or another unique constraint. The error message provided is the same, which may occasionally require careful detective work to find out what exactly has been entered wrongly. Adequate documentation or user training (e.g., warning users that the combination of first, middle and last names is supposed to be unique) helps a lot here.