Databases and the World Wide Web
Tim Berners-Lee created the infrastructure for the Web at CERN (the European Organization for Nuclear Research) in 1989. Berners-Lee was essentially looking for a clean and simple way to publish documents over the Internet using a hypertext metaphor (where clicking on a link in one document opens another document, possibly stored at a different site). He devised the basic Hypertext Transfer Protocol (HTTP), as well as the language for marking up the documents (Hypertext Markup Language, or HTML). HTML was based on a much more complex language termed SGML (Standard Generalized Markup Language). The World Wide Web was the term coined to describe the collection of HTML documents on machines everywhere.
Berners-Lee's original work was strictly text-oriented, and HTML was designed to work even with unsophisticated terminals. Later research at the National Center for Supercomputing Applications (at the University of Illinois, Urbana-Champaign) led to the creation of the Mosaic browser, which could handle graphics as well. It was Mosaic that was really responsible for the WWW taking off. Mosaic's lead developer and primary evangelist, Marc Andreessen, now well known as the co-founder of Netscape, was then an undergraduate student at UIUC. While Mosaic was freeware, it was handed over to a company called Spyglass for commercialization, over Andreessen's objections. Andreessen and his team were soon lured away from UIUC to form Netscape. (Microsoft, a year later, licensed the Spyglass product, eventually enhancing it and releasing it as Internet Explorer.)
The impact of the WWW on society has been regarded as even greater than that of the computer itself when the latter first appeared. It has made computer access mainstream, changed lifestyles (there are now on-line addicts), changed the way people look for information (the Web is now the first resource used in a search), and, of course, spawned several industries. HTML itself has gone through several revisions (the current version is 4.0) to allow more complex documents to be rendered in an economical way.
The browser that you use is called a Web client. When you connect to a particular site by typing in a URL (Uniform Resource Locator), your Web client is really making a request to the Web server at that site for a document ("page") whose name is specified by the URL. (When you connect to a site without specifying a particular document, a "default" Web page is displayed.) Using hyperlinks on the page (hyperlinks are URLs embedded within a page), you can navigate to other pages on that site, or to pages on other sites.
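To make the request concrete, here is a minimal sketch in Python of the exchange a browser carries out when a URL is typed in; the host name www.example.com is only a stand-in for a real site.

```python
# What happens behind a typed-in URL: open a connection to the Web server
# and ask it for a document by name.
import http.client

conn = http.client.HTTPConnection("www.example.com", 80)

# Asking for "/" names no particular document, so the server responds
# with its "default" page, as described above.
conn.request("GET", "/")
response = conn.getresponse()

print(response.status, response.reason)   # e.g. "200 OK"
page = response.read()                    # the HTML the browser would render
print(page[:200])                         # first few bytes of the markup
conn.close()
```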
The beauty of the Web, from the point of view of both the Web site designer and the user, is that a tremendous amount of work is being done behind the scenes; all that needs to be done is to "author" the page correctly. While Web page design was once regarded as a task complex enough to require programming expertise, tools such as Microsoft FrontPage are now intuitive enough to allow ten-year-olds to design Web pages.
The page that is sent to a browser does not need to exist on the server before the browser's request is made. The major difference between the Web documents of today and those of six years ago is that today the vast majority of pages are dynamic, i.e., created on demand by programs called by the Web server. Some pages contain portions that are static (such as the company's logo) and others that are dynamic (such as "hit counters" or price information on a particular item). In many cases, the programs that generate the page must tap into a database where the information displayed on the page is really stored. Web database programming is a much more complicated task than creating static pages (well beyond the capacity of most ten-year-olds).
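As a small illustration of a dynamically created page, the sketch below (in Python, using the standard http.server module rather than any particular commercial Web server) builds a "hit counter" page from scratch on every request; nothing is read from a stored document.

```python
# A toy "hit counter": the page is generated on demand, not fetched from disk.
from http.server import BaseHTTPRequestHandler, HTTPServer

hits = 0  # kept in the server process, not in any static document

class CounterPage(BaseHTTPRequestHandler):
    def do_GET(self):
        global hits
        hits += 1
        body = f"<html><body><h1>You are visitor number {hits}</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), CounterPage).serve_forever()
```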
The architecture of a Web database server is called three-tier. The three tiers are browser, Web server, and database server. (The Web server, and the programs which support it, mediate between the browser and the database server. Sometimes, the same machine may house both the database and the Web server, but this is not always so.) There is sometimes a fourth tier, a transaction/application server, which typically lies between the Web server and the database. By contrast, the conventional client-server architecture, which preceded the Web by at least a decade, is called two-tier.
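A hedged sketch of the middle tier follows: the program behind the Web server (tier two) answers the browser (tier one) by querying a database (tier three). SQLite is used here purely as a stand-in for a separate database server, and the table and column names are invented for the example.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

def setup_demo_db():
    # Populate the stand-in database (tier three) with a couple of rows.
    db = sqlite3.connect("catalog.db")
    db.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, price REAL)")
    db.execute("DELETE FROM items")
    db.executemany("INSERT INTO items VALUES (?, ?)",
                   [("Database Primer", 29.95), ("Web Atlas", 19.50)])
    db.commit()
    db.close()

class CatalogPage(BaseHTTPRequestHandler):
    def do_GET(self):
        # Tier two: query the database and build the page on demand.
        db = sqlite3.connect("catalog.db")
        rows = db.execute("SELECT name, price FROM items").fetchall()
        db.close()
        items = "".join(f"<li>{name}: ${price:.2f}</li>" for name, price in rows)
        body = f"<html><body><ul>{items}</ul></body></html>"
        # Tier one (the browser) receives ordinary HTML and never sees the database.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

if __name__ == "__main__":
    setup_demo_db()
    HTTPServer(("localhost", 8000), CatalogPage).serve_forever()
```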
One of the problems with Web database development is that, compared with the tools for traditional client-server development (which are quite mature), the tools for Web database applications are relatively raw and immature. Certainly, the language environments for development (e.g., JavaScript, VBScript) leave a lot to be desired, both for writing code and for testing and debugging it. Very obvious errors, such as misspelled variable names, which in most other languages would be caught at edit time or compile time, are often detected only at run time. This means that the time required for testing and debugging is enormously lengthened.
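The misspelled-variable problem is easy to demonstrate; the fragment below uses Python (only as a stand-in for the scripting languages named above, which behave the same way): the typo is perfectly legal to write and is reported only when the faulty line actually executes.

```python
def order_total(prices):
    total = 0
    for p in prices:
        total += p
    return totl          # typo for "total": no complaint at edit or compile time

if __name__ == "__main__":
    try:
        order_total([10, 20])
    except NameError as err:
        # The error surfaces only at run time, when the faulty line is reached.
        print("Detected at run time:", err)
```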
Statelessness And How to Work Around It
The HTTP protocol is essentially stateless. This means that when a browser makes a request of the server for a page, the server sends the page and promptly forgets that the client ever existed. This simplicity is both a boon and a curse. It causes problems when a client is interacting with a particular Web site across multiple pages, and some kind of history ("state") must be preserved, so that the server can remember what this user did several pages ago. For example, if you are buying books or music CDs on-line, you will typically search by title, author, subject, etc., and add books to a "shopping cart". In this process, you may navigate dozens of Web forms, and obviously there should be a means of keeping track of the items you have selected. (A Web form is a Web page where the user enters information either through the keyboard or with the mouse.)
There are several ways to maintain state; the most common have the server hand the browser a small token (for example, a cookie, a hidden form field, or an identifier embedded in the URL) that the browser returns with every subsequent request, so the server can tie the requests together. One such approach is sketched below.
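The following minimal sketch (again using Python's standard http.server, with invented names such as the "session" cookie and the toy carts dictionary) shows cookie-based state keeping: on the first visit the server issues a session identifier, and the browser sends it back with every later request, which lets the server associate all of those requests with one shopping cart.

```python
import uuid
from http import cookies
from http.server import BaseHTTPRequestHandler, HTTPServer

carts = {}  # session id -> list of selected items, kept on the server

class CartPage(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read back any cookie the browser sent with this request.
        jar = cookies.SimpleCookie(self.headers.get("Cookie", ""))
        if "session" in jar:
            session = jar["session"].value      # returning visitor
        else:
            session = uuid.uuid4().hex          # first visit: mint a new id
        cart = carts.setdefault(session, [])
        cart.append("another item")             # pretend the user selected something
        body = f"<html><body>Items in your cart: {len(cart)}</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # Ask the browser to return the identifier on every later request.
        self.send_header("Set-Cookie", f"session={session}")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), CartPage).serve_forever()
```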
Obviously, when the Web is being used for electronic commerce, both seller and buyer want some insurance against being swindled. There are three possible scenarios: a dishonest buyer (for example, one presenting a stolen credit card number), a dishonest or impersonated seller, and a "snooper" who eavesdrops on the communication between the two.
The first scenario is dealt with by using a third party for authentication (such as Visa) during the transaction; the third party verifies that the name and card number correspond, reports stolen cards, and so on.
The second scenario is partly handled through third-party authentication using a certificate authority such as Verisign. Verisign makes no guarantee that the seller is ethical (the buyer can contact her credit card company if there are problems), but it does prevent spoofing by ensuring that the site the buyer has logged on to is who it claims to be.
The snooper scenario is handled by encrypting all critical communication between seller and buyer. (This is like turning a "scrambler" on.) Practically all Web servers, as well as the two major Web browsers, have encryption built in. By default, Web browsers perform encryption using a 40-bit key (the more bits in the key, the harder it is to crack), but, if you are within the US or Canada, you can download an add-in which will perform 128-bit encryption. (This add-in is considered military-grade technology, so users outside the US and Canada cannot download it.)
When Web applications are used to provide user interfaces to patient data, encryption becomes critical. The protocol that is used for encrypted communication is called HTTPS, or secure HTTP. (This was originally a Netscape specification, but it was made publicly available, so all browsers use it now.) From the developer's point of view, enabling encryption is as simple as changing the URL prefix from http:// to https:// (provided the Web server has been set up with a certificate). When such a prefix is encountered, encryption automatically kicks in.
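On the client side the difference really is just the prefix, as the small Python sketch below illustrates (example.com is only a placeholder; the second request succeeds only because that server holds a valid certificate).

```python
import urllib.request

# Same request, two prefixes: the first travels in the clear, the second
# is encrypted end to end once the https:// prefix is seen.
plain  = urllib.request.urlopen("http://example.com/")
secure = urllib.request.urlopen("https://example.com/")
print(plain.getcode(), secure.getcode())   # both print 200 if the site answers
```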
The late nineties were the time of the "browser wars", when Microsoft and Netscape added proprietary features to their own versions of HTML. This greatly increased the workload for Web authors who wanted to make their pages run on all browsers. In response, the World Wide Web Consortium (www.w3.org), the body that determines Web-related standards, created version 4.0 of HTML, which incorporated the best features of each flavor. However, they realized that standards bodies could not keep up with the pace of Web-related software advances, and so they decreed that there would be no more changes to the HTML standard after version 4.0. Instead, changes and enhancements in functionality would be added through XML (Extensible Markup Language).