The SELECT statement is the most heavily used as well as the most complex in SQL, mainly because of all the options that you have at your disposal. (It's complex enough that I'll only try to emphasize the wrinkles. You may need to peruse the scattered help in Access 2003: the starting point is typing in the keywords "SELECT statement" into the Help system, and selecting the topic with the indicator "Help > Data Manipulation Language". Unfortunately, this choice is not on the top of the returned list.)
SELECT
[DISTINCT | top N]
Expression-list
FROM
Table-list
WHERE
Condition-list (relational-operators, logical operators,
Functions (LIKE, IN)
GROUP BY…. HAVING
ORDER BY expression-list [DESC]
Essentially, you fetch something from a list of tables, optionally limiting what you choose to fetch based on conditions; you can aggregate the results (filtering based on the aggregate), and sort (order) the results appropriately.
Removes duplicate rows from the results. This is especially useful when you are returning only a single column that may have duplicate values in the source data, and you wish to see only unique values.
This lets you see, for example, the top 10 values. It makes sense only if you are sorting the results; in such a case, after sorting, anything other than the top 10 would not be displayed.
This is typically a comma-separated list of one or more columns, indicating a compound sort with the leftmost column in the list being sorted first. Putting the word DESC after a particular column name indicates a descending sort for it; the default for a column is ascending sort.
Important: In a relational database, records will rarely be stored in the order in which you entered (or imported) them. This is because, as records are deleted over time, "holes" in the database are created and recycled; newly added records will be placed in those holes rather than being added at the end of a table. Therefore you cannot rely on the entry or import order. Output must almost always be sorted if it is voluminous enough, to facilitate locating a particular value.
This is typically a comma-separated list of columns (optionally followed by aliases): if you want all columns from a particular table, you use the expression tablename.* (the asterisk is short for "everything"). However, the list can also contain any combination of columns and:
You can use the operators +, -, * (multiply), / (divide), ^ (exponentiation), & (string concatenation) to combine columns and/or constants. Operators also vary by vendor; Oracle uses ** for exponentiation and || for concatenation.
Different vendors are also varyingly fussy : for example, in Access and Oracle, you can concatenate a string and a number (the latter will automatically be converted to a string), but SQL Server will complain unless you first convert the number to a string using a conversion function (see later).
The list of functions varies so greatly by vendor that you need to refer to the documentation. Access is quite powerful in that you can use any function defined in the Visual Basic for Applications language, in addition to calling any VBA function that you have written. The function set in Access is actually richer than that in high-end systems, and the list is so large that function are grouped by category.
In the previous section, we've stated that a SELECT query can be used wherever a table can be used. This "divide and conquer" strategy is useful when you're composing highly complex queries. In this subsection, we won't distinguish between tables and queries.
When you use more than one table in the table-list, you are combining ("joining") them. The word "join" refers to the linking of tables, as in the relationships diagram.
In general, if you have N tables in a query, you have at least N-1 links.
In the discussion below, we treat queries
The traditional link is called the INNER JOIN.
Consider an inner join between two tables. Here, a value in one table's column is looked up in the linked column of the other table. (Generally, you link a foreign key in one table to the primary key in another, so that there are no more rows in the result than there are foreign key values.) If a particular primary key value does not exist in the foreign-key table (or vice versa), then values from that row will not appear in the result. An inner join is symmetrical, that is, for two tables A and B, the data returned by A INNER JOIN B is the same as the result of B INNER JOIN A.
An outer join between two tables A and B resembles an inner join, except that you specify that all values from either A, or B, or both, should appear even if there is no match in the other table. That is, if you want all values from table A then, for unmatched values from A, you will see a row where the values originating from table B are empty (null).
A LEFT JOIN B means "give me everything from the table to the left of the word 'JOIN' (that is, A), and whatever matches from B".
A RIGHT JOIN B means "from the table to the left of the word 'JOIN' (B)".
It follows that
A LEFT JOIN B
will return the same results as
B RIGHT JOIN A
Also, an outer join is asymmetrical; that is, the result of A LEFT JOIN B is not the same as the result of B left join A. Outer joins
Outer joins are particularly useful for finding missing values.Let's say that we have two tables, Patients and Visits, linked by a Patient ID column in each table. To find patients without visits, we would use a query like:
SELECT P.Patient_Name from Patients as P LEFT JOIN Visits as V ON P.Patient_ID=V.Patient_ID and V.Visit_Date is null
That is, when we do a left outer join, all columns from the Visits table will be null if the patient has not had any visits, and we test any of these columns (e.g., visit date) against the null value.
Finding missing values can also be important during data cleansing. You may have inherited data maintained in a non-relational database, and when you are merging them, you may find a common type of error: a foreign key table has values that are not present in the primary key table, because referential integrity was not enforced. Access lets you use the Query Wizard to compose a Find Unmatched query: underneath the hood, it composes a query of the form:
select [column-list] from
ForeignKeyTable LEFT JOIN PrimaryKeyTable
ON ForeignKeyTable.ForeignKey= PrimaryKeyTable.PrimaryKey
WHERE PrimaryKeyTable.PrimaryKey is null
If you have two or more tables in the query that are NOT linked to other tables, you have a CARTESIAN Join. If there are N1 rows in one table and N2 rows in another table, a Cartesian join will return N1 X N2 rows. Similarly, a Cartesian join of three tables returns N1 X N2 X N3 rows.
95 times out of 100, a Cartesian join is a mistake that you made. However, used judiciously, it can be used to solve certain problems, such as generating all possible pairs or triplets without writing program code.
More usefully, you can do a Cartesian join of a table with itself. For example, suppose a table T contains a column C, and has 4 rows, with the values, A, B, C and D. The query
SELECT T1.C as C1, T2.C as C2 from T as T1, T as T2
(16 rows)
generates 16 possible combinations, AA, AB, AC, AD, BA etc. If, however, you want to limit the output so that the same value doesn't occur twice, the modified query below will do this:
SELECT T1.C as C1, T2.C as C2 from T as T1, T as T2 where T1.C <> T2.C
(12 rows)
If you want to generate unique combinations of values (so that, for example, you don't want both AB as well as BA to appear in the output), the following query will ensure that the results will appear in order:
SELECT T1.C as C1, T2.C as C2 from T as T1, T as T2 where T1.C < T2.C
(6 rows)
This will output the row AB (A is alphabetically less than B) but not BA.
The second and third approaches are useful for drug databases. Given a list of drugs that a patient is taking, you can generate all possible drug pairs, and join this result to an interactions table to determine whether any of these drugs interact with any other in the list.
You can use a SELECT query inside another select query. That is, you could use something like the following:
SELECT P.First_Name from (SELECT First_Name, Last_Name from Patients) as P
The above example is pretty terrible, since it does not do anything that the simpler query
SELECT First_Name from Patients
does not. Sometimes, however, the ability to nest one query inside another is useful when you have an Access database with hundreds of queries, and you discover that certain queries are used only once, inside other queries. In such a case, you can copy the SQL of a once-used query and paste it inside the other query.
For example, suppose you have a query, Query2, which uses another query, Query1 (such as one of the Cartesian-join examples above).
Instead of specifying
The Where clause limits the output based on criteria that you specify. For example, suppose you want a list of all female Patients who live in Connecticut and whose date of birth is greater than 1/1/1960. The following query would do it:
SELECT * from Patients where Sex='F' and State='CT'
and DateOfBirth < #1/1/1960#;
The operators = and < in the above example are relational (comparison) operators. Other relational operators in SQL are >, <> (not equals), >=, <=.
For string comparisons , you can also use wildcards. In Access, the wildcard operators are * (which matches any number of characters) and [], which matches a set or range of characters. Thus, to look for all patients whose last name begins either with SMITH or SMYTH, we would use
SELECT * from patients where LastName like 'SM[IY]TH*"
while if we wanted all patients whose last names began with the letters A through F, we would use
SELECT * from patients where LastName like '[A-F]*"
The word "LIKE" introduces the wildcard operator
The Boolean (Logical) operators are NOT, AND, and OR. If there are two
conditions,
1. Sex='F'
2. State='CT'
the condition
Sex='F' AND State='CT'
retrieves patients who are both female and live in CT, while the condition
Sex='F' OR State='CT'
retrieves patients who are either female or live in Connecticut.
NOT is typically used to reverse the truth value of a compound condition. Thus
NOT (Sex='F' AND State='CT' )
would retrieve all patients except females in Connecticut.
You rarely need to use NOT by itself, because the <> operator is available.
Thus, to retrieve patients from states other than Connecticut,
NOT (State = 'CT')
is the same as
State <> 'CT' .
Access is very useful for building applications that someone else can use, with a relatively modest learning curve. Those other persons may not be computer-savvy, and you definitely don't want to spend a lot of time teaching them, nor putting them in a position where they can damage your application.
One potential form of accidental damage is where someone alters your query so it doesn't run correctly any more. So rather than expect users to modify your query correctly to adapt it to their needs, you provide a means of prompting users for the conditions that may change.
The example we used in the previous subsection assumed that the user was skilled enough to specify the state 'CT' or the sex 'F' or a cut-off birth date. If you want the query to work for any state and birth date, and either sex, you parameterize the query. That is, rather than putting the text 'CT' etc. into the query, you put placeholders that prompt the user for input.
SELECT * from Patients where Sex=[Please enter Sex: M or F]
and State=[Please enter state (2 letters)]
and DateOfBirth < [Please enter cut-off date: mm/dd/yyyy]
When a user runs this query, s/he is prompted for each value in turn.
Notice that each parameter is phrased as a prompt and is enclosed in square brackets.
The way that parameters work is that within a SQL query that is syntactically correct, any column name or expression in a where clause that Access does not understand is echoed back to the user by way of saying "Please help me out here".
Thus, if you typed the SQL
select FirstNam from Patients
when the correct name of the column is FirstName, Access simply echoes 'FirstNam'
in a pop-up window and presents a blank text box.
Therefore, if you phrase your parameter as a prompt or question, it becomes more self-explanatory for the user. The square brackets allow you to embed spaces in the prompt, making it more user-friendly.
However, if the user are prompted for a parameter when you didn't expect her/him to be, check your SQL for a spelling mistake in a column name. This circumstance can occur when you rename one or more columns for a table that was used in a previously working query.
Here, rather than rebuilding the query from scratch in Design View, it is faster to paste its SQL into a robust text editor like MS-Word, and replace the old names with the new, making sure that you use the "Whole Words Only" option to ensure that partial phrases are not altered accidentally.