Introduction
ACT/DB = Adaptable Clinical Trials Database. Adaptability is quantified by the ability to create a study protocol in a new domain in a few days to a couple of weeks. (In a couple of hours if existing study elements can be fully reused.)
Production use at Yale and Vanderbilt. A standalone version is currently deployed for training at Harvard (Dept of Biostatistics); a client-server version will be deployed at Harvard by March '99 for use by the Cancer Genetics Network.
Credits:
Yale: Prakash Nadkarni, Cindy Brandt, Charles Lu, Perry Miller, all of the Center for Medical Informatics, with additional inputs from Lee Schacter, Yale Cancer Center.
Vanderbilt: John Fisk
ACT/DB is intended to store highly heterogeneous data, with tens of thousands of attributes, across a variety of medical specialties. Should expand to handle different medical domains without the need to continually expand the schema through the addition of more tables.
Hence mostly uses Entity-Attribute-Value representation (which is extensively used in HELP and the CPMC CDR). Only patient demographics data is stored conventionally.
Conventionally structured data is considerably simpler to inspect and analyze (all analytical packages insist on data organized as one column per parameter). Therefore one must create the illusion of conventionally structured data while using EAV representation internally.
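A minimal sketch of how the "illusion" of conventional structure can be produced from EAV rows. The row layout (patient, time-stamp, attribute, value) and the attribute names are assumptions made for illustration, not the actual ACT/DB schema.

```python
from collections import defaultdict

# Hypothetical EAV rows: (patient_id, event_time, attribute, value).
eav_rows = [
    (1, "1998-11-02", "hemoglobin", 13.5),
    (1, "1998-11-02", "wbc_count", 6.1),
    (2, "1998-11-03", "hemoglobin", 11.9),
]

def pivot(rows, attributes):
    """Return one record per (patient, time) with one column per attribute."""
    records = defaultdict(dict)
    for patient_id, event_time, attribute, value in rows:
        records[(patient_id, event_time)][attribute] = value
    # An attribute that was never recorded for an entity becomes None (NULL).
    return [
        {"patient_id": pid, "event_time": t, **{a: vals.get(a) for a in attributes}}
        for (pid, t), vals in sorted(records.items())
    ]

table = pivot(eav_rows, ["hemoglobin", "wbc_count"])
```

Each element of `table` is a conventional one-column-per-parameter row of the kind analytical packages expect.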
Currently, designed to act primarily as a repository of the data, not optimized for decision support. Can support primary editing and entry of data as well as import of data from existing schemas (currently through import of delimited flat files).
Schema consists of dictionary (metadata) tables, data tables and external vocabulary tables.
ACT/DB uses strong typing (data is segregated by datatype, so as to allow indexing by value; actually, a combined index on Attribute ID and value, since a value is meaningless in isolation). Another advantage is that it makes it possible to manage BLOB data in the same way as other data.
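A sketch of datatype-segregated EAV tables with a combined index on attribute ID and value. SQLite is used here only so that the example is self-contained (ACT/DB itself runs on Oracle or MS SQL Server), and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One EAV table per datatype (names are illustrative assumptions).
    CREATE TABLE eav_integer (entity_id INTEGER, attribute_id INTEGER, value INTEGER);
    CREATE TABLE eav_real    (entity_id INTEGER, attribute_id INTEGER, value REAL);
    CREATE TABLE eav_string  (entity_id INTEGER, attribute_id INTEGER, value TEXT);
    -- Combined index: a value is only meaningful together with its attribute.
    CREATE INDEX ix_eav_real ON eav_real (attribute_id, value);
""")
conn.executemany(
    "INSERT INTO eav_real VALUES (?, ?, ?)",
    [(1, 10, 13.5), (2, 10, 9.8), (3, 11, 9.0)],
)

# Find entities whose attribute 10 is below 12.0; attribute 11 is not matched.
low = [
    row[0]
    for row in conn.execute(
        "SELECT entity_id FROM eav_real "
        "WHERE attribute_id = ? AND value < ? ORDER BY entity_id",
        (10, 12.0),
    )
]
```

Because the value column has a single native type per table, range comparisons such as `value < 12.0` behave numerically and can use the index.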
Boolean, Enumerated and Ordinal attributes are special subclasses of integer data, where an integer value is associated with a descriptive phrase that is selected through a pull-down menu (combo box) or list box. Ordinal datatypes differ from enumerated in that the values can be compared for relative magnitude (less than or greater than) rather than simple equality. ("Type of Transfusion" - blood, plasma, RBC, WBC, Platelet etc. is an example of enumerated, while "Pain Severity" - absent, mild, moderate, severe - is an example of ordinal.)
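The enumerated/ordinal distinction can be illustrated with Python's standard `enum` module. This is only an analogy under the assumption that each choice maps to an integer code; the specific codes below are invented.

```python
from enum import Enum, IntEnum

class TransfusionType(Enum):
    """Enumerated: members can be tested for equality only."""
    BLOOD = 1
    PLASMA = 2
    RBC = 3
    WBC = 4
    PLATELET = 5

class PainSeverity(IntEnum):
    """Ordinal: integer codes also carry relative magnitude."""
    ABSENT = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3

def at_least(severity, threshold):
    """Ordinal comparison, e.g. 'pain is moderate or worse'."""
    return severity >= threshold

moderate_or_worse = at_least(PainSeverity.SEVERE, PainSeverity.MODERATE)
same_type = TransfusionType.BLOOD == TransfusionType.PLASMA
```

Comparing two `TransfusionType` members with `<` raises `TypeError`, which mirrors the rule that enumerated values support only equality tests.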
An attribute may also be derived from an external vocabulary. (Each item in an external vocabulary is defined by at least two fields - a code and a name, optionally with a definition/description.) Vocabularies currently in ACT/DB - ICD/9, DSM-IV, COSTART. A generic vocabulary searcher is built in that can be invoked at run-time. (This vocabulary searcher is also used by administrators to search the dictionary tables when designing a new study.)
The actual attribute-value pairs are segregated into groups (based on logical association and co-occurrence in time). Each group has zero, one or two time-stamps. A form (the unit of presentation) may consist of one or more groups.
Back end uses Oracle or MS-SQL server. (Very few vendor-specific features are used; only triggers have to be rewritten.) Front end originally used MS-Access exclusively. Currently, the front end is segregated into two components - an administrator/designer interface (still in Access) and a data entry/editing front end. For the latter, the existing MS-Access client will be superseded within a couple of months by a Web front end. The ACT/DB forms generator uses VBA (MS-Access code) to generate Web forms.
Forms are generated automatically from the dictionary tables. These may be edited and cosmeticized if necessary. (Web forms need less cosmeticization - through the table mechanism, it is possible to achieve much more precise alignment of fields and labels with less programming effort.)
Web forms are ASP (Active Server Pages) files which contain embedded client-side VBScript and/or Javascript. Most of the client-side script code is static. Code to handle Required Fields and Problem Notes is inserted dynamically when the ASP file is invoked. No ActiveX controls or Java applets are used in the generated form.
The form contains all the metadata that is needed to map the fields on the form to attributes within the database. This is done by using a special naming convention for the name of each field.
Certain sets of fields can repeat an arbitrary number of times (e.g., there may be any number of prior surgeries). These repeating groups are presented to the user as tables. They are managed by giving all fields in a particular column the same ID (this is MS Internet Explorer-specific) but different instance suffixes in the name.
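The actual naming convention is not given here, so the sketch below assumes a purely hypothetical one, `g<group>_a<attribute>[_i<instance>]`, to show how a field name alone can carry the mapping to a group, an attribute and a repeating-group instance.

```python
import re

# Hypothetical convention: g<group id>_a<attribute id>, with an optional
# _i<instance> suffix for fields inside a repeating group.
FIELD_NAME = re.compile(
    r"^g(?P<group>\d+)_a(?P<attribute>\d+)(?:_i(?P<instance>\d+))?$"
)

def parse_field_name(name):
    """Recover the metadata encoded in a form field name."""
    match = FIELD_NAME.match(name)
    if match is None:
        raise ValueError(f"not a generated field name: {name!r}")
    instance = match.group("instance")
    return {
        "group_id": int(match.group("group")),
        "attribute_id": int(match.group("attribute")),
        "instance": int(instance) if instance is not None else None,
    }

repeating = parse_field_name("g12_a305_i2")  # third column cell, say
single = parse_field_name("g12_a301")        # non-repeating field
```

With such a scheme, the code that saves a submitted form needs no separate lookup table to decide where each value belongs.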
In addition, the form contains metadata (such as formulae that define the value of a field in terms of other fields, expressions that determine skip logic, data for hierarchical choice sets) that make it unnecessary for data requests to be made to the database server once the form is loaded into the client browser.
Standard (Study-Independent) Reports
Monitoring status of data entry
Tickler system (implemented at Vanderbilt) based on which forms are required to be filled in during particular periods within the study. The system uses the current date and the anticipated schedule for patients within a study to generate a report of what forms are due to be filled.
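A minimal sketch of the tickler logic, assuming the anticipated schedule is expressed as a window of days after enrollment for each form. The form names, the window representation and the function name are illustrative assumptions, not the Vanderbilt implementation.

```python
from datetime import date, timedelta

# Hypothetical schedule: form -> (first day, last day) of its window,
# counted in days after the patient's enrollment date.
schedule = {
    "Baseline Labs": (0, 7),
    "Week 4 Follow-up": (21, 35),
}

def forms_due(enrolled_on, today, schedule, completed):
    """List forms whose window contains today and that are not yet filled."""
    due = []
    for form, (start, end) in schedule.items():
        window_open = enrolled_on + timedelta(days=start)
        window_close = enrolled_on + timedelta(days=end)
        if window_open <= today <= window_close and form not in completed:
            due.append(form)
    return due

due_now = forms_due(date(1999, 1, 1), date(1999, 1, 25), schedule, set())
```

Running this for every patient in a study, with the current date as `today`, yields the report of forms that are due.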
Chronological Report - all data on all parameters (or selected parameters) on a single patient.
Primarily used for export of data to analysis programs. Data is exported as one or more flat files.
Extracts (analogous to "Views" on the data) are simulated: their definitions are stored as metadata rather than as SQL view definitions that form part of the database schema. This allows simplicity and flexibility at a modest cost in performance.
In general, it is not possible to extract all the data generated for a given study into a single flat file. Because groups of parameters are collected repeatedly and at varying frequencies, there will typically be one flat file for each group. (If certain groups always co-occur in a particular study, then these may be combined into a single extract.)
Unique feature is that the code generation is entirely driven by metadata. Therefore, if the internal representation of a particular attribute is changed by the database administrator from EAV to conventional, or vice versa, previously defined queries are unaffected. (We are in the process of using this front end to query a pilot data warehouse at the West Haven, CT, VAMC, which uses a combination of EAV and conventional data.)
Ad hoc query first identifies a subset of patients (based on arbitrary Boolean criteria). This subset may be stored with a specific name (and the data extracts described above may be created for this subset). Alternatively, one may choose to display one or more parameters for such a subset.
Currently query interface is in Access, will begin porting it to the Web shortly. Interface is being overhauled considerably as a result of feedback.
ACT/DB is the foundation of a shared special studies database for use by the Cancer Genetics Network (CGN), an initiative supported by the National Cancer Institute. A version will be housed at MGH and at least two CGN sites (using MS SQL Server).
Mapping definitions can be created for later reuse, if data in the same format is to be repeatedly imported. The import allows a validation phase and an "undo" option if errors are encountered.
Ranges can be specified for fields beyond which values that are entered will not be accepted (absolute ranges). Ranges can also be specified for fields beyond which warnings (alerts) will be generated, but values can be accepted if the user so decides.
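The two kinds of range can be sketched as a single check that returns one of three outcomes. The function name, the tuple representation of a range and the example limits are assumptions for illustration.

```python
def check_range(value, absolute, warning):
    """Classify a value against an absolute range and a warning range.

    Both ranges are (low, high) pairs with inclusive bounds.
    """
    abs_low, abs_high = absolute
    warn_low, warn_high = warning
    if not abs_low <= value <= abs_high:
        return "reject"  # outside the absolute range: never accepted
    if not warn_low <= value <= warn_high:
        return "warn"    # alert the user, who may still accept the value
    return "accept"

# Hypothetical limits for a systolic blood pressure field (mm Hg).
result = check_range(250, absolute=(0, 300), warning=(60, 200))
```

Here `result` is `"warn"`: the value is unusual enough to raise an alert but is still within the absolute limits.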
The idea is to maximize the reuse of forms across studies and minimize the creation of study-specific forms. For a given study, some fields in a standard form (e.g., clinical chemistry) may be mandatory, while others may not be. These are defined in the metadata, and are dynamically indicated by a yellow field background at runtime.
Conditional enabling/disabling of fields in a form based on the contents of other fields. (Values can be tested with equality/relational/range operators).
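A sketch of how such skip logic could be evaluated from a rule held as metadata. The rule format (controlling field, operator, operand) and the field names are hypothetical.

```python
import operator

# Equality and relational operators; "between" (range) is handled separately.
OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def is_enabled(rule, form_values):
    """Decide whether a dependent field is enabled.

    rule is (controlling_field, op, operand); for "between" the operand
    is an inclusive (low, high) pair.
    """
    field, op, operand = rule
    value = form_values.get(field)
    if value is None:
        return False  # controlling field still empty: keep dependent disabled
    if op == "between":
        low, high = operand
        return low <= value <= high
    return OPERATORS[op](value, operand)

# "Packs per day" is enabled only when "smoker" is True.
packs_enabled = is_enabled(("smoker", "==", True), {"smoker": True})
```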
A field in a form is dynamically populated with a pull-down menu whose choices change depending on the value selected in another ("parent") choice set. For example, the parent choice set may have "category of tumor", while the child choice set will dynamically contain a list specific to the tumor category.
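A hierarchical choice set can be held as a simple mapping from each parent choice to its child list. The tumor names below are sample data chosen for illustration, not the contents of an actual ACT/DB choice set.

```python
# Hypothetical hierarchical choice set: parent choice -> child choices.
child_choices = {
    "Carcinoma": ["Adenocarcinoma", "Squamous cell carcinoma"],
    "Sarcoma": ["Osteosarcoma", "Liposarcoma"],
}

def choices_for(parent_value):
    """Return the child menu for the selected parent (empty if none selected)."""
    return child_choices.get(parent_value, [])

menu = choices_for("Sarcoma")
```

If the whole mapping is shipped with the form, repopulating the child menu when the parent changes requires no further request to the server.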
Records changes, deletions. Each change is stored in a table that records old values, new values, date/time of change, and user who performed change. This turns out to be very easy to implement through Web forms, by monitoring changes to the contents of fields.
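A minimal sketch of deriving audit rows by comparing field contents as loaded with field contents as submitted. The row layout follows the description above; the function name and the use of `None` for a deleted value are assumptions.

```python
from datetime import datetime

def audit_changes(loaded, submitted, user, now=None):
    """Return one audit row per field whose value changed or was deleted."""
    now = now or datetime.now()
    rows = []
    for field in sorted(set(loaded) | set(submitted)):
        old, new = loaded.get(field), submitted.get(field)
        if old != new:
            rows.append({
                "field": field,
                "old_value": old,
                "new_value": new,  # None means the value was deleted
                "changed_at": now,
                "changed_by": user,
            })
    return rows

stamp = datetime(1999, 1, 15, 9, 30)
trail = audit_changes(
    {"weight_kg": 70, "height_cm": 175, "notes": "n/a"},
    {"weight_kg": 72, "height_cm": 175},
    user="jdoe",
    now=stamp,
)
```

Unchanged fields (`height_cm` here) produce no rows, so the audit table grows only with real edits.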
The analog of "Post-It" notes, allowing two-way communication between data entry person and study supervisor/administrator by tagging individual fields in a form with one or more notes if there are questions. These are visually indicated by red text (if a problem is unresolved) or cyan text (if a problem has been resolved but the notes have not been removed).
User-specific Security through Metadata
Only studies that a user is entitled to see are displayed after login.
For individual studies, the data is either editable or read-only based on the user's privileges for that study.
Based on the Study specification, the names of subjects may or may not be displayed on screens. For data requiring total confidentiality, subjects are identified through a Study ID, with race/sex/approximate Date of Birth to partially assist validation; the Study ID incorporates check digits to minimize the possibility of entering an incorrect number.
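The check-digit scheme used for Study IDs is not specified here. As an illustration only, the sketch below uses the well-known Luhn algorithm, which appends a single check digit and catches any single-digit error and most adjacent transpositions; the function names are hypothetical.

```python
def luhn_check_digit(payload):
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk from the rightmost payload digit, doubling every other digit.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10

def make_study_id(payload):
    """Append the check digit to a numeric payload."""
    return payload + str(luhn_check_digit(payload))

def is_valid_study_id(study_id):
    """Verify that the last digit matches the check digit of the rest."""
    return (
        study_id.isdigit()
        and len(study_id) > 1
        and luhn_check_digit(study_id[:-1]) == int(study_id[-1])
    )

study_id = make_study_id("7992739871")
```

A mistyped ID then fails `is_valid_study_id` at entry time, before any lookup against the database is attempted.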
The previous metaphor was to design from bottom up - define the choice sets, then the questions, then groups and finally forms. We make extensive use of the TreeView ActiveX control.
Linkage of Attributes to External Vocabularies (e.g., UMLS)
This is an important goal. However, such linkage is not possible for the majority of attributes because these tend to be idiosyncratic and lack exact counterparts within the UMLS. For example, there are numerous questions in an "Activity Questionnaire" related to the ability to perform various kinds of activity/work indoors and outdoors. Ditto for a large number of psychiatry questionnaires.
"Version control". The idea is to maximize reuse of existing questions, with minor modifications if possible, rather than creation of new questions having almost identical semantics. Therefore, for example, casual modifications of the definition of a question or choice set are prevented if such questions are already in use in a study that is not under the control of the current study designer. (Such modifications must be performed by the system administrator.) sharing library items.
Required in the context of the CGN. We will shortly be supporting multiple installations of ACT/DB - at Johns Hopkins University and UC Irvine. Therefore it is necessary to handle merging of data from the peripheral sites into the central database. This will be handled through replication technology (we will be using MS SQL Server's replication mechanisms). Each item (both in the data as well as the metadata) will be tagged with the ID of the center where it was generated. A major challenge will be to control the metadata growth very tightly so that duplicate definitions of attributes, for example, do not proliferate.