Adjunct Faculty, Yale Center for Medical Informatics
Email: prakash (dot) nadkarni at yale.edu
My hobby (which also happens to be my job) is working with biomedical
databases of all kinds.
I have an abiding interest in Metadata-driven Software Systems,
which exemplify creative laziness: that is, doing things right the first time so
that you don't have to tweak your software over and over again.
Entity-Attribute-Value (EAV) databases are a family of databases that rely critically on metadata. They
are used in domains where the number of potential descriptors
(attributes) describing an object is a couple of orders of magnitude greater
than the actual number of descriptors for a given object. For example, when
dealing with patient data across all clinical specialties, the number of
history elements, symptoms, clinical examination findings, lab tests and so on
runs into several tens of thousands, and this number is constantly growing.
Yet, for a given patient, no more than a few dozen types of positive or
significant negative findings are actually relevant. That is, the data is
highly sparse, and a set of conventional relational tables, with one finding
per column, would result in much wasted space, because most columns would be
null. In the EAV approach, one stores only non-null findings in a table
containing three types of information: the Entity (the patient and the date/time
the finding was recorded), the Attribute (i.e., the name of the finding) and
the Value of the finding.
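To make the three-part layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and the three sample findings are hypothetical, chosen only for illustration; this is not the TrialDB schema. The separate attributes table stands in for the metadata an EAV system relies on (each finding's name and data type). Only recorded findings are stored, and retrieving one patient's findings means joining back to that metadata and pivoting rows into name/value pairs.

```python
import sqlite3

# Hypothetical schema for illustration only: one row per non-null finding.
# Entity = (patient_id, recorded_at); attribute_id points into a metadata
# table describing each finding; value is stored as text in this sketch.
SCHEMA = """
CREATE TABLE attributes (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,
    datatype     TEXT NOT NULL            -- metadata used to interpret value
);
CREATE TABLE eav (
    patient_id   INTEGER NOT NULL,
    recorded_at  TEXT    NOT NULL,
    attribute_id INTEGER NOT NULL REFERENCES attributes(attribute_id),
    value        TEXT    NOT NULL,
    PRIMARY KEY (patient_id, recorded_at, attribute_id)
);
"""


def build_demo_db():
    """Create an in-memory EAV database with a few made-up findings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO attributes (attribute_id, name, datatype) VALUES (?, ?, ?)",
        [(1, "serum_potassium", "real"),
         (2, "chest_pain", "boolean"),
         (3, "systolic_bp", "integer")],
    )
    # Only findings that were actually recorded are stored; no null columns.
    conn.executemany(
        "INSERT INTO eav VALUES (?, ?, ?, ?)",
        [(101, "2011-03-01T09:00", 1, "4.1"),
         (101, "2011-03-01T09:00", 3, "128"),
         (102, "2011-03-02T14:30", 2, "false")],
    )
    return conn


def findings_for(conn, patient_id):
    """Pivot one patient's EAV rows into a {attribute name: value} dict."""
    rows = conn.execute(
        """SELECT a.name, e.value
           FROM eav e JOIN attributes a ON a.attribute_id = e.attribute_id
           WHERE e.patient_id = ?
           ORDER BY a.name""",
        (patient_id,),
    )
    return dict(rows.fetchall())


conn = build_demo_db()
print(findings_for(conn, 101))  # {'serum_potassium': '4.1', 'systolic_bp': '128'}
```

Because values are kept as text here, a real system would consult the datatype metadata before comparing or computing on them; storing values in separate per-type columns or tables is another common choice.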
(Shameless Plug: My 2011 book on metadata-driven systems in Biomedicine, published by Springer,
is available on Amazon, with excerpts also available on Google Books.)
TrialDB, an EAV database for the management of clinical studies data that is
copyrighted by myself and my colleagues Cindy Brandt and Luis Marenco (though it is open-source
freeware), is described on the TrialDB Home Page. This page has
an FAQ and links to the ftp site, the online documentation (also ftp-able) and the
demo site where you can try it out.
I also dabble in information retrieval (a fancy phrase for text processing).
This is an offshoot of my database interests: a large component of biomedical
databases consists of narrative text (which captures nuances that coded data
cannot): examples are discharge summaries and operative notes. I'm looking at
ways to optimize the searching process by indexing the content based on
recognition of concepts in controlled biomedical vocabularies (I mainly play
with the National Library of Medicine's Unified Medical Language System), and
at ways of integrating text search with conventional database search. I've begun
to explore Natural Language Processing and (as an offshoot of NLP)
machine-learning techniques.
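The idea of concept-based indexing can be illustrated with a toy inverted index in Python. The vocabulary, phrases and concept IDs below are invented for illustration (they are not real UMLS identifiers), and the substring matching is deliberately naive; a real system would draw on a controlled vocabulary such as the UMLS and use proper tokenization and matching. The point is that documents using different wording for the same concept end up under a single index entry.

```python
from collections import defaultdict

# Toy vocabulary: surface phrases mapped to HYPOTHETICAL concept IDs.
# These IDs are made up for this sketch and are not real UMLS identifiers.
VOCABULARY = {
    "myocardial infarction": "CONCEPT_MI",
    "heart attack": "CONCEPT_MI",
    "shortness of breath": "CONCEPT_DYSPNEA",
    "dyspnea": "CONCEPT_DYSPNEA",
}


def recognize_concepts(text):
    """Return the set of concept IDs whose phrases occur in the text.

    Naive case-insensitive substring matching: no tokenization,
    no word boundaries and no negation handling.
    """
    lowered = text.lower()
    return {cid for phrase, cid in VOCABULARY.items() if phrase in lowered}


def build_concept_index(documents):
    """Map each concept ID to the set of document IDs that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for cid in recognize_concepts(text):
            index[cid].add(doc_id)
    return index


docs = {
    "note1": "Admitted with acute myocardial infarction.",
    "note2": "History of heart attack in 2005; denies dyspnea.",
    "note3": "Complains of shortness of breath on exertion.",
}
index = build_concept_index(docs)
print(sorted(index["CONCEPT_MI"]))       # ['note1', 'note2']
print(sorted(index["CONCEPT_DYSPNEA"]))  # ['note2', 'note3']
```

Note that note2 is indexed under the dyspnea concept even though the finding is negated ("denies dyspnea"); this is the kind of error that negation detection is meant to reduce.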
In the past, I've worked in the area of genome informatics as well as
parallel computation in molecular biology and genetics.
Presentations:
The following links point to the contents of presentations that should be of general interest to medical informaticians.
- Clinical Data Warehousing. AMIA Fall Symposium, Orlando, FL, Nov 8, 1998.
- ACT/DB: An Infrastructure for Clinical Trials Data Management. Columbia University, Jan 21, 1999.
- The EAV/CR Physical Data Model for Heterogeneous Scientific Databases. Human Brain Project Annual Meeting, NIH, Jun 5, 1999.
- Understanding and Implementing the EAV Database in the General Clinical Research Center. National GCRC Meeting, Baltimore, MD, April 13, 2002. The URL above is a converted PowerPoint presentation; a detailed explanatory paper can be found below.
- An Introduction to EAV Systems. National GCRC Meeting, Baltimore, MD, April 13, 2002.
- Informatics Support of Data Management for Multi-centric Clinical Studies: Integrating Clinical and Genetics/Genomic Data. American College of Medical Informatics, Fort Lauderdale, FL, March 2003.
- Database Representation of Phenotype Data: Issues and Challenges. Human Genome Variation Society, American Society of Human Genetics Meeting, Los Angeles, CA, November 4, 2003.
- Metadata-driven Systems in Biomedicine. Henry Ford Health System, Detroit, 2008.
- Implementing Clinical Decision Support (with Hemant Shah, MD). Henry Ford Health System, Detroit, 2009.
- Using Electronic Medical Records for Clinical Research: Issues and Challenges. University of Iowa, Sept 2012.
Here is a list of selected recent publications. (Some of the papers are
downloadable as MS-Word files plus figures, compressed into zip files. See the
hyperlinks at the bottom of the publications list.)
- Nadkarni PM. Management of Evolving Map Data: Data Structures and Algorithms Based on the Framework Map. Genomics (1995) 30:565-573. Abstract
- Nadkarni PM, Cheung K-H. SQLGEN: An environment for rapid client-server database development. Computers and Biomedical Research (1995) 28:479-499. Abstract
- Nadkarni PM, Montgomery KM, Leblanc-Stracewski J, Krauter K. CONTIG EXPLORER: Interactive Exploratory Contig Assembly. Genomics (1996) 31:301-310. Abstract
- Nadkarni PM, Cheung K-H, Castiglione C, Miller PL, Kidd KK. DNA Workbench: a database for support of regional chromosomal mapping. Journal of Computational Biology (1996) 3(2):319-329. Abstract
- Nadkarni PM. Mapdiff: an algorithm to report the differences between two genomic maps. Comput Applic Biosci (1997) 13(3):217-225. Abstract
- Cheung K-H, Nadkarni PM, Silverstein S, Miller PL, Kidd KK. PhenoDB: A database for the storage and analysis of pedigree and population genetic data. Computers and Biomedical Research (1996) 29:327-337. Abstract
- Nadkarni PM. Concept Locator: A Client-Server Application for Retrieval of UMLS Metathesaurus Concepts Through Complex Boolean Query. Computers and Biomedical Research (1997) 30:323-336. Abstract
- Nadkarni PM. QAV: Querying Entity-Attribute Value Metadata in a Biomedical Database. Computer Methods and Programs in Biomedicine (1997) 53:93-103. Abstract
- Nadkarni PM. Mapmerge: merging genomic maps. Bioinformatics (1998) 14(4):310-316. Abstract
- Nadkarni P, Brandt C, Frawley SM, Sayward F, Einbinder R, Zelterman D, Schacter L, Miller PL. Managing attribute-value clinical trials data using the ACT/DB client-server database system. Journal of the American Medical Informatics Association (1998) 5(2):139-151. Abstract
- Cheung KH, Nadkarni PM, Shin DG. A metadata approach to query interoperation between molecular biology databases. Bioinformatics (1998) 14(6):486-497. Abstract
- Nadkarni PM. CHRONOMERGE: An Application for the Merging and Display of Multiple Time-Stamped Data Streams. Computers and Biomedical Research (1998) 31:451-464. Abstract
- Nadkarni P, Brandt C. Data Extraction and Ad Hoc Query of an Entity-Attribute-Value Database. Journal of the American Medical Informatics Association (1998) 5(6):511-527. Abstract
- Nadkarni PM, Marenco L, Chen R, Skoufos E, Shepherd G, Miller P. Organization of heterogeneous scientific data using the EAV/CR representation. Journal of the American Medical Informatics Association 1999 Nov-Dec;6(6):478-93. Abstract
- Stein HD, Nadkarni P, Erdos J, Miller PL. Exploring the degree of concordance of coded and textual data in answering clinical queries from a clinical data repository. J Am Med Inform Assoc 2000 Jan-Feb;7(1):42-54. Abstract
- Nadkarni PM, Brandt C, Marenco L. WebEAV: Automatic Metadata-Driven Generation of Web Interfaces to Entity-Attribute-Value Databases. J Am Med Inform Assoc 2000;7(4):343-56. Abstract
- Chen RS, Nadkarni P, Marenco L, Levin F, Erdos J, Miller PL. Exploring Performance Issues for a Clinical Database Organized Using an Entity-Attribute-Value Representation. J Am Med Inform Assoc 2000;7(5):475-487. Abstract
- Chen RS, Brandt CA. UMLS Concept Indexing for Production Databases: A Feasibility Study. Journal of the American Medical Informatics Association, 2001; 8:80-91. Abstract
- Mutalik PG, Deshpande AM, Nadkarni PM. Use of General-purpose Negation Detection to Augment Concept Indexing of Medical Documents: A Quantitative Study using the UMLS. Journal of the American Medical Informatics Association, 2001; 8:598-609. Abstract
- Nadkarni PM. An Introduction to Information Retrieval: Applications in Genomics. The Pharmacogenomics Journal, 2002; 2(2):96-102. Abstract
- Deshpande AM, Brandt CA, Nadkarni PM. Metadata-driven Ad hoc Query of Patient Data: Meeting the Needs of Clinical Studies. Journal of the American Medical Informatics Association, 2002; 9(4):369-382. Abstract
- Fisk JM, Mutalik PG, Levin FW, Erdos J, Taylor C, Nadkarni PM. Integrating Query of Relational and Textual Data in Clinical Databases: a Case Study. Journal of the American Medical Informatics Association, 2003; 10(1):21-38. Abstract
- Nadkarni PM, Sun K, Wiepert M. Designing and Implementing Special-Purpose Databases: Lessons from the Pharmacogenetic Network. Pharmacogenomics 2002; 3(5):687-96. Abstract
- Nadkarni PM. The challenges of recording phenotype in a generalizable, computable form (Perspective). The Pharmacogenomics Journal 2003; 3(1):8-10.
- Deshpande AM, Brandt CA, Nadkarni PM. Temporal Query of Attribute-Value Patient Data: Utilizing the Constraints of Clinical Studies. International Journal of Medical Informatics 2003; 70(1):59-77. Abstract
- Marenco L, Tosches N, Crasto C, Shepherd G, Miller PL, Nadkarni PM. Achieving Evolvable Web-Database Bioscience Applications Using the EAV/CR Framework: Recent Advances. Journal of the American Medical Informatics Association, 2003; 10(5). Abstract
- Brandt CA, Gadagkar R, Rodriguez C, Nadkarni PM. Managing Complex Change in Clinical Study Metadata. Journal of the American Medical Informatics Association, 2004; 11(3):380-91. Abstract
- Holford M, Li N, Nadkarni P, Zhao H. VitaPad: visualization tools for analysis of pathway data. Bioinformatics 2004; 21(8):1596-1602. Abstract
- Marenco L, Wang TY, Shepherd G, Miller PL, Nadkarni PM. QIS: A Framework for Biomedical Database Federation. Journal of the American Medical Informatics Association, 2004; 11(6):523-34. Abstract
- Nadkarni PM and Brandt CA. The Common Data Elements for Cancer Research: Remarks
on Functions and Structure. Methods of Information in Medicine, 2006;45(6):594-601. Abstract
- Nadkarni PM, Wiepert M. Translating pharmacogenomics discoveries into clinical practice: the role of curated databases.
Pharmacogenomics. 2005 Jul;6(5):451-4. Pubmed (free article)
- Dinu V, Nadkarni P. Guidelines for the effective use of entity-attribute-value modeling for biomedical databases.
Int J Med Inform. 2007 Nov-Dec;76(11-12):769-79.
Pubmed (free article)
- Nadkarni PM, Miller RA. Service-oriented architecture in medical software: promises and perils.
J Am Med Inform Assoc. 2007 Mar-Apr;14(2):244-6.
Pubmed (free article)
- Marenco L, Wang R, Nadkarni P. Automated database mediation using ontological metadata mappings.
J Am Med Inform Assoc. 2009 Sep-Oct;16(5):723-37.
Pubmed (free article)
- Nadkarni PM, Marenco LN. Implementing description-logic rules for SNOMED-CT attributes through a table-driven approach.
J Am Med Inform Assoc. 2010 Mar-Apr;17(2):182-4.
Pubmed (free article)
- Nadkarni PM, Darer JD. Migrating existing clinical content from ICD-9 to SNOMED.
J Am Med Inform Assoc. 2010 Sep-Oct;17(5):602-7.
Pubmed (free article)
- Nadkarni PM. Drug safety surveillance using de-identified EMR and claims data: issues and challenges.
J Am Med Inform Assoc. 2010 Nov-Dec;17(6):671-4.
Pubmed (free article)
- Nadkarni PM, Darer JD. Determining correspondences between high-frequency MedDRA concepts and SNOMED: a case study.
BMC Med Inform Decis Mak. 2010
Pubmed (free article)
- Richesson RL, Nadkarni P. Data standards for clinical research data collection forms: current status and challenges.
J Am Med Inform Assoc. 2011 May 1;18(3):341-6.
Pubmed (free article)
- Chapman WW, Nadkarni PM, Hirschman L, D'Avolio LW, Savova GK, Uzuner O.
Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions.
J Am Med Inform Assoc. 2011 Sep-Oct;18(5):540-3.
Pubmed (free article)
- Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction.
J Am Med Inform Assoc. 2011 Sep-Oct;18(5):544-51.
Pubmed (free article)
- Nadkarni PM, Kemp R, Parikh CR.
Leveraging a clinical research information system to assist biospecimen data and workflow management: a hybrid approach.
J Clin Bioinforma. 2011 Aug 25;1:22.
Pubmed (free article)
- Morse RE, Nadkarni P, Schoenfeld DA, Finkelstein DM. Web-browser encryption of personal health information.
BMC Med Inform Decis Mak. 2011 Nov 10;11:70.
Pubmed (free article)
- Shah H, Allard RD, Enberg R, Krishnan G, Williams P, Nadkarni PM.
Requirements for guidelines systems: implementation challenges and lessons from existing software-engineering efforts.
BMC Med Inform Decis Mak. 2012 Mar 9;12:16.
Pubmed (free article)
- Nadkarni PM, Parikh CR. An eUtils toolset and its use for creating a pipeline to link
genomics and proteomics analyses to domain-specific biomedical literature.
Pubmed (free article)
Book Chapters
Shepherd GM, Healy MD, Singer MS, Peterson BE, Mirsky JS, Wright L, Smith JE, Nadkarni PM, Miller PL. Senselab: a project in multidisciplinary, multilevel sensory integration. In: Koslow SH, Huerta MF, eds. Neuroinformatics: An Overview of the Human Brain Project. Lawrence Erlbaum Associates, Inc., Mahwah, NJ, 1997. pp. 21-56.
Nadkarni P, Mirsky J, Skoufos E, Healy M, Hines M, Miller P, Shepherd G. Modeling Heterogeneous Data on the Nervous System. In: Letovsky SI, ed. Bioinformatics Databases. Kluwer Academic Publishers, Dordrecht, Netherlands. pp. 38-51.
Books
Prakash Nadkarni. Metadata-driven Software Systems in Biomedicine. Springer, 2011.
Prakash Nadkarni. Parallel Programming with Linda: An Advanced Introduction.
Linda, conceived by David Gelernter and initially implemented by Nick
Carriero, both of Yale University, is a "mini-language" consisting
of just four constructs that is embedded in a conventional language (e.g., C or
FORTRAN) to give it parallel capabilities. While Linda is conceptually very simple, its
reliance on a pre-processor (which must be hand-sculpted for each language in
which it is to be embedded) has limited its widespread use, in contrast with
MPI, which, though somewhat more difficult to use, depends purely on a
subroutine library. This book was written back in 1992 and also gives an
introduction to parallel programming. It can be downloaded by clicking here.
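For readers unfamiliar with Linda, the following toy sketch in Python mimics its four operations on a shared tuple space: out (deposit a tuple), in (withdraw a matching tuple, blocking until one exists), rd (read a matching tuple without withdrawing it) and eval (spawn a computation whose result becomes a tuple). It is a single-process illustration using threads, not C-Linda and not code from the book. The names in_ and eval_ carry trailing underscores because in is a Python keyword and eval is a built-in, and the convention that a type in a template (e.g. int) acts as a "formal" wildcard is an assumption loosely modeled on Linda's ?-formals. eval_ is also simplified: it stores a (name, result) pair rather than evaluating every field of a live tuple.

```python
import threading


class TupleSpace:
    """Toy, single-process tuple space illustrating Linda's four operations.

    Template convention (an assumption of this sketch): a field that is a
    type, e.g. int, is a 'formal' matching any value of that type; any other
    field is an 'actual' that must compare equal.
    """

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template, tup):
        if len(template) != len(tup):
            return False
        for t, v in zip(template, tup):
            if isinstance(t, type):
                if not isinstance(v, t):
                    return False
            elif t != v:
                return False
        return True

    def out(self, *tup):
        """Deposit a tuple and wake any blocked readers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _take(self, template, remove):
        with self._cond:
            while True:
                for i, tup in enumerate(self._tuples):
                    if self._matches(template, tup):
                        if remove:
                            del self._tuples[i]
                        return tup
                self._cond.wait()  # block until some out() adds a tuple

    def in_(self, *template):
        """Withdraw a matching tuple, blocking until one exists."""
        return self._take(template, remove=True)

    def rd(self, *template):
        """Return a matching tuple without removing it, blocking until one exists."""
        return self._take(template, remove=False)

    def eval_(self, name, func, *args):
        """Run func(*args) in a new thread, then out() (name, result)."""
        def worker():
            self.out(name, func(*args))
        t = threading.Thread(target=worker)
        t.start()
        return t


# Usage: a master spawns workers with eval_ and collects results with in_.
ts = TupleSpace()
workers = [ts.eval_("square", lambda n: n * n, n) for n in range(1, 5)]
total = sum(ts.in_("square", int)[1] for _ in range(4))
for w in workers:
    w.join()
print(total)  # 30
```

Matching here is by position and length, and in_ returns the first match it finds; Linda itself leaves the choice among several matching tuples unspecified.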
Downloadable Publications
The downloadable zip files linked to below typically contain more than one publication;
refer to the numbers in the list above. (Please note: figures are generally
bundled with the MS-Word file or supplied separately, but some figures may be missing.)
Many of the publications originally appeared in the Journal of the American Medical Informatics
Association (JAMIA), an excellent source of content related to Medical
Informatics. JAMIA publications that are more than three years old are also
freely downloadable (as PDFs) from NCBI's PubMed Central site.
Click on the following links to download the full text of publications of
interest:
publications 1, 3 and 4
publications 7 and 8
publication 9
publications 10, 12 and 13
publication 14
publications 16 and 17
publication 18
publication 19
publication 20
publication 21
publication 22
publication 23
publication 24
publication 25
publication 26