Yale Center for Medical Informatics


Examples of Current and Recent Projects

This section describes example projects that have involved Biomedical Informatics faculty, postdoctoral fellows, and graduate students. The projects are divided into four general areas:

clinical informatics projects
genome informatics projects
neuroinformatics projects
translational informatics projects

Clinical Computing Within Yale New Haven Medical Center 

Faculty: Perry Miller, MD, PhD, Richard Shiffman, MD, Cynthia Brandt, MD, MPH, Prakash Nadkarni, MD, Mark Shifman, MD, PhD, Peter Gershkovich, MD, Seth Powsner, MD, Andrea Benin, MD, Allen Hsiao, MD, Ryan O’Connell, MD, Nitu Kashyap, MD

In the early 1990s, Yale New Haven Hospital (YNHH) completed a 2-year process of installing CCSS, its hospital information system. CCSS provides mandatory physician order entry, results reporting, and other functions in support of clinical care. The medical center also implemented an Ambulatory Care Information System (Logician) in many hospital and school clinics. YCMI faculty and fellows collaborated closely on the planning and implementation of these systems, which provide the foundation for the ongoing development of an electronic medical record within the medical center. A more recent development, led by Prof. Shiffman, involves creating a community-wide health information exchange initially focused on the care of children with asthma. In the fall of 2010, the Yale School of Medicine and Yale New Haven Health System (YNHHS, which includes Yale New Haven Hospital, Bridgeport Hospital, and Greenwich Hospital) embarked on a joint $250+ million initiative to install the Epic electronic medical record system, including a comprehensive clinical data repository, in all affiliated hospitals, clinics, and practices. As a result, there are many opportunities for fellows to become involved in clinical computing projects within the Medical Center, YNHHS, and the greater New Haven community.

Ongoing Collaboration with the VA Connecticut Healthcare System (VACHS) 

Faculty: Prakash Nadkarni, MD, Cynthia Brandt, MD, MPH, Luis Marenco, MD, Joseph Erdos, MD, PhD, Amy Justice, MD, PhD, Perry Miller, MD, PhD

Over the past two decades, there have been a variety of collaborations between faculty at the YCMI and faculty at the VA Connecticut Healthcare System (VACHS), which is based in nearby West Haven. One early project, directed by Prof. Joseph Erdos, Director of the VA Region 4 (East Coast) Clinical Data Warehouse, developed a pilot clinical data repository for the VACHS. It used a set of tools to extract patient data from the VA patient record system (written in the MUMPS programming language) and place that data into a relational database so that it could be queried and analyzed flexibly. One use of the data has been for provider profiling, and the profiling model developed at Yale was adopted throughout the New England VA region. This project has also allowed us to explore a number of research issues involved in the design, implementation, and use of large clinical data repositories. Recent collaborations include: 1) Prof. Cynthia Brandt receiving support for a medical informatics postdoctoral training program at the West Haven VA that is closely affiliated with Yale's NLM-supported training program, 2) several projects between Profs. Cynthia Brandt and Amy Justice (VACHS) focusing on informatics research in clinical epidemiology and health services research, 3) Prof. Cynthia Brandt leading the West Haven participation in a major national VA research project on the use of natural language processing and text mining techniques in the VA EMR, and 4) Prof. Perry Miller leading the effort to provide informatics support for biostatistics in a major national VA GWAS of 38,000 patients focused on schizophrenia and bipolar disorder.
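
To make the extraction step above concrete, here is a minimal Python sketch. It assumes a hypothetical MUMPS-style global, modeled as nested dictionaries keyed by patient and visit subscripts, with invented field names; the real VA record structures are far more complex. The sketch flattens the subscript tree into relational rows in an in-memory SQLite database and then runs the kind of ad hoc aggregate query that supports provider profiling.

```python
import sqlite3

# Hypothetical MUMPS-style global modeled as nested dicts:
# ^VISIT(patient_id, visit_no) = {date, provider, a1c}
# Subscripts and field names are illustrative, not a real VA layout.
VISIT_GLOBAL = {
    "P001": {1: {"date": "2009-01-05", "provider": "DR_A", "a1c": 7.9},
             2: {"date": "2009-04-02", "provider": "DR_A", "a1c": 7.1}},
    "P002": {1: {"date": "2009-02-11", "provider": "DR_B", "a1c": 8.4}},
    "P003": {1: {"date": "2009-03-20", "provider": "DR_B", "a1c": 6.8}},
}

def flatten(global_data):
    """Walk the subscript tree and yield one relational row per leaf node."""
    for patient_id, visits in global_data.items():
        for visit_no, fields in visits.items():
            yield (patient_id, visit_no, fields["date"],
                   fields["provider"], fields["a1c"])

def load(conn, global_data):
    """Create a flat table and bulk-insert the flattened rows."""
    conn.execute("""CREATE TABLE visit (
        patient_id TEXT, visit_no INTEGER, visit_date TEXT,
        provider TEXT, a1c REAL,
        PRIMARY KEY (patient_id, visit_no))""")
    conn.executemany("INSERT INTO visit VALUES (?, ?, ?, ?, ?)",
                     flatten(global_data))

def provider_profile(conn):
    """Ad hoc query: per-provider patient count and mean lab value."""
    return conn.execute("""
        SELECT provider, COUNT(DISTINCT patient_id), ROUND(AVG(a1c), 2)
        FROM visit GROUP BY provider ORDER BY provider""").fetchall()

conn = sqlite3.connect(":memory:")
load(conn, VISIT_GLOBAL)
print(provider_profile(conn))
```

The design point is that once hierarchical data sits in a table, a new question becomes a new SQL query rather than a new traversal routine over the global.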

Clinical Decision Support Systems and Computer-Based Clinical Practice Guidelines  

Faculty: Richard Shiffman, MD, Perry Miller, MD, PhD, Sandra Frawley, PhD, Fred Sayward, PhD, and many clinical faculty at Yale

A longstanding research activity has involved the development of programs that bring computer-based advice to the practicing clinician. A related emphasis has been on computer-based knowledge processing for a spectrum of clinical practice guidelines. This project explores the acquisition and representation of guideline knowledge, as well as the implementation and evaluation of operational guidelines. We have ongoing collaborations with national specialty societies, including the American Academy of Pediatrics, the American Academy of Family Physicians, and the American Academy of Otolaryngology. Professor Richard Shiffman is supported by the Agency for Healthcare Research and Quality to direct the GLIDES Project (GuideLines Into Decision Support), a high-profile demonstration seeking to develop consensus in the health care field around the use of clinical decision support (CDS) to promote safe and effective health care. The project engages relevant stakeholders, including clinicians, provider organizations, guideline and quality measurement developers, and information technology professionals, in the ongoing work to improve health care decision making using CDS systems. Professor Perry Miller is currently launching a project to link computer-based clinical decision support to the national VA electronic patient record, starting with a collaboration with Dr. Robert Kerns (Psychiatry) at the West Haven VA focusing on the pharmacologic management of neuropathic pain.
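
One way to represent guideline knowledge so that it can be checked before implementation is a decision table. The sketch below, with hypothetical condition and action names, enumerates every combination of boolean conditions and reports combinations that no rule covers (incompleteness) and combinations where rules recommend different actions (conflict). It illustrates the technique under these assumptions; it is not the GLIDES toolset.

```python
from itertools import product

# Hypothetical guideline logic. Each rule maps conditions to True, False,
# or None ("don't care"), and names the action it recommends.
CONDITIONS = ["criterion_a", "criterion_b", "criterion_c"]
RULES = [
    ({"criterion_a": True,  "criterion_b": True,  "criterion_c": None}, "action_1"),
    ({"criterion_a": True,  "criterion_b": False, "criterion_c": None}, "action_2"),
    ({"criterion_a": False, "criterion_b": None,  "criterion_c": True}, "action_3"),
    ({"criterion_a": False, "criterion_b": True,  "criterion_c": True}, "action_4"),
]

def matches(rule_conditions, case):
    """A rule fires if every specified condition equals the case's value."""
    return all(v is None or case[c] == v for c, v in rule_conditions.items())

def audit(conditions, rules):
    """Check every combination of condition values against the rules."""
    uncovered, conflicts = [], []
    for values in product([True, False], repeat=len(conditions)):
        case = dict(zip(conditions, values))
        actions = {action for conds, action in rules if matches(conds, case)}
        if not actions:
            uncovered.append(values)
        elif len(actions) > 1:
            conflicts.append((values, sorted(actions)))
    return uncovered, conflicts

uncovered, conflicts = audit(CONDITIONS, RULES)
print("uncovered:", uncovered)
print("conflicts:", conflicts)
```

Exhaustive enumeration is practical only for a modest number of conditions (2^n combinations), which is typical of individual guideline recommendations.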

Participation in Several Major National Genomic Initiatives

Prof. Mark Gerstein and his colleagues are involved in several large-scale national collaborations focused on aspects of genomics. Many CBB graduate students are participating in these projects. These projects all have translational implications and provide a spectrum of translational bioinformatics research opportunities.

The 1000 Genomes Project: This is essentially NIH's marquee effort in personal genomics, the sequencing of individual people's genomes. The overall project aims to sequence thousands of individuals’ genomes to characterize the extent of human genetic variation. Our pilot project on ~200 people was published in Nature (http://papers.gersteinlab.org/papers/1kgpilot). Dr. Gerstein’s group is involved in developing pipelines for analyzing the massive amounts of sequence information generated by the 1000 Genomes consortium. Specifically, we have developed an annotation pipeline that maps SNPs, indels, and structural variations (SVs) onto protein-coding genes. While SNPs have been intensively studied, methods to identify indels and SVs are at a nascent stage. We are developing algorithms to identify indels and SVs based on split-read, read-depth, and paired-end mapping methods. The production phase of this project aims to provide a comprehensive view of human variation based on the genomes of 2,500 individuals. This will provide a valuable resource for GWAS and other types of translational research.
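
The read-depth approach mentioned above rests on a simple idea: fewer reads map to a region deleted from one copy of the genome, and more reads map to a duplicated region. The Python sketch below bins read start positions into fixed windows, compares each window with the median window depth, and merges adjacent outliers. The thresholds (0.5 and 1.5 times the median) and the simulated data are assumptions for illustration; production callers also correct for GC content and mappability and use statistical segmentation, and the split-read and paired-end signals are not shown.

```python
from statistics import median

def window_depths(read_starts, chrom_length, window):
    """Count mapped read start positions falling in each fixed-size window."""
    counts = [0] * ((chrom_length + window - 1) // window)
    for pos in read_starts:
        counts[pos // window] += 1
    return counts

def call_cnv(counts, window, loss=0.5, gain=1.5):
    """Label windows whose depth, relative to the median window depth, falls
    below `loss` (putative deletion) or above `gain` (putative duplication),
    merging adjacent windows with the same label into half-open segments."""
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median depth is zero; coverage too low")
    calls = []
    for i, depth in enumerate(counts):
        ratio = depth / baseline
        label = "DEL" if ratio < loss else "DUP" if ratio > gain else None
        if label is None:
            continue
        start, end = i * window, (i + 1) * window
        if calls and calls[-1][2] == label and calls[-1][1] == start:
            calls[-1] = (calls[-1][0], end, label)  # extend the previous segment
        else:
            calls.append((start, end, label))
    return calls

# Simulated 1 kb region in 100 bp windows: a drop in depth and a spike.
depth_per_window = [20, 20, 20, 5, 5, 20, 20, 45, 20, 20]
read_starts = [w * 100 + k for w, n in enumerate(depth_per_window) for k in range(n)]
counts = window_depths(read_starts, 1000, 100)
print(call_cnv(counts, 100))  # [(300, 500, 'DEL'), (700, 800, 'DUP')]
```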

ENCODE: In 2003, NIH initiated the pilot ENCODE project with the aim of elucidating all the functional DNA elements in 1% of the human genome. After the successful completion of the pilot phase, the project was expanded to cover the entire human genome. As part of a multi-institutional collaboration, we are involved in annotating the human genome and developing methods for analyzing large-scale genomic experiments. In particular, we are working extensively on pseudogene identification and annotation in the human genome in collaboration with members of the GENCODE team (http://www.gencodegenes.org/). We are also elucidating transcription factor binding sites and chromatin structure based on ChIP-seq experiments. To this end, we have developed PeakSeq, an approach to identify peak regions in ChIP-seq data sets that correspond to sites of transcription factor binding (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2924752/?tool=pubmed). We continue to refine this method and to develop new methods for extensive human genome analyses. From a translational perspective, this project is providing annotation that helps correlate genomic features with disease.
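
Scoring a candidate peak against an input control can be illustrated with a binomial test. The sketch below asks whether the ChIP read count in one region is larger than expected when pooled with a scaled control count, assuming that under the null each pooled read is equally likely to come from either sample. It is a simplified illustration in the spirit of control-normalized scoring, not the published PeakSeq code; the normalization factor `scale` and the read counts are assumed inputs.

```python
from math import comb

def binom_sf_half(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5). Integer arithmetic avoids
    floating-point overflow for large read counts."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

def score_region(chip_reads, control_reads, scale):
    """Score one candidate region. `scale` normalizes control depth to ChIP
    depth (assumed to be estimated elsewhere, e.g. from genome-wide counts).
    A small p-value indicates an excess of ChIP reads over the control."""
    scaled_control = round(control_reads * scale)
    n = chip_reads + scaled_control
    return binom_sf_half(chip_reads, n)

for chip, ctrl, scale in [(30, 10, 1.0), (12, 10, 1.0), (30, 40, 0.5)]:
    print(chip, ctrl, scale, score_region(chip, ctrl, scale))
```

In a genome-wide run, each candidate region would be scored this way and the resulting p-values corrected for multiple testing before peaks are reported.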

Brainseq: In collaboration with Prof. Nenad Sestan’s group at Yale, together with groups at USC and the Allen Institute for Brain Science, we are analyzing large amounts of RNA-seq data to characterize the transcriptome of the human brain during development. The aim of this project is to create a comprehensive map of gene expression and to understand how the human brain changes throughout life by studying its transcriptome. We have already developed RSEQtools, a suite of tools for the analysis of RNA-seq experiments. RSEQtools consists of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads, and segmenting that signal into actively transcribed regions. This project is providing a reference atlas of gene expression in different regions of the brain that will provide valuable information to help interpret neurological function and dysfunction.
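
Two of the RSEQtools tasks described above can be sketched in a few lines. The first function computes RPKM (reads per kilobase of feature per million mapped reads), a common expression measure; the second segments a per-base coverage signal into runs at or above a threshold. The threshold and minimum length are assumed parameters, and RSEQtools' own formulas and options may differ.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)

def transcribed_regions(signal, threshold, min_length):
    """Segment a per-base coverage signal into half-open (start, end) runs
    where coverage >= threshold, keeping runs at least min_length long."""
    regions, start = [], None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, value in enumerate(list(signal) + [float("-inf")]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions

print(rpkm(500, 2000, 10_000_000))  # 25.0
coverage = [0, 0, 5, 6, 7, 0, 3, 0, 8, 8, 8, 8, 0]
print(transcribed_regions(coverage, threshold=3, min_length=3))  # [(2, 5), (8, 12)]
```

Normalizing by both feature length and sequencing depth is what makes RPKM values comparable across genes and across samples sequenced to different depths.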

Collaborative Translational Research in Statistical Genomics and Proteomics

Faculty: Hongyu Zhao, PhD, Mark Gerstein, PhD, Kei Cheung, PhD, Perry Miller, MD, PhD and many bioscience faculty in different departments at Yale

Prof. Hongyu Zhao’s laboratory, which is located immediately adjacent to the YCMI and to Pathology Informatics (on the same floor of the same building), focuses on a broad range of collaborative translational (disease-focused) projects in statistical genomics and proteomics, in which numerous CBB students participate. Current research projects include: 1) next-generation sequencing data analysis, 2) data integration methods in genome-wide association studies, 3) eQTL mapping in different organisms, 4) pathway-based genomics and proteomics analysis, 5) biological network reconstruction, e.g., transcriptional regulatory networks and protein interaction networks, 6) statistical modeling and analysis of sparse networks, 7) statistical inference in ordinary differential equation systems, and 8) disease risk prediction from genomics data.
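
For item 3, the simplest form of eQTL mapping tests, for each SNP and gene pair, whether expression level varies linearly with the number of copies of an allele. The sketch below fits that regression by ordinary least squares and returns the slope and its t-statistic. The data are invented; real analyses add covariates, scan many SNP and gene pairs, and correct for multiple testing.

```python
from math import sqrt

def eqtl_test(dosages, expression):
    """Ordinary least squares of expression on genotype dosage (0, 1 or 2
    copies of the alternate allele). Returns (slope, t_statistic)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, expression))
    std_err = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / std_err

# Invented data for six individuals at one SNP and one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.1, 2.9]
print(eqtl_test(dosages, expression))
```

The t-statistic has n - 2 degrees of freedom; a positive slope means each additional copy of the alternate allele is associated with higher expression.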

Analysis of Genetic Regulatory Networks in Cancer

Faculty: Mark Gerstein, PhD, Sherman Weissman, MD, Mark Rubin, MD (Cornell), Michael Snyder, PhD (Stanford)

Prof. Gerstein and his collaborators have studied the structure of protein networks, both at a large scale, in terms of global statistics (e.g., the network diameter), and at a small scale, in terms of local network motifs. In particular, they have correlated network hubs with gene essentiality. Recently, they developed a number of tools to build and analyze networks derived from genes and from literature citations. They have also investigated the dynamics of networks, i.e., how network topology changes over time. In addition, they have identified changing hubs and systematic patterns of connectivity rewiring in the yeast regulatory network. One translational bioinformatics domain involves inferring regulatory networks in cancer.
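
The global statistics and hub analysis described above can be illustrated on a toy undirected network. The sketch computes node degrees, takes the top 20% of nodes by degree as hubs (an assumed cutoff), computes the diameter by breadth-first search from every node, and compares hubs between two snapshots to find changing hubs. Node names and both snapshots are hypothetical.

```python
from collections import deque

def degrees(edges):
    """Number of edges incident to each node."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

def hubs(edges, top_fraction=0.2):
    """Nodes in the top `top_fraction` by degree (an assumed, illustrative cutoff)."""
    deg = degrees(edges)
    ranked = sorted(deg, key=lambda node: (-deg[node], node))
    return ranked[:max(1, int(len(ranked) * top_fraction))]

def diameter(edges):
    """Longest shortest-path length, using breadth-first search from every
    node. Assumes an undirected, connected graph."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    longest = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        longest = max(longest, max(dist.values()))
    return longest

def changing_hubs(edges_before, edges_after, top_fraction=0.2):
    """Return (hubs gained, hubs lost) between two network snapshots."""
    before = set(hubs(edges_before, top_fraction))
    after = set(hubs(edges_after, top_fraction))
    return sorted(after - before), sorted(before - after)

# Two hypothetical snapshots of the same seven-node network.
snapshot_1 = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("E", "F"), ("F", "G")]
snapshot_2 = [("F", "B"), ("F", "C"), ("F", "D"), ("F", "G"), ("A", "E"), ("E", "F")]
print(hubs(snapshot_1), diameter(snapshot_1))   # ['A'] 4
print(changing_hubs(snapshot_1, snapshot_2))    # (['F'], ['A'])
```

All-pairs BFS costs O(V·E), which is adequate for networks of a few thousand nodes; larger networks typically use sampling or approximate diameter estimates.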

Projects in Translational Informatics at the Intersection of Bioinformatics and Disease

Faculty: Paul Lizardi, PhD, Josephine Hoh, PhD, Hongyu Zhao, PhD, Perry Miller, MD, PhD

Several collaborations have focused on translational bioinformatics. One project involved analyzing high-density, genome-wide Affymetrix SNP data to help identify the genes involved in disease, starting with age-related macular degeneration (AMD), and using pathway data to help in this analysis. A second project involved using similar microarray data for comparative genomic hybridization (CGH) analysis in patients with cancer, focusing initially on regional “copy number” changes.
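
A basic single-SNP test in case-control analyses like the first project compares allele frequencies between cases and controls with a 2x2 chi-square test. The counts below are invented for illustration. In a genome-wide scan this test is repeated for every SNP, so significance thresholds must be corrected for multiple testing; the pathway step is not shown.

```python
from math import erfc, sqrt

def genotypes_to_alleles(n_aa, n_ab, n_bb):
    """Convert genotype counts (AA, AB, BB) to allele counts (A, B)."""
    return (2 * n_aa + n_ab, n_ab + 2 * n_bb)

def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square on the 2x2 table of allele counts (1 df, no
    continuity correction). Returns (statistic, p-value)."""
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

cases = genotypes_to_alleles(50, 40, 10)     # (140, 60)
controls = genotypes_to_alleles(30, 50, 20)  # (110, 90)
print(allelic_chi_square(cases, controls))
```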

Computational Immunology with a Particular Focus on Computational Modeling

Faculty: Steven Kleinstein, PhD, Mark Shlomchik, MD, PhD, Ann Haberman, PhD, Michael Robek, PhD, David Schatz, PhD

Prof. Kleinstein has a longstanding collaboration with Prof. Mark Shlomchik and other immunobiology-related faculty that combines techniques from dynamic modeling, systems biology, and bioinformatics to better understand the immune response. His group is particularly interested in the generation and selection of high-affinity B lymphocytes in germinal centers during immune and autoimmune responses. As part of PRIME (Program for Research on Immune Modeling and Experimentation), they are developing mathematical models to elucidate the viral mechanisms of induction and subversion of type I interferon responses and maturation of dendritic cells by Category A-C viral pathogens.
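
As an example of the kind of dynamic model used in this area, the sketch below integrates the standard target-cell-limited model of viral infection (target cells T, infected cells I, virus V) with a fourth-order Runge-Kutta step. It is a textbook model, not one of the PRIME models; interferon effects would require additional compartments. The parameter values are illustrative assumptions: with them, the within-host basic reproductive number beta*p*T0/(delta*c) is about 3.3, so the infection expands before target cells are depleted.

```python
def tiv_rhs(state, beta, delta, p, c):
    """Target-cell-limited model: T target cells, I infected cells, V virus."""
    T, I, V = state
    return (-beta * T * V,
            beta * T * V - delta * I,
            p * I - c * V)

def rk4(rhs, state, dt, steps, *params):
    """Classic fourth-order Runge-Kutta; returns the list of states."""
    trajectory = [state]
    for _ in range(steps):
        k1 = rhs(state, *params)
        k2 = rhs(tuple(s + dt / 2 * k for s, k in zip(state, k1)), *params)
        k3 = rhs(tuple(s + dt / 2 * k for s, k in zip(state, k2)), *params)
        k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)), *params)
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
        trajectory.append(state)
    return trajectory

# Illustrative (not fitted) rates per day: beta, delta, p, c.
BETA, DELTA, P, C = 1e-4, 1.0, 10.0, 3.0
T0, I0, V0 = 1e4, 0.0, 1.0
traj = rk4(tiv_rhs, (T0, I0, V0), 0.01, 2000, BETA, DELTA, P, C)  # 20 days
peak_day, peak = max(((i * 0.01, s[2]) for i, s in enumerate(traj)),
                     key=lambda t: t[1])
print(f"viral peak {peak:.3g} at day {peak_day:.2f}; "
      f"target cells left {traj[-1][0]:.3g}")
```

Fitting such a model to viral load measurements (by adjusting beta, delta, p, and c) is how mechanistic hypotheses are usually compared against data.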

Translational Cancer Research for Yale's SPORE in Skin Cancer

Faculty: Michael Krauthammer, MD, PhD

Dr. Krauthammer is co-director of the bioinformatics/biostatistics core of the Yale SPORE in skin cancer. In this role, he is supervising translational research collaborations with multiple researchers across the Yale School of Medicine. A special emphasis is the analysis of next-generation sequencing data probing the melanoma genome, transcriptome, and epigenome. Another focus is the elucidation of kinase activation in cancer, as well as the molecular study of anticancer drug resistance. A final focus is data integration across multiple omics modalities using Semantic Web technology.

Interdisciplinary Translational Research to Understand the Mechanisms Underlying Asthma

Faculty: Geoffrey Chupp, MD, Michael Krauthammer, MD, PhD, Kei Cheung, PhD, Perry Miller, MD, PhD

A major collaboration between Dr. Geoffrey Chupp (Internal Medicine, Pulmonary & Critical Care) and informatics faculty (Drs. Michael Krauthammer, Perry Miller, and Kei Cheung) involves translational research focusing on the genes, proteins, pathways, and regulatory networks underlying asthma. An initial part of the project involves building a database to allow Dr. Chupp to collect data on patients with asthma, including clinical data, laboratory data, and high-throughput data from analysis of diverse samples (e.g., blood, sputum, and biopsy tissue). His research staff will populate this database on an ongoing basis as asthma patients are seen over time. We will then work with Dr. Chupp and his team to help them perform whatever analyses they deem appropriate as they pursue their asthma-related translational research. For example, the goal of one current project is to contrast the gene expression profiles in the circulation and induced sputum across the spectrum of asthma severity and chitinase-3-like-1 genotypes and phenotypes. (Elevated levels of the chitinase-3-like-1 protein have been putatively linked to asthma and other disorders.)
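
A minimal relational layout for such a database might link patients, the samples collected from them, and high-throughput measurements on each sample. The schema below is a hypothetical sketch (table and column names are invented, and clinical and laboratory tables are omitted), followed by a query of the kind used to contrast expression across sample types and severity. CHI3L1 is the gene symbol for chitinase-3-like-1; the values are made up.

```python
import sqlite3

# Hypothetical schema; clinical and laboratory tables are omitted for brevity.
SCHEMA = """
CREATE TABLE patient (
    patient_id  INTEGER PRIMARY KEY,
    severity    TEXT CHECK (severity IN ('mild', 'moderate', 'severe'))
);
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    patient_id  INTEGER NOT NULL REFERENCES patient(patient_id),
    sample_type TEXT CHECK (sample_type IN ('blood', 'sputum', 'biopsy'))
);
CREATE TABLE expression (
    sample_id   INTEGER NOT NULL REFERENCES sample(sample_id),
    gene        TEXT NOT NULL,
    value       REAL,
    PRIMARY KEY (sample_id, gene)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, "mild"), (2, "severe")])
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(1, 1, "blood"), (2, 1, "sputum"), (3, 2, "blood"), (4, 2, "sputum")])
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                 [(1, "CHI3L1", 1.0), (2, "CHI3L1", 2.0),
                  (3, "CHI3L1", 3.0), (4, "CHI3L1", 5.0)])

def mean_expression(conn, gene):
    """Mean expression of one gene by sample type and asthma severity."""
    return conn.execute("""
        SELECT s.sample_type, p.severity, AVG(e.value)
        FROM expression e
        JOIN sample s  ON s.sample_id = e.sample_id
        JOIN patient p ON p.patient_id = s.patient_id
        WHERE e.gene = ?
        GROUP BY s.sample_type, p.severity
        ORDER BY s.sample_type, p.severity""", (gene,)).fetchall()

print(mean_expression(conn, "CHI3L1"))
```

Keeping samples separate from patients lets staff add new samples and assays as patients return over time without restructuring existing records.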

The Yale Protein Expression Database & Related Informatics Projects

Faculty: Mark Shifman, MD, PhD, Kei Cheung, PhD, Kenneth Williams, PhD, Perry Miller, MD, PhD

The YCMI and Keck Biotechnology Center are involved in an ongoing collaboration to build, maintain, and refine the Yale Protein Expression Database (YPED) to help organize the processing of mass spectrometry proteomics data, which is being produced in increasingly large volumes by the Keck Center for many researchers at Yale and beyond. The database is designed to handle data from a variety of proteomics experiments including MALDI-MS-based peptide/protein disease biomarker discovery, differential fluorescence 2D gel electrophoresis (DIGE), isotope-coded affinity tag (ICAT)/MS protein profiling, multidimensional LC/MS analysis of tryptic digests of whole cell and partially purified protein extracts (MudPIT), and isobaric tag for relative and absolute quantitation (iTRAQ). These projects are carried out in a variety of translational, disease-focused contexts.
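
For iTRAQ, relative quantitation is typically derived from reporter-ion intensities (m/z 114 to 117 in the 4-plex reagent). The sketch below computes log2 ratios of each channel to a reference channel and median-centers them, which assumes that most peptides do not change between samples. The data layout and choice of reference channel are assumptions; YPED's actual processing is not described here.

```python
from math import log2
from statistics import median

def itraq_log_ratios(peptides, reference="114"):
    """peptides: list of dicts mapping reporter channel -> ion intensity.
    Returns, per non-reference channel, log2(channel / reference) for each
    peptide, median-centered so the typical peptide has a ratio of 0."""
    channels = [ch for ch in peptides[0] if ch != reference]
    raw = {ch: [log2(p[ch] / p[reference]) for p in peptides] for ch in channels}
    return {ch: [r - median(vals) for r in vals] for ch, vals in raw.items()}

# Invented reporter intensities for three peptides (4-plex channels).
peptides = [
    {"114": 100, "115": 200, "116": 100, "117": 50},
    {"114": 200, "115": 400, "116": 200, "117": 100},
    {"114": 100, "115": 800, "116": 100, "117": 50},
]
print(itraq_log_ratios(peptides))
# {'115': [0.0, 0.0, 2.0], '116': [0.0, 0.0, 0.0], '117': [0.0, 0.0, 0.0]}
```

Here the third peptide stands out in channel 115 at four times (log2 ratio 2) the level of the typical peptide, after the constant loading differences in channels 115 and 117 are removed by median centering.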

Neuroinformatics as Part of the National Human Brain Project 

Faculty: Perry Miller, MD, PhD, Gordon Shepherd, MD, PhD, Michael Hines, PhD, Luis Marenco, MD, Kei Cheung, PhD

As part of the national Human Brain Project, we have developed informatics support of neuroscience research and computer-based modeling using the olfactory system as a pilot domain. Components of this project currently include: 1) ORDB, a database of information about olfactory receptors, 2) NeuronDB, a database of information about the compartmental properties of different neurons, and 3) ModelDB, a database of models of neurons and neuronal compartments. The project provides a focus for exploring a flexible approach for designing bioscience databases, the EAV/CR design (Entity-Attribute-Value with Classes and Relationships). This design facilitates the flexible storage and retrieval of complex bioscience data and of the biological relationships between those data items. In addition, we are participating in a national NIH-supported collaboration to build a Neuroscience Information Framework (NIF) that will allow researchers to flexibly search and query a wide spectrum of Internet-based neuroscience resources. One current translational focus is on neuronal modeling to explore issues related to Alzheimer's disease.
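
The sketch below shows the core of an EAV/CR-style layout in SQLite, under simplified, hypothetical table names: classes, attributes that belong to classes, objects that are instances of classes, one row per object attribute value, and typed relationships between objects. It illustrates the flexibility claimed above, since adding a new property requires inserting rows rather than altering the schema. The published EAV/CR design is considerably richer.

```python
import sqlite3

# Hypothetical, simplified EAV/CR-style tables.
SCHEMA = """
CREATE TABLE class     (class_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE attribute (attr_id INTEGER PRIMARY KEY,
                        class_id INTEGER REFERENCES class(class_id),
                        name TEXT, datatype TEXT);
CREATE TABLE object    (obj_id INTEGER PRIMARY KEY,
                        class_id INTEGER REFERENCES class(class_id), name TEXT);
CREATE TABLE eav       (obj_id INTEGER REFERENCES object(obj_id),
                        attr_id INTEGER REFERENCES attribute(attr_id), value TEXT);
CREATE TABLE relationship (subject_id INTEGER REFERENCES object(obj_id),
                           predicate TEXT,
                           object_id INTEGER REFERENCES object(obj_id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO class VALUES (?, ?)", [(1, "Neuron"), (2, "Channel")])
conn.executemany("INSERT INTO attribute VALUES (?, ?, ?, ?)",
                 [(1, 1, "region", "text"), (2, 2, "ion", "text")])
conn.executemany("INSERT INTO object VALUES (?, ?, ?)",
                 [(1, 1, "mitral cell"), (2, 2, "channel_X")])  # channel_X: hypothetical
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)",
                 [(1, 1, "olfactory bulb"), (2, 2, "K+")])
conn.execute("INSERT INTO relationship VALUES (1, 'has_channel', 2)")

# A new property needs only new rows, not an ALTER TABLE.
conn.execute("INSERT INTO attribute VALUES (3, 1, 'layer', 'text')")
conn.execute("INSERT INTO eav VALUES (1, 3, 'mitral cell layer')")

def attributes_of(conn, class_name):
    """All (object, attribute, value) triples for objects of one class."""
    return conn.execute("""
        SELECT o.name, a.name, e.value
        FROM object o
        JOIN class c     ON c.class_id = o.class_id
        JOIN eav e       ON e.obj_id = o.obj_id
        JOIN attribute a ON a.attr_id = e.attr_id
        WHERE c.name = ? ORDER BY o.name, a.name""", (class_name,)).fetchall()

def related(conn, predicate):
    """Pairs of object names linked by one relationship type."""
    return conn.execute("""
        SELECT s.name, t.name FROM relationship r
        JOIN object s ON s.obj_id = r.subject_id
        JOIN object t ON t.obj_id = r.object_id
        WHERE r.predicate = ?""", (predicate,)).fetchall()

print(attributes_of(conn, "Neuron"))
print(related(conn, "has_channel"))
```

The trade-off is query complexity: retrieving a conventional "row" for an object requires pivoting many EAV rows, which is why EAV designs usually keep class and attribute metadata to drive generic query and display code.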

Natural Language Processing and Text Mining in Biomedicine

Faculty: Michael Krauthammer, MD, PhD, Cynthia Brandt, MD, MPH, Perry Miller, MD, PhD

A variety of projects focus on natural language processing and text mining in biomedicine. One set of projects is led by Prof. Krauthammer in Pathology Informatics, with an emphasis on text and image mining from the biomedical literature. An example is the Yale Image Finder system, which pioneered the use of advanced image analysis to extract text from images in order to improve biomedical document retrieval. Another set of projects is based at the West Haven VA, directed by Prof. Cynthia Brandt, and is part of the national VA CHIR (Consortium for Healthcare Informatics Research) project, which focuses on natural language processing and text mining of the VA’s extensive national EMR.
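
A minimal example of clinical text mining is dictionary-based concept spotting combined with negation detection. The sketch below uses a hypothetical three-term lexicon and a simplified, NegEx-style rule: a concept is marked negated if a negation cue appears within a few preceding words, unless a termination word such as "but" intervenes. The lexicon, cues, and five-word window are assumptions; production systems used in projects like CHIR are far more elaborate.

```python
import re

# Hypothetical lexicon: surface phrase -> concept label.
LEXICON = {"chest pain": "CHEST_PAIN",
           "shortness of breath": "DYSPNEA",
           "fever": "FEVER"}
NEGATION_CUES = ["no", "denies", "without", "negative for"]  # assumed cue list
TERMINATORS = {"but", "however", "although"}                 # end a negation's scope
WINDOW = 5  # number of preceding words searched for a cue (assumed)

def extract(sentence):
    """Return (concept, negated) pairs found in one sentence."""
    text = sentence.lower()
    results = []
    for phrase, concept in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            words = re.findall(r"[a-z]+", text[:match.start()])
            # Keep only the words after the last termination term, if any.
            for i in range(len(words) - 1, -1, -1):
                if words[i] in TERMINATORS:
                    words = words[i + 1:]
                    break
            scope = " ".join(words[-WINDOW:])
            negated = any(re.search(r"\b" + re.escape(cue) + r"\b", scope)
                          for cue in NEGATION_CUES)
            results.append((concept, negated))
    return results

for s in ["Patient denies chest pain but reports fever.",
          "No fever or chest pain.",
          "Shortness of breath on exertion."]:
    print(s, extract(s))
```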

Exploring the Use of caBIG Technologies for Inter-institutional Data Sharing

Faculty: Michael Krauthammer, MD, PhD

Prof. Michael Krauthammer has been addressing a range of research, technical, and institutional issues in the context of sharing de-identified data about cancer specimens using caTissue. Yale operates two caTissue instances, one private and one public. The public instance is a de-identified clone of the private instance, updated by a nightly batch upload of data. Dr. Krauthammer has worked extensively to resolve the governance issues that come into play when sharing data in this fashion. The Yale HIC (IRB) was involved in amending protocols and consent documents to address anonymous data sharing on the grid, and in crafting internal use agreements for accessing and editing caTissue PHI data. The Yale General Counsel office was involved in crafting inter-institutional use agreements for accessing Yale’s caTissue instance from non-Yale public or private research entities. Finally, Yale ITS was involved in crafting a security design review document that enumerates the various security issues involved in hosting our two caTissue instances.
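
The nightly step that populates a public clone can be pictured as a transformation applied to each exported record. The sketch below drops assumed direct-identifier fields, replaces the record number with a keyed hash (so the same subject receives the same public ID on every nightly run, while only holders of the private key can recompute the mapping), and truncates dates to the year. The field names and key handling are hypothetical, and this is neither caTissue's own mechanism nor a complete HIPAA de-identification procedure.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"patient_name", "mrn", "address"}
# Assumption: a secret kept only on the private side; never published.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonym(mrn):
    """Keyed hash (HMAC-SHA256): stable across nightly loads, but not
    recomputable by anyone who lacks SECRET_KEY."""
    return hmac.new(SECRET_KEY, mrn.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def deidentify(record):
    """Return a copy of one specimen record suitable for the public instance."""
    public = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    public["subject_id"] = pseudonym(record["mrn"])
    if "collection_date" in public:
        public["collection_date"] = public["collection_date"][:4]  # year only
    return public

private_record = {"patient_name": "Jane Doe", "mrn": "000123",
                  "address": "1 Example St", "specimen_type": "tissue",
                  "collection_date": "2010-06-15"}
print(deidentify(private_record))
```

A keyed hash is used rather than a plain hash because record numbers come from a small, guessable space; an unkeyed hash could be reversed by hashing every possible value.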

Semantic Web Technologies in Biomedicine

Faculty: Mark Gerstein, PhD, Kei Cheung, PhD, Michael Krauthammer, MD, PhD, Martin Schultz, PhD

This research continues a longstanding set of collaborative activities that focus on the integrative analysis of genomic, proteomic, neuroscience, pathway, and drug data from many different perspectives. The activities involve the use of cutting-edge semantic web technologies to integrate diverse types of data from diverse types of web-accessible resources in several broad scientific domains.

Genomics research: Our work began with the semantic integration of yeast genome data and has recently expanded to include the development of ontologies and rules for representing and integrating pseudogene data.

Neuroscience research: We have converted a set of neuroscience databases (SenseLab) into ontologies capturing knowledge about neurons and their cell membrane properties. We are also exploring how to link such neuroscience ontologies with other biomedical ontologies.

Translational research: We have recently embarked on using semantic web technologies to facilitate integration of high-throughput genomic data (e.g., microarray data and pathway data) with drug data in the context of cancer research.
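
The integration idea behind these projects is that facts from different sources become triples (subject, predicate, object) that can be joined through shared identifiers. The sketch below uses a toy in-memory triple set with hypothetical ex: identifiers and a small conjunctive matcher standing in for an RDF store and SPARQL engine; the equivalent SPARQL query is shown in a comment.

```python
# Toy triple set standing in for an RDF store; all ex: terms are hypothetical.
TRIPLES = {
    ("ex:GeneA", "ex:inPathway", "ex:Pathway1"),
    ("ex:GeneB", "ex:inPathway", "ex:Pathway1"),
    ("ex:GeneA", "ex:upregulatedIn", "ex:TumorSampleSet"),
    ("ex:Drug1", "ex:targets", "ex:GeneA"),
    ("ex:Drug2", "ex:targets", "ex:GeneC"),
}

def match(pattern, triples, bindings):
    """Yield bindings extended by one (s, p, o) pattern. Terms starting with
    '?' are variables, as in SPARQL; a bound variable must match consistently."""
    for triple in triples:
        extended = dict(bindings)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield extended

def query(patterns, triples):
    """Join the patterns left to right (a basic graph pattern)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [s2 for s in solutions for s2 in match(pattern, triples, s)]
    return solutions

# Equivalent SPARQL:
# SELECT ?drug ?gene ?pathway WHERE {
#   ?gene ex:upregulatedIn ex:TumorSampleSet .
#   ?drug ex:targets ?gene .
#   ?gene ex:inPathway ?pathway . }
rows = query([("?gene", "ex:upregulatedIn", "ex:TumorSampleSet"),
              ("?drug", "ex:targets", "?gene"),
              ("?gene", "ex:inPathway", "?pathway")], TRIPLES)
print(rows)
```

Because the expression, drug, and pathway facts could each come from a different source, the join works only if the sources agree on gene identifiers, which is where shared ontologies do much of the work.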

 

 