Our Awards


Research Awards


We believe these reports reflect an unprecedented collaborative effort among researchers at MCRF, UWSMPH, MCW, and UW‑Milwaukee.  They demonstrate that it is possible to create a medical research infrastructure to build and test a unique scientific platform capable of predicting disease susceptibility and treatment response with a high degree of accuracy, and to make it widely available to scientists across the country.  In just two years, WGI built a unique scientific platform for discovery in multiple disease areas and treatment methods, one that can improve its performance by expanding research cohorts, increasing the number of genetic markers, increasing the amount of environmental and clinical data, and improving the computational algorithms.


Development of a Predictive Algorithm for Age-related Macular Degeneration


Principal Investigator: Murray Brilliant, PhD

Collaborators: David Page, PhD & Joe Carroll, PhD


Age-related Macular Degeneration (AMD) is the most common cause of vision loss in the developed world. Current treatment strategies target angiogenesis and are used only after the disease process has threatened to cause significant permanent damage to the neurosensory retina.  The objective of this study was to determine a predictive algorithm for those at high Relative Risk (RR) for AMD based on previously identified genetic markers, age, sex, environmental factors and advanced retinal imaging.  AMD risk increases with age, with ~25% of people over 85 showing signs of AMD (AMD cohort).  Therefore, the logic employed was that the oldest people without signs of AMD (elderly non-AMD cohort) will be enriched for genetic markers that are not associated with AMD, compared with those who get AMD at an earlier age (AMD cohort).  A third cohort included those who were younger than the age at which most AMD occurs (pre-AMD age cohort).  Individuals from this last cohort were identified and sorted according to the predictive algorithm into those at high risk and those at low risk for AMD.  Individuals from both extremes were given an adaptive optics eye exam to look for the earliest signs of AMD.  These new criteria may be used to refine the model in future studies.  We were able to develop a highly predictive algorithm for AMD.  The results of this study formed the basis of an NIH grant application, submitted for the July 19, 2012 deadline, “Implementation of Genomic Medicine for Age-related Macular Degeneration” in response to



Exome Sequencing to Identify Coding Variants for Myocardial Infarction

Principal Investigator: Ulrich Broeckel, MD

Collaborators: Deanna Cross, PhD & David Page, PhD


The purpose of this study was to identify causal variants associated with myocardial infarction (MI) in a subset of participants of the Personalized Medicine Research Project (PMRP).  The risk of MI is determined in part by genetic factors.  The proposed experiments were designed to identify coding variants in genes by direct sequencing of the exome, the protein-coding portion of the human genome.


PMRP samples were comprehensively analyzed, and a subset of relevant and potentially functional variants was identified. Annotation of gene function identified novel disease risk mechanisms contributing to the genetic risk of MI.

Our sequencing experiments and the subsequent analysis describe for the first time a subset of previously unlinked genes that contribute to the risk of MI in a subset of individuals of the PMRP cohort.  These individuals are characterized by a “low” risk profile based on standard, well-recognized risk factors such as diabetes, hypertension, or smoking.  The findings of our study link hemidesmosomes with the risk of MI for the first time.


Improving the Predictive Modeling of Atrial Fibrillation/Flutter (AF/F) and Its Outcomes

Principal Investigator: Humberto Vidaillet, MD

Collaborators: Peggy Peissig, MBA; Percy Karanjia, MD; Bess Berg, MS; & David Page, PhD


The objective of this study was to test, as a proof of concept, that machine learning (ML) could be effectively applied to a common, but complex, health condition to predict important outcomes that have potential implications for patient care processes.  Atrial fibrillation (AF) and atrial flutter (F) are the most common cardiac arrhythmias.  Both arrhythmias are associated with an increased risk of death, stroke, heart failure, disability, and increased utilization of health care resources.  Atrial fibrillation/flutter (AF/F) was selected as the health condition of interest because it is the most common cardiac rhythm abnormality, because AF/F patients are at a substantially increased risk of major adverse outcomes including stroke and death, and because effective and relatively low-cost anticoagulation treatments are available that would prevent an estimated 75% of AF/F-related strokes.  Significant technical challenges were encountered and overcome in activities such as AF/F phenotyping, electronic capture of dated echocardiograph results, and electronic phenotyping of stroke.  These efforts made available for analysis a study cohort of N=8,054 records, including 3,762 AF/F records.  Model results, summarized by area under the ROC curve, indicated positive performance for predicting AF/F onset (0.675), stroke (0.601), and 1- and 3-year mortality (0.741 and 0.761).  To our knowledge, this work represents the first study designed to predict onset of AF/F and associated stroke and death using EHR-derived data and ML tools in high-risk patients.  These results are highly promising for future development of more robust models for AF/F and for extending the significant advancements made in this study in phenotyping and in the development of novel ML techniques to important clinical conditions where more accurate prediction of disease onset and course leads to better outcomes for patients.
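For readers unfamiliar with the area under the ROC curve, the following minimal sketch shows one common way to compute it: as the fraction of (case, control) pairs in which the case receives the higher predicted risk. The function name and the toy data are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking identity.

    labels: 0/1 outcomes; scores: predicted risk, higher means riskier.
    A tie between a positive and a negative counts as half a correct ordering.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not study data.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
print(roc_auc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the figures above should be read.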


Integrating Genomic Data into a Computational Model for Improved Breast Cancer Diagnosis


Principal Investigators: Elizabeth Burnside, MD & David Page, PhD

Collaborators: Peggy Peissig, MBA & Adedayo Onitilo, MD


Offering accurate and cost-effective delivery of breast cancer screening to the ever-increasing number of women in need presents a great challenge because it demands both high sensitivity and high specificity. Clinical information increases the accuracy of tests such as mammography, and providing patient-specific probability estimates can help less experienced physicians improve to the level of experts.  The purpose of this study was to establish a multi-relational dataset incorporating patient-specific genomics data, mammography findings and clinical/demographic risk factors that can improve the risk prediction accuracy of our Bayesian model.  A further aim was to explore the conditional dependence relationships between demographic, mammographic, and genomic features that are predictive of breast cancer in order to optimize the model.


Using the 24 targeted single-nucleotide polymorphisms (SNPs) in the Marshfield population, we tested the predictive power of one statistical machine learning algorithm, Tree Augmented Naïve Bayes (TAN), using 10-fold cross-validation.  We then combined the 24 targeted SNPs with the 50 Breast Imaging-Reporting and Data System (BI-RADS) features and applied TAN with 10-fold cross-validation, which yielded a receiver operating characteristic (ROC) curve with an area under the curve (AUC) of 0.731.  A 2-sided paired t-test comparing the 10 per-fold AUCs from the experiments on BI-RADS features with the 10 per-fold AUCs from the experiments on the combined features gave a P-value of 0.021.  The genetic markers therefore significantly improve breast cancer risk prediction based on the mammogram features.
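The paired comparison of per-fold AUCs can be sketched as follows. This is a minimal illustration, assuming two lists of AUCs measured on the same cross-validation folds; the function name and the numbers are hypothetical and are not the study's values.

```python
import math
import statistics


def paired_t_statistic(baseline, combined):
    """Return (t, degrees_of_freedom) for a paired t-test of combined vs baseline."""
    if len(baseline) != len(combined) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [c - b for b, c in zip(baseline, combined)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-fold AUCs for illustration only.
auc_imaging_only = [0.70, 0.72, 0.68, 0.71]
auc_imaging_plus_snps = [0.73, 0.74, 0.72, 0.72]
t_value, dof = paired_t_statistic(auc_imaging_only, auc_imaging_plus_snps)
print(t_value, dof)
```

The two-sided P-value is then read from a t distribution with the returned degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel` performs the whole test in one call.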


Investigation of Genomic Association between Heart Failure & Diabetes Mellitus

Principal Investigator: Nancy Sweitzer, MD, PhD

Collaborators: Peggy Peissig, MBA; Orly Vardeny, PharmD & Zhan Ye, PhD


This study took advantage of two well-developed phenotypic and genetic data repositories: The Penn Heart Failure Study, in which the UW Heart Failure Program has participated since 2005, and the Marshfield Clinic Personalized Medicine Research Project.  Merging these two resources yielded a large number of appropriate subjects for improved understanding of genomic interactions between heart failure (HF) and diabetes mellitus (DM).  In addition to significant public health implications, results will provide important data to guide resource allocation and potentially lead to significant cost savings.  Great potential exists for therapy that may reduce incidence of new DM in the HF population based on improved understanding of interactions between genetic polymorphisms and pharmacologic responses. 

There was no association between the defined common genetic single nucleotide polymorphisms (SNPs) and risk of new diabetes in our population, although the number of new DM cases was small. There was, however, a significant association between all the insulin resistance SNPs identified and progressive left ventricular enlargement. Detailed analysis of the dataset was incomplete at study end, but will continue in the Principal Investigator’s lab.


Membrane Metalloproteinase-9 (MMP-9) Genotype and Aortic Aneurysm

Principal Investigator: Jay Yang, MD, PhD

Collaborators: Sijan Wang, PhD; Martha Wynn, MD; Charles Acher, MD & Peggy Peissig, MBA


The incidence of aortic aneurysm, an abnormal ballooning of the vascular wall of the aorta, has been increasing over recent years and is now thought to affect 8% of men over age 60. Currently published association studies between membrane metalloproteinase-9 (MMP-9), a suggested biomarker, and aortic aneurysm do not take into consideration the effects of the various MMP-9 genetic polymorphisms on the complex function of this enzyme. The goal of this project was to determine whether MMP-9 genotype can serve as a biomarker for abdominal aortic aneurysm (AAA), which is a devastating disease associated with high morbidity and mortality. Identification of a biomarker for this disease will greatly facilitate treatment decisions and impact the lives of patients with this disease.


A step-wise logistic regression analysis of 6 functional SNPs, in which weakly contributing confounds were eliminated using the Akaike information criterion, gave a final 2-SNP model (D165N and p-2502) with an overall odds ratio of 2.45 (95% confidence interval 1.06 to 5.70). The combined approach of direct experimental confirmation of the functional consequences of MMP-9 SNPs and logistic regression analysis revealed a significant association between MMP-9 genotype and AAA.
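As background for the quantities reported above, the sketch below shows how an odds ratio and a Wald-type 95% confidence interval are commonly derived from a logistic-regression coefficient, and how the Akaike information criterion scores a candidate model. It assumes a Wald interval, which the summary does not confirm; the function names and the standard error are illustrative assumptions.

```python
import math


def odds_ratio_with_ci(beta, std_err, z=1.96):
    """Convert a logistic-regression coefficient into (OR, lower, upper).

    Assumes a Wald interval: exp(beta +/- z * std_err).
    """
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


def aic(log_likelihood, n_params):
    """Akaike information criterion; among candidate models, lower is preferred."""
    return 2 * n_params - 2 * log_likelihood


# The coefficient is chosen so that exp(beta) equals the reported odds ratio;
# the standard error is a hypothetical value used only for illustration.
beta = math.log(2.45)
assumed_std_err = 0.43
print(odds_ratio_with_ci(beta, assumed_std_err))
```

In a step-wise procedure, a term is dropped when removing it lowers the AIC, which is how weakly contributing predictors are eliminated.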


Risk Modeling Post-Hospitalization Venothromboembolism in a Population-Based Cohort

Principal Investigator: Steve Yale, MD

Collaborators: Mark Craven, PhD; Deanna Cross, PhD; Stephen Talsness, BA; Peggy Peissig, MBA & Joseph Mazza, MD


Venothromboembolism (VTE) is estimated to affect 30 million persons in the USA with an annual incidence of 1.17 per 1,000.  There is emerging interest in better identifying and clinically managing patients who would benefit from prophylactic medical intervention.  However, optimal management strategies are conditional upon the availability of effective risk categorization and assessment tools, which are presently underdeveloped.  This study provides a unique opportunity to build on earlier Marshfield VTE modeling efforts using robust data analysis methods and an expanded clinical data set to develop a successful post-hospitalization VTE risk model.  Efforts to date have not produced a mechanism to identify patients at risk for post-hospitalization VTE and subsequent prophylactic intervention.  Development of models that identify at-risk individuals would provide important translational value.

We considered the task of predicting which patients are most at risk for post-hospitalization VTE using information automatically extracted from an electronic medical record (EMR).  Given a set of cases and controls, we used machine-learning methods to induce models for making these predictions.  Our empirical evaluation of this approach supports several conclusions.  We identified several risk factors for VTE that were not previously recognized.  We showed that machine-learning methods are able to induce models that identify high-risk patients with accuracy that exceeds previously developed scoring models for VTE.  Additionally, we showed that, even without prior knowledge about relevant risk factors, we are able to design accurate models for this task. We demonstrated that our learned model is superior to the coded risk assessment tools currently in use, the Sanofi and Chicago risk assessment questionnaires.  Future directions include additional validation and application of the assessment tool in clinical practice.


Mining Textual Data in the EMR for Prediction of Atrial Fibrillation/Flutter (AFF) Through Application of Machine Learning

Principal Investigator: Rajesh Chowdhary, PhD

Collaborators: Bess Berg, MS; Eneida Mendonca, PhD; Percy Karanjia, MD; Romel Garcia-Montilla, MD & David Page, PhD


Electronic Health Records (EHR) contain large amounts of useful information that could potentially be used for building models for predicting onset of diseases.  In this study, we investigated the use of free-text and coded data in the Marshfield Clinic EHR, individually and in combination, for building machine learning based models to predict the first-ever onset of atrial fibrillation and/or atrial flutter (AFF).  We extended Berg’s (2010) AFF predictive modeling on coded data by evaluating the usefulness of free-text data in modeling AFF onset prediction, exploring the suitability of text data for building accurate machine learning (ML) models.  As a proof of concept, we developed and tested ML-based text mining approaches for predictive modeling of AFF onset using Marshfield Clinic's textual EMR data.  We extracted the textual EMR data associated with the patients we used in our preliminary study (Berg 2010, Berg et al. 2010) and trained ML models with textual features alone in order to test the accuracy of such models in predicting the onset of AFF.

On text-based datasets, the best-performing model achieved an F-measure of 60.1%; the best-performing models trained on coded data alone and on a combination of textual and coded data achieved comparable performance.  The study results attest to the relative merit of utilizing textual data to complement the use of coded data for disease onset prediction modeling.
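For reference, the F-measure combines precision and recall into a single score. The sketch below computes it from confusion-matrix counts; the function name and the counts are hypothetical, and the summary does not state which beta weighting was used, so the balanced F1 (beta = 1) is assumed.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-measure from confusion-matrix counts; beta=1 gives the balanced F1."""
    if true_pos == 0:
        return 0.0  # no correct positive predictions, so precision and recall are 0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only.
print(f_measure(true_pos=6, false_pos=2, false_neg=6))
```

Because it ignores true negatives, the F-measure is often preferred over plain accuracy when the outcome of interest is relatively rare.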


Sustained Community Engagement in Genetics and Genomics Research to Improve Health and to Increase Health Equity

Principal Investigators: Aaron Buseh, PhD, MPH & Sandra Underwood, PhD, RN


The incidence of complex diseases with co-morbidity is high in racially/ethnically diverse, low-income, urban populations.  Compounding this is the lag before new medical discoveries reach these high-risk populations.  Essential to bridging the gap in the translation of genomics to health and to increasing health equity is the engagement of communities of color in genetic and genomic research (Bonham et al. 2009).  The purpose of this study was to determine effective ways to engage members of diverse urban communities in genetic and genomics research designed to improve health and achieve health equity.

This community-based study provides the opportunity for researchers, clinicians and policy makers to look at perceptions of African Americans and Black African immigrants/refugees regarding participating in genetics/genomic research.  It provides insight into aspects of a complicated area of science: balancing ethical ramifications with the benefits of science.  Participants were leery about participating in genetics and genomics studies, including biobanks.  Researchers targeting these groups for participation in genetic studies should seriously consider these concerns, as they are likely to impact recruitment and retention in genetic and genomic studies.  Therefore, strategies and policies designed to reassure and protect these populations, with input from both communities, are essential if African Americans and Black African immigrants/refugees are to be effectively engaged and accrue the benefits from these scientific advances.


Infrastructure Based Project Awards


Feasibility of Modular High Throughput Electronic Phenotyping

Principal Investigator: Peggy Peissig, MBA


To simplify the process of developing high throughput phenotypes from the electronic health record (EHR), we developed the concept of “phenotype widgets”, which are reusable parameter-driven functions that encapsulate the gathering and defining of clinical attributes to create atomic-level phenotypes.  Once validated, these phenotype widgets can be systematically combined to create more complicated disease widgets and used by researchers to investigate EHR populations.

Our research demonstrated the feasibility of decomposing complex phenotyping algorithms by creating nearly 100 atomic-level generic widgets that utilize parameters and perform various data management functions when querying a data source.  Our testing has demonstrated the ability of researchers to combine the simplified widgets to create more complicated disease widgets (e.g., cataracts, diabetes).  We also identified important design recommendations and features that should be considered when developing phenotyping graphical user interfaces (GUIs).
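The widget idea can be illustrated with a small sketch: atomic, parameter-driven functions that each answer one yes/no question about a patient record, plus combinators that assemble them into a disease-level widget. This is only an illustration of the concept under an assumed record layout; the function names, codes, and thresholds are hypothetical placeholders and are not the project's actual widgets or validated phenotype definitions.

```python
from typing import Any, Callable, Dict, Iterable

Patient = Dict[str, Any]            # hypothetical record layout
Widget = Callable[[Patient], bool]  # a widget answers yes/no for one patient


def has_code(field: str, codes: Iterable[str], min_count: int = 1) -> Widget:
    """Atomic widget: at least `min_count` entries of `field` fall in `codes`."""
    wanted = set(codes)

    def widget(patient: Patient) -> bool:
        entries = patient.get(field, [])
        return sum(1 for code in entries if code in wanted) >= min_count

    return widget


def lab_at_least(lab_name: str, threshold: float) -> Widget:
    """Atomic widget: any recorded value of `lab_name` reaches `threshold`."""

    def widget(patient: Patient) -> bool:
        values = patient.get("labs", {}).get(lab_name, [])
        return any(value >= threshold for value in values)

    return widget


def all_of(*widgets: Widget) -> Widget:
    """Combinator: every component widget must hold."""
    return lambda patient: all(w(patient) for w in widgets)


def any_of(*widgets: Widget) -> Widget:
    """Combinator: at least one component widget must hold."""
    return lambda patient: any(w(patient) for w in widgets)


# Illustrative disease widget only; codes and cut-offs are placeholders.
diabetes_like = all_of(
    has_code("diagnoses", {"DX_DIABETES"}, min_count=2),
    any_of(
        has_code("medications", {"RX_METFORMIN", "RX_INSULIN"}),
        lab_at_least("hba1c", 6.5),
    ),
)

example_patient = {
    "diagnoses": ["DX_DIABETES", "DX_HYPERTENSION", "DX_DIABETES"],
    "medications": ["RX_LISINOPRIL"],
    "labs": {"hba1c": [6.1, 7.2]},
}
print(diabetes_like(example_patient))
```

Because each atomic widget is validated once and then reused through its parameters, a new disease definition becomes a matter of composition rather than a new query written from scratch.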


Informatics Architecture


Principal Investigator: Simon Lin, MD

Collaborators: Justin Starren, MD, PhD & Laurel Verhagen, BS

The informatics architecture at Marshfield Clinic was designed primarily to support internal (Marshfield Clinic only) genetic studies, rather than act as a shared resource for multiple institutions. This infrastructure award allowed for the major leap from Marshfield Clinic to WGI through the creation of a single integrated repository of de-identified clinical and genomic data. To allow individual study data sets to be extracted from this master repository, investigators created the following:

    1. Research Data Warehouse: Research view of identified clinical data.
    2. WGI Repository: Integrated clinical and genotypic data.
    3. Trusted Broker: Identifier management system providing reversible and non-reversible encryption.
    4. Web Portal: Summarized data and tools for collaborating researchers.
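The Trusted Broker's role, issuing both reversible and non-reversible study identifiers, can be sketched as follows. This is a conceptual illustration and not the WGI implementation: the class and method names are hypothetical, the non-reversible identifier is modeled with a keyed hash (HMAC-SHA256), and the reversible one is modeled with a broker-held lookup table of random tokens, which stands in for reversible encryption.

```python
import hashlib
import hmac
import secrets


class TrustedBroker:
    """Hypothetical identifier manager; only the broker holds the key and table."""

    def __init__(self, key: bytes):
        self._key = key
        self._id_to_token = {}
        self._token_to_id = {}

    def one_way_id(self, patient_id: str) -> str:
        """Non-reversible: same input always gives the same study ID,
        but the study ID cannot be turned back into the patient ID."""
        digest = hmac.new(self._key, patient_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    def reversible_id(self, patient_id: str) -> str:
        """Reversible: a random token that only the broker can map back."""
        if patient_id not in self._id_to_token:
            token = secrets.token_hex(16)
            self._id_to_token[patient_id] = token
            self._token_to_id[token] = patient_id
        return self._id_to_token[patient_id]

    def reidentify(self, token: str) -> str:
        """Map a reversible token back to the original identifier."""
        return self._token_to_id[token]


broker = TrustedBroker(key=secrets.token_bytes(32))
token = broker.reversible_id("MRN-0001")  # "MRN-0001" is a made-up identifier
print(broker.reidentify(token) == "MRN-0001")
```

Keeping the key and the mapping inside a single component means downstream repositories and collaborators only ever see study identifiers.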


Statistical and Computational Analysis of Infrastructure


Principal Investigator: David Page, PhD


This WGI grant of $31,629 funded a graduate research assistant, Eric Lantz, for an academic year.  In addition to his salary, it paid his Biostatistics and Medical Informatics (BMI) Department computing fees at UW-Madison for a year ($2K of total) and purchased a computer ($1.5K of total).  The dedicated computer was necessary to log directly into the Marshfield-based server on which all WGI data analysis is done; that server has only a de-identified version of the data, and access is only through a secure VPN connection that disables any other internet use from the initiating computer including access to our shared BMI file servers in Madison, so that our ordinary BMI desktops cannot be used.


Eric Lantz established drug and diagnosis hierarchies used in conjunction with the Marshfield data for a number of projects, including the Atrial Fibrillation/Flutter (AFF), Breast Cancer, and Myocardial Infarction (MI) projects.  He refined methods for combining the data with these hierarchies to encode data in a format appropriate for machine learning algorithms.  In addition, he performed analysis for the AFF project to predict onset of AFF, death within 1 or 3 years of initial AFF, and stroke after AFF.  The AFF report already notes how this AFF work contributed to our obtaining, and so far efficiently carrying out, an NLM-funded project, “Secure Sharing of Clinical History and Genomic Data,” that involves both UW-Madison and Marshfield Clinic and has an annual direct amount of $480,448.  As noted, Eric’s work also contributed to the WGI Breast Cancer project, which led to NCI and NLM grants by Elizabeth Burnside, PI, totaling $676,062 annual direct, and to the MI project, which also contributed toward obtaining and initiating an NIGMS project, collaborative between Marshfield Clinic (Caldwell, Peissig) and UW-Madison (PI Page), “Machine Learning for Identifying Adverse Drug Events,” at $466,315 annual direct.
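One plausible way to combine coded data with a hierarchy for machine learning is to roll each observed code up to its ancestors, so that a model can learn at the level of a drug class as well as an individual drug. The sketch below illustrates that idea only; the hierarchy, the vocabulary, and the function names are hypothetical, and the encoding actually used in these projects is not described here.

```python
def ancestors(code, parent_of):
    """Return every ancestor of `code`, nearest first (assumes no cycles)."""
    chain = []
    while code in parent_of:
        code = parent_of[code]
        chain.append(code)
    return chain


def encode(codes, parent_of, vocabulary):
    """Binary feature vector over `vocabulary`: 1 if the patient has the term
    itself or any code that rolls up to it."""
    active = set()
    for code in codes:
        active.add(code)
        active.update(ancestors(code, parent_of))
    return [1 if term in active else 0 for term in vocabulary]


# Hypothetical drug hierarchy for illustration only.
PARENT_OF = {
    "warfarin": "anticoagulant",
    "heparin": "anticoagulant",
    "anticoagulant": "cardiovascular_drug",
    "metoprolol": "beta_blocker",
    "beta_blocker": "cardiovascular_drug",
}
VOCABULARY = [
    "warfarin", "heparin", "metoprolol",
    "anticoagulant", "beta_blocker", "cardiovascular_drug",
]

print(encode(["warfarin"], PARENT_OF, VOCABULARY))
```

Rolling codes up in this way lets sparse, specific codes share statistical strength through their common parents.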


Making Personalized Health Care a Reality

The Wisconsin Genomics Initiative is positioned to make the promise of personalized medicine a reality. Combining the cutting-edge technology and intellectual capital of multiple academic research centers, the Wisconsin Genomics Initiative aligns not only with the priorities of the National Institutes of Health and the US Department of Health & Human Services, but also with the Wisconsin Idea.
