2019 TEAMS

Lyda Hill Department of Bioinformatics 11/15/19

U-Hack Med 2019 Outcomes

Focused on computational solutions for clinical medicine and genomics, the U-Hack Med hackathon series is for developers, engineers and researchers at all career levels. To learn about the 2019 hackathon outcomes and winners, please see UTSW Center Times video & article.

Lyda Hill Department of Bioinformatics 5/30/19

Team 1: Hippocampal Oscillatory Patterns During Associative Recognition Memory

Investigate hippocampal oscillatory patterns to test predictions of dual process theory, classifying recognition versus recollection from memory retrieval data.

The goal of investigating hippocampal oscillatory patterns to test predictions of dual process theory motivated us to apply the Associative Recognition (AR) paradigm to intracranial EEG patients with depth electrodes implanted in both cerebral hemispheres, including simultaneous coverage of the anterior and posterior hippocampus. Twenty subjects with medication-resistant epilepsy who underwent stereo-electroencephalography (sEEG) surgery to identify their ictal onset region(s) performed the AR paradigm during their monitoring period at UT Southwestern. Overall, the dataset includes 69 anterior and 41 posterior hippocampal electrodes. During the encoding (study) phase, subjects viewed a series of word pairs. During the retrieval (test) phase, word pairs were shown for two seconds as intact (studied together), rearranged (both words were studied, but on different trials), or new (neither word was studied); subjects then received a cue to indicate whether each pair was intact, rearranged, or new. The sEEG signal was sampled at 1 kHz during acquisition and downsampled to 500 Hz offline for processing. Line noise was removed with a notch filter, and a kurtosis algorithm (threshold of 4) was used to exclude abnormal events and interictal activity. Power and phase values (58 frequencies and 900 time steps) were extracted with Morlet wavelets from the 1800 ms window following the appearance of each study item. We aim to:

1) Classify word pairs as either recollected or recognized using the encoding signal, and compare performance to classification using the retrieval signal.

2) Determine which features are most critical for classification.

3) Use a classifier trained on encoding data to classify recognition versus recollection in retrieval data (a measure of the reinstatement of information).

The classifier can be trained across subjects and electrodes (pooling anterior and posterior electrodes, or both hemispheres) to increase the number of trials available for classification.
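
A minimal sketch of the preprocessing described above, assuming NumPy, a 5-cycle complex Morlet wavelet, excess (Fisher) kurtosis for the threshold of 4, and no edge buffering; the team's actual wavelet width, 58-frequency grid, and rejection code are not given here, and the function names are hypothetical. It computes power and phase for one 900-sample trial (1800 ms at 500 Hz) and applies the kurtosis rule to a clean and a spike-contaminated trial.

```python
import numpy as np

FS = 500.0  # Hz: sampling rate after offline downsampling (from the text)


def excess_kurtosis(x: np.ndarray) -> float:
    """Fisher (excess) kurtosis: about 0 for Gaussian noise, large for spiky signals."""
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return float(m4 / m2 ** 2 - 3.0)


def keep_trial(x: np.ndarray, threshold: float = 4.0) -> bool:
    """Keep a trial unless its kurtosis exceeds the threshold (assumed: excess kurtosis)."""
    return excess_kurtosis(x) <= threshold


def morlet_power_phase(x: np.ndarray, freqs, fs: float = FS, n_cycles: float = 5.0):
    """Return (power, phase), each shaped (len(freqs), len(x)), via complex Morlet wavelets."""
    power = np.empty((len(freqs), len(x)))
    phase = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)   # std of the Gaussian envelope (s)
        half = int(np.ceil(3 * sigma_t * fs))   # truncate the envelope at +/- 3 sigma
        if 2 * half + 1 > len(x):
            raise ValueError(f"wavelet at {f} Hz is longer than the signal")
        t = np.arange(-half, half + 1) / fs
        envelope = np.exp(-t ** 2 / (2 * sigma_t ** 2))
        # Normalizing by the envelope sum gives unit gain for a complex tone at f.
        wavelet = np.exp(2j * np.pi * f * t) * envelope / envelope.sum()
        analytic = np.convolve(x, wavelet, mode="same")
        power[i] = np.abs(analytic) ** 2
        phase[i] = np.angle(analytic)
    return power, phase


# Usage on a synthetic trial: one 1800 ms window containing an 8 Hz oscillation.
t = np.arange(900) / FS
trial = np.sin(2 * np.pi * 8 * t)
power, phase = morlet_power_phase(trial, [8.0, 30.0])
spiky = trial.copy()
spiky[100] += 50.0  # a single large transient, like an interictal spike
```

Power and phase arrays like these, stacked across frequencies, trials, and electrodes, are the kind of features the classifiers in aims 1 to 3 would consume. Real pipelines usually pad each window with a buffer to limit edge effects, which matter most at low frequencies.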

Team Leads: Bradley Lega, MD and Srinivas Kota, PhD, Neurosurgery, https://www.utsouthwestern.edu/labs/tcm/

Team 2: A Deep Convolutional Neural Network for Identifying Thyroid Cancer

Develop a more accurate system for identifying thyroid cancer by applying a deep convolutional neural network algorithm to data from a UTSW thyroid cancer database.

Differentiating benign thyroid nodules from cancer is a significant health challenge, as thyroid nodules are detectable by ultrasound (US) in a staggering 19–35% of the general adult population (approximately 130 million people). Most of these nodules are benign; however, the diagnostic process is costly and inefficient, and it leads to unnecessary surgery and morbidity. We propose to develop a more accurate system for identifying thyroid cancer by applying a deep convolutional neural network (DCNN) algorithm to data from our thyroid cancer database, which includes a large number of patients who have been followed clinically for several years. We also propose to develop a new risk prediction and prognosis tool to predict the individualized risk of death and recurrence from thyroid cancer. We envision two modes of analysis: 1) exploratory analysis and 2) clinical performance optimization. In the exploratory mode, we will leverage advanced unsupervised machine learning techniques such as manifold learning and dimension reduction (DR) to visualize multi-modal, integrated, high-dimensional data as a point cloud. We hope to build an interactive, GUI-based data visualization dashboard that allows scientists and clinicians alike to traverse this complex data rapidly. Additionally, we will overlay the proposed predictive models as they are developed and refined, and provide the ability to annotate data and models on the fly for subsequent use with other models. In the second mode, performance optimization, we will build on the recognition patterns discovered during the augmented data exploration, then refine and optimize specific candidate predictive models, emphasizing those with high potential for clinical impact and utility. The highest-performing predictive models will then be validated retrospectively and prospectively. At the conclusion of this project, we will have a deep learning tool with improved ability to detect thyroid cancer in patients with thyroid nodules.
We also anticipate that we will have a risk prediction tool that accurately identifies the risk of recurrence and death after initial treatment of thyroid cancer. Through this process, we will have established one of the first comprehensive annotated databases of thyroid ultrasound images that will allow development and validation of a deep learning algorithm to accurately differentiate benign from malignant thyroid nodules.
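
As a simple stand-in for the dimension-reduction step, the sketch below embeds a patients-by-features matrix into a 2-D point cloud using principal component analysis (PCA) computed with NumPy's SVD. PCA is a linear baseline swapped in for illustration, not the manifold-learning method the text proposes; the function name and the data are hypothetical.

```python
import numpy as np


def pca_embed(X: np.ndarray, n_components: int = 2):
    """Project rows of X (patients x features) onto the top principal components.

    Returns (coords, explained_variance_ratio) for the first n_components.
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    std = Xc.std(axis=0)
    Xc = Xc / np.where(std > 0, std, 1.0)    # unit variance; constant columns stay 0
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:n_components].T        # low-dimensional point cloud
    ratio = S ** 2 / np.sum(S ** 2)          # share of variance per component
    return coords, ratio[:n_components]


# Hypothetical data: 100 patients, 3 features driven by a single latent factor.
rng = np.random.default_rng(0)
z = rng.normal(size=100)
X = np.column_stack([z, 2 * z, -z])
coords, ratio = pca_embed(X)
```

Standardizing first keeps features measured in different units (for example, image-derived scores and lab values) from dominating the projection; a dashboard would plot `coords` and color points by outcome or annotation.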

Team Lead: Fiemu Nwariaku, MD, Surgery, https://www.utsouthwestern.edu/labs/nwariaku/

Team 3: Monitoring Post-Operative Head-Neck Cancer Patients via Self-Powered Smart Insoles

Develop smart insoles that measure post-operative patient data and transmit it via a smartphone app to cloud storage for analysis by the physician's office.

Head and neck cancer (HNC) patients receiving radiotherapy are at high risk of weight loss because of difficulty swallowing and changes in taste. Acute weight loss can even lead to emergency room visits and hospital admissions, which happens to about one third of HNC patients. Maintaining weight is therefore important for better radiotherapy delivery and treatment outcomes. Data collection and timely intervention through efficient doctor-patient communication are two important factors in achieving this goal. To measure patients' weight automatically, we will develop smart insoles that include pressure sensors, power generators, and Bluetooth modules. The sensors will measure weight and transmit the data to smartphones, which relay it to cloud storage for analysis. The power generators convert kinetic energy into electrical energy to power the system, minimizing the effort required from patients and thus reducing data loss. To improve communication with patients and support self-monitoring, a smartphone application will be developed that lets patients control the smart insoles and enables staff-patient and robot-patient communication. The hackathon team will develop specific components of this system.
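
A minimal sketch of how insole readings could be turned into a weight estimate and a weight-loss flag on the phone or in the cloud. It assumes calibrated per-sensor force output in newtons, readings taken while the patient stands still (walking produces dynamic loads above body weight), and a hypothetical 5% alert threshold that is not clinical guidance; the function names are illustrative.

```python
from statistics import mean

G = 9.80665  # standard gravity (m/s^2)


def estimate_mass_kg(samples):
    """Estimate body mass from insole readings taken during quiet standing.

    Each reading is a list of per-sensor forces in newtons covering both insoles
    (assumes calibrated force output; raw pressure values would need calibration).
    """
    totals = [sum(reading) for reading in samples]  # total vertical force per reading
    return mean(totals) / G                          # F = m * g


def weight_loss_alert(baseline_kg, current_kg, threshold=0.05):
    """Flag a relative weight loss at or above `threshold` (hypothetical 5% default)."""
    loss = (baseline_kg - current_kg) / baseline_kg
    return loss >= threshold
```

For example, two readings that each sum to 686.4655 N correspond to 70 kg, and a drop from 70 kg to 65 kg (about 7%) would raise the flag for staff review.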

Team Lead: Guanglin Tang, PhD, Medical Artificial Intelligence and Automation Lab, https://www.utsouthwestern.edu/labs/maia/

Team 4: Definition of Cell Cycle State in Label-Free Images

Test an approach to developing a label-free alternative to the FUCCI probe for detecting cell cycle state, using a convolutional neural network that classifies label-free images as corresponding to a particular cell cycle phase.

Many cellular processes are modulated by, or proceed in concert with, the progression of the cell cycle from Mitosis (M) to Gap 1 (G1) to Synthesis (S) to Gap 2 (G2) phase. As a result, a snapshot of the molecular organization of such a process shows a heterogeneous picture simply because each cell is in a different phase of the cell cycle. To compensate for cell cycle-related image variation, the data should be aligned with the cell cycle phases. In 2008, Atsushi Miyawaki and his lab published a powerful probe called FUCCI (Fluorescent Ubiquitination-based Cell Cycle Indicator), which indicates the progression of an individual cell through the cell cycle as a mixture of red and green fluorescence. However, the probe occupies a large portion of the visible fluorescence spectrum, making it difficult to find other probes that do not overlap spectrally and thus could be used to monitor other cellular processes of interest concurrently. An alternative to FUCCI would be a label-free probe that identifies the cell cycle state from the cytoplasmic and nuclear texture detectable in a non-fluorescence microscopy modality. The goal of this hackathon project is to test whether such a label-free probe can be established. One suggested approach is to train a convolutional neural network that, either directly or indirectly through the definition of a latent feature space, classifies label-free images as corresponding to a particular cell cycle phase. Training data sets pairing label-free images with the cell cycle phase reported by the FUCCI probe will be available. Should this work, there will be innumerable immediate applications in the very large community of microscopy users in the biomedical sciences.
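
Training the network requires a phase label for every label-free image. One simple way to derive labels from the paired FUCCI channels is to threshold per-cell red and green intensities, following the probe's color logic: the red Cdt1-based reporter marks G1, the green Geminin-based reporter marks S/G2/M, both are present around the G1/S transition, and neither is visible in early G1. This is a sketch under assumptions; the thresholds, function name, and class encoding are hypothetical and would be set per experiment.

```python
# Hypothetical integer classes for the classifier, in cell cycle order.
PHASE_TO_CLASS = {"early G1": 0, "G1": 1, "G1/S": 2, "S/G2/M": 3}


def fucci_phase(red: float, green: float, red_thr: float, green_thr: float) -> str:
    """Map background-corrected per-cell FUCCI intensities to a coarse phase label.

    red   ~ Cdt1-based reporter, high in G1
    green ~ Geminin-based reporter, high in S/G2/M
    """
    red_on = red >= red_thr
    green_on = green >= green_thr
    if red_on and green_on:
        return "G1/S"       # both reporters present near the G1/S transition
    if red_on:
        return "G1"
    if green_on:
        return "S/G2/M"
    return "early G1"       # neither reporter visible shortly after mitosis
```

Each segmented cell in the label-free image would then inherit `PHASE_TO_CLASS[fucci_phase(...)]` from the matching FUCCI measurement; cells near a threshold could be dropped to reduce label noise.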

Team Leads: Murat Can Çobanoğlu, PhD and Tadamoto Isogai, PhD, Lyda Hill Department of Bioinformatics, www.utsouthwestern.edu/labs/danuser/

Team 5: Stratification of an SLE patient cohort for precision medicine

Create analytical tools to stratify lupus patients into subsets based on significant differences in genetics and clinical characteristics.

Systemic Lupus Erythematosus (SLE) is a complex polygenic autoimmune disease. The existing literature suggests that a combination of genetic risk alleles and environmental factors can create an autoimmune-prone immune system that becomes dysregulated, leading to autoimmunity. SLE is clinically and genetically heterogeneous, which complicates its diagnosis, prognosis, and management, as well as the development of effective therapeutic protocols. We have developed an extensive dataset combining genomic sequences of more than 28 SLE risk loci with extensive autoantibody profiles and clinical pathologies.

Our goal is to develop analytical tools that use these data to stratify SLE patients into subsets with significant differences in their genetics and clinical characteristics. A genomic/phenotypic test that could predict clinical course and response to management would significantly improve outcomes and prevent complications of SLE. We believe this "precision medicine" approach is likely to lead to companion diagnostics that can identify the SLE patients most likely to respond positively to specific drug therapies. Because SLE clinical trials often show drug efficacy in a subset of patients but fail to show significant benefit for the total patient cohort, we believe that effective subsetting of SLE patients based on genetic and phenotypic elements could dramatically improve the discovery of effective drug therapies for targeted patient subsets.

Team Lead: Prithvi Raj, Immunology, profiles.utsouthwestern.edu/profile/116973/prithvi-raj.html

Team 6: The isoform potential of the genome

Maximize the utility of gene ontology analysis through the creation of a genome browser track that makes protein isoform predictions accessible to the broader biomedical research community.

While the diversity of cellular functions is attributed, in part, to protein isoform variation mediated by pre-mRNA splicing, a comprehensive catalog of candidate functional isoforms is lacking. An atlas of predicted alternative splicing events with functional potential (intact, conserved open reading frames, or ORFs) would provide a platform for protein isoform discovery, detection, and characterization in physiology and disease.

Protein isoforms produced from alternative splicing of specific genes are of major interest to the Hancks Lab for their potential to expand and diversify the host arsenal against invading pathogens and to regulate immune responses. Spliceform discovery has been driven by cDNA analysis. However, this framework has limited detection power when spliceforms are 1) tissue- or cell-type-specific (and the tissue or cell type was not sampled), 2) inducible, 3) developmentally regulated, or 4) low-abundance variants. We propose to develop a bioinformatic framework that could identify spliceform candidates genome-wide and seed biological investigation.

To define the landscape of potential ORFs, nonsense-mediated decay (NMD) pathway rules should be used as an initial filter. To further limit false positives, sequence conservation across species can be exploited. To aid in the characterization of predicted isoforms, gene ontology analysis should be performed. The utility of this information would be maximized if it were translated into a genome browser track, making it accessible to the community at large and to those interested in the biology of specific genes or pathways. The described filtering would require incorporating scripts and/or existing databases for translating DNA sequences, predicting NMD susceptibility, and performing gene ontology analysis.
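
The NMD filter could start from the widely used "50-nucleotide rule": a stop codon lying more than about 50–55 nt upstream of the last exon-exon junction is predicted to trigger NMD. A minimal sketch, assuming a spliced transcript sequence, its exon lengths in transcript order, and a known start codon position; the function names, the 50-nt default, and measuring from the end of the stop codon are illustrative choices, and the conservation and gene ontology steps are not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_end(transcript: str, cds_start: int):
    """Return the end index (exclusive) of the first in-frame stop codon, or None."""
    for i in range(cds_start, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


def nmd_susceptible(transcript: str, exon_lengths, cds_start: int, rule_nt: int = 50):
    """Apply the ~50-nt rule to one candidate isoform.

    Returns True (predicted NMD target), False (escapes NMD), or None when no
    in-frame stop codon is found (no intact ORF from this start).
    """
    stop_end = first_stop_end(transcript, cds_start)
    if stop_end is None:
        return None
    last_junction = sum(exon_lengths[:-1])  # transcript coordinate of the last junction
    return last_junction - stop_end > rule_nt
```

Single-exon transcripts have no downstream junction, so the rule always returns False for them. Candidates that pass this filter would then move on to the cross-species conservation check.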

Team Lead: Dustin Hancks, Immunology, www.utsouthwestern.edu/labs/hancks/

Team 7: Using Machine Learning to Improve Post-Acute Rehabilitation Process Flow

Develop a prediction model to streamline the transition of care process for hospital patients transitioning into rehabilitation facilities, with the goal of decreasing acute length of stay, facilitating outside referrals when needed and decreasing attrition from our health system.

The Centers for Medicare and Medicaid Services (CMS) require patients to have acute medical needs and intensive therapy requirements to receive care at an inpatient rehabilitation facility (IRF). Predictive models that identify patients with a higher probability of IRF admission can lead to shorter acute hospitalizations, improved bed flow, and lower acute care costs. With the approval of the United States Congress, CMS implemented a per-discharge Prospective Payment System for IRF care based on functional improvement, together with the "60% Rule," under which at least 60% of an IRF's patients must have a qualifying diagnosis by primary ICD-10 code.

At UT Southwestern, admission coordinators screen potential IRF candidates against reasonable and necessary criteria, then calculate the percentage of patients who have qualifying IRF diagnoses. If the percentage does not exceed 60%, patients with non-qualifying diagnoses are not admitted, even if a bed is available. This process is cumbersome and time-consuming. UT Southwestern would benefit from a prediction model that takes into account the reasonable and necessary criteria for IRF admission on the acute care service, the existing qualifying diagnoses on the IRF service, and IRF bed availability for non-qualifying diagnoses.
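
The manual check described above can be expressed as a small rule, which is also the baseline a prediction model would improve on. This is a simplified sketch: it assumes the qualifying share is computed over a running census of current patients (actual compliance is assessed over a longer review period), treats exactly 60% as compliant following the "at least 60%" wording, and uses hypothetical function names.

```python
QUALIFYING_THRESHOLD = 0.60  # CMS "60% Rule"


def qualifying_share(n_qualifying: int, n_total: int) -> float:
    """Fraction of patients with a qualifying diagnosis (0.0 for an empty census)."""
    return n_qualifying / n_total if n_total else 0.0


def can_admit_non_qualifying(n_qualifying: int, n_total: int, beds_available: int) -> bool:
    """Simplified screen: admit a non-qualifying patient only if a bed is free and
    the qualifying share stays at or above 60% after the admission."""
    if beds_available <= 0:
        return False
    return qualifying_share(n_qualifying, n_total + 1) >= QUALIFYING_THRESHOLD
```

With 70 qualifying patients out of 100, one more non-qualifying admission leaves the share at about 69%, so the patient could be admitted; with 60 out of 100 it would drop to about 59%. A predictive model would replace these static counts with forecasts of upcoming referrals and discharges.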

This prediction model would streamline the transition of care, with the goal of decreasing acute length of stay, facilitating outside referrals when needed, and decreasing attrition from our health system. Should this predictive model prove successful, we could test intervention fidelity using the 2019 data. There are more than 1,100 IRFs in the United States, and there are no existing data on the use of machine learning to improve process flow for post-acute care. Therefore, there is a high likelihood of clinical utility for this intervention.

Team Lead: Nneka Ifejika, MD, MPH, Physical Medicine and Rehabilitation, utswmed.org/doctors/nneka-ifejika/

Team 8: Automated algorithm for detecting and localizing enteric tubes to facilitate accurate and timely radiographic interpretation

Create an image analysis tool to accurately localize enteric tubes and flag potential problems on radiographs in clinical practice.

Misplaced enteric tubes can result in significant morbidity, including pneumothorax, pleural effusion, retropharyngeal abscess, and lung abscess. In clinical practice, radiographs are used to confirm the position of enteric tubes. These radiographs may be reviewed by the primary team or interpreted by the radiologist before enteral feeding or medications are started. Although rare, missed malpositioned enteric tubes can have catastrophic consequences, including patient death. Interpretation errors can occur because of poor image quality, distracting findings on the radiographs (such as the presence of other tubes or pathologies), or other human error. An accurate algorithm that automatically localizes enteric tubes can help reduce such errors. The algorithm could also flag potentially positive studies on reading worklists so that a radiologist can prioritize their interpretation, regardless of the length of the reading queue.

We will use approximately 1000 abdominal radiographs without enteric tubes and about 1000 radiographs with tubes in place, along with bounding box annotations localizing the catheter tips. Our goal is to create a solution that reliably identifies whether a tube is present and derives bounding boxes that closely enclose the terminal tip of each tube. In the final product, the locations of the terminal tips can be used to identify images containing malpositioned tubes, potentially through statistical methods or cluster analysis.
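
How close a predicted tip box is to the annotated one is commonly scored with intersection over union (IoU). The sketch below assumes boxes given as (x1, y1, x2, y2) pixel coordinates and counts a tip as found when IoU is at least 0.5, a widely used but arbitrary convention; the function names are hypothetical.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # overlap width (0 if disjoint)
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # overlap height (0 if disjoint)
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def tip_hit_rate(pred_boxes, true_boxes, threshold=0.5) -> float:
    """Fraction of annotated tips whose paired prediction reaches the IoU threshold."""
    hits = sum(iou(p, t) >= threshold for p, t in zip(pred_boxes, true_boxes))
    return hits / len(true_boxes)
```

Because tube tips are small, a few pixels of offset can lower IoU sharply, so a distance between box centers may be a useful companion metric.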

Team Leads: Travis Browning, MD and Ye Seng Ng, MD, Emergency Medicine, profiles.utsouthwestern.edu/profile/42192/travis-browning.html

Team 9: Real-Time Cardiac Assessment of Catheterization-Derived Fick and CMR-Derived Flow

Improve the workflow, reliability, and reproducibility of interventional cardiac MRI through a user-friendly web/app-based display for real-time assessment.

Cardiologists are tasked with determining complex hemodynamic information that is important for deciding whether catheter-based and/or surgical intervention is needed. Today's standard of care involves performing cardiac catheterization and cardiac magnetic resonance (CMR) separately. X-ray-guided catheterization is used primarily to collect pressure and saturation data and to intervene on hemodynamically significant holes in the heart. In addition, interventionalists are able to place stents, coils, and percutaneous valves. Meanwhile, CMR is a powerful emerging tool that helps cardiologists and cardiothoracic (CT) surgeons answer complex physiology questions by providing highly accurate function and flow data.

Interventional cardiac magnetic resonance (iCMR) is a new approach in congenital cardiology that is gaining traction around the world. In the US, the National Institutes of Health (NIH) and Children's Health Dallas are the only two centers currently pursuing this research in the congenital heart population. iCMR resembles a standard heart catheterization, except that the procedure takes place in the MRI magnet instead of the X-ray catheterization lab. We hope that radiation-free heart catheterization will become the standard of care for patients in the future.

We would like to improve the workflow and ease of calculating important hemodynamic information derived by cardiac catheterization, CMR, and/or iCMR. These numbers are important because they will help guide therapy and are often what surgeons and cardiologists analyze to determine if intervention is necessary. Cardiac catheterization and CMR use different techniques to ultimately produce the same hemodynamic information. Both techniques use simple but unrelated equations to determine the patient’s systemic and pulmonary cardiac flows. The current workflow of using these equations is completely manual and subject to human error.
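
The two sets of equations can be sketched in their standard textbook forms: the Fick principle (flow equals oxygen consumption divided by the O2 content difference across the circulation) for catheterization, and net forward flow per beat times heart rate for phase-contrast CMR. Assumptions: O2 content uses 1.36 mL O2 per g of hemoglobin (some labs use 1.34) and ignores dissolved O2, VO2 is supplied (measured or assumed), sampling sites follow the usual aortic, mixed venous, pulmonary venous, and pulmonary arterial convention, and the function names are hypothetical.

```python
def o2_content_ml_per_l(hb_g_dl: float, sat: float) -> float:
    """O2 content in mL O2/L from hemoglobin (g/dL) and saturation (fraction 0-1).

    Ignores dissolved O2; the factor 10 converts per-dL to per-L.
    """
    return 1.36 * hb_g_dl * sat * 10


def fick_flows(vo2_ml_min, hb_g_dl, sat_ao, sat_mv, sat_pv, sat_pa):
    """Return (Qp, Qs, Qp/Qs), flows in L/min, by the Fick principle."""
    qs = vo2_ml_min / (o2_content_ml_per_l(hb_g_dl, sat_ao) - o2_content_ml_per_l(hb_g_dl, sat_mv))
    qp = vo2_ml_min / (o2_content_ml_per_l(hb_g_dl, sat_pv) - o2_content_ml_per_l(hb_g_dl, sat_pa))
    return qp, qs, qp / qs


def cmr_flows(mpa_net_ml_beat, ao_net_ml_beat, heart_rate_bpm):
    """Return (Qp, Qs, Qp/Qs), flows in L/min, from phase-contrast net forward flow
    per beat in the main pulmonary artery and the ascending aorta."""
    qp = mpa_net_ml_beat * heart_rate_bpm / 1000
    qs = ao_net_ml_beat * heart_rate_bpm / 1000
    return qp, qs, qp / qs
```

Because VO2 and hemoglobin cancel, the Fick Qp/Qs reduces to (SaO2 - SmvO2) / (SpvO2 - SpaO2): with saturations of 0.98, 0.70, 0.98, and 0.85 it is 0.28 / 0.13, about 2.15. Showing both estimates side by side in real time is the kind of display the team describes.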

Our goal is to improve workflow, reliability, and reproducibility by improving the accessibility of these hemodynamic equations through a web/app-based user-friendly display that will allow the operator to obtain information in real time.

Team Lead: Yousef Arar, MD, MPH, Pediatric Cardiology, www.utsouthwestern.edu/education/medical-school/departments/pediatrics/divisions/cardiology/

Team 10: Novel Clinical Prediction Approaches to Managing Care for Acute Pulmonary Embolism Patients

Derive an optimally accurate prediction rule for 48-hour severe adverse events in acute pulmonary embolism patients.

Acute pulmonary embolism (PE), the presence of blood clots in the arteries of the lung, is common and can cause high morbidity and mortality. Initial risk stratification is critical to identify patients with a poor prognosis who might benefit from advanced therapy or ICU care. Treatment is clear for patients in shock but less clear for all other PE patients. One approach is the Bova score, but it has seen minimal replication and use. Other scores, such as the PESI, were derived to predict 30-day outcomes. What is missing, and what is a pain point for emergency, ICU, and hospitalist physicians at the time of diagnosis, is the prediction of short-term (24–48 hour) severe adverse outcomes. No existing prediction rule addresses this. The aim of this work is to derive an optimally accurate prediction rule for 48-hour severe adverse events in acute PE patients. Subsequent work could validate this prediction rule in a separate retrospective dataset. Final-stage work would prospectively validate it within a management strategy aimed at delivering the right level of care to the right severity of patient, maximizing efficiency and quality of care.
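
When comparing candidate 48-hour rules on the same patients (for example, a newly derived rule against an existing score), discrimination is commonly summarized by the c-statistic, which equals the area under the ROC curve. A dependency-free sketch with hypothetical inputs: one risk score per patient and a 0/1 flag for a severe adverse event within 48 hours.

```python
def c_statistic(scores, outcomes) -> float:
    """Probability that a randomly chosen patient with the event scores higher than
    one without it (ties count 1/2); equals the ROC AUC."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5   # tied scores count as half-concordant
    return concordant / (len(pos) * len(neg))
```

The pairwise loop is O(n²) and is meant for clarity; for large cohorts a rank-based computation gives the same value faster. Calibration and the confidence interval of the c-statistic would also matter when choosing among rules.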

Team Leads: D. Mark Courtney, MD, MSCI and Samuel McDonald, MD, Emergency Medicine, www.utsouthwestern.edu/education/medical-school/departments/emergency/

Team 11: Optical coherence tomography-angiography in glaucoma

Develop a deep learning tool that parses OCTA data to facilitate diagnosis of early glaucoma and its progression and to rule out non-glaucoma conditions.

Glaucoma is the second most common cause of blindness worldwide. Glaucomatous optic neuropathy is believed to result from a combination of mechanical stress and reduced blood supply to the optic nerve. Until now, glaucoma diagnosis and progression have been assessed based on intraocular pressure (IOP), optic nerve cupping, and visual field (VF) testing, but these modalities have inherent weaknesses. Optical coherence tomography angiography (OCTA) is a robust, non-contact, noninvasive imaging system that provides reproducible, three-dimensional, high-resolution, volumetric, and quantitative data on the retinal layers, including the vascular networks of the retina, optic nerve, and choroid. Currently, there is no effective way to analyze data from the ever-expanding field of OCTA, including understanding the roles played by age, gender, race, refractive errors, hypertension, and diabetes. We propose to develop a deep learning tool that will parse OCTA data to facilitate diagnosis of early glaucoma and its progression and to rule out non-glaucoma conditions.

Team Lead: Karanjit S Kooner, MD, Ophthalmology, profiles.utsouthwestern.edu/profile/13996/karanjit-kooner.html

Team 12: Rapid evaluation of statistical strategies for predicting interactions and information transfer between proteins

Rapidly organize, statistically describe, and filter data and evaluate the performance of alternative models for phylogenetic correction on a gold-standard test set of protein interaction data.

Protein-protein binding interactions, inter-protein allosteric regulation, and protein complex formation provide a foundation for communication in the cell. The ability to predict and understand protein interactions at residue-level resolution is necessary to design precise mutagenesis experiments, engineer protein complex specificity, and introduce new regulation. A substantial body of work now suggests that protein sequence co-evolution has major potential for predicting protein interactions.

We hypothesize that present approaches are limited by substantial phylogenetic noise and by an emphasis on local residue contacts rather than distributed residue networks. We seek to extend our analysis to a larger test set of complexes and more fully explore alternative models for phylogenetic correction. The hackathon would provide an excellent opportunity to rapidly organize, statistically describe, and filter our initial data set of protein interactions and to evaluate the performance of this method (as well as variations on it) on a gold-standard test set of protein interaction data. Following the hackathon, we plan to extend the most successful version of the analysis genome-wide (across E. coli) to infer the protein interactome globally.
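
Co-evolution statistics are typically computed over a multiple sequence alignment (MSA), and one common form of phylogenetic correction is to down-weight closely related sequences so that clades with many near-identical members do not inflate apparent covariation. The sketch below applies that weighting to mutual information (MI) between two alignment columns; it is an illustration, not the lab's method. The 80% identity cutoff is a common convention assumed here, the function names are hypothetical, and alternatives such as the average product correction (APC) are not shown.

```python
import math
from collections import defaultdict


def sequence_weights(msa, identity=0.8):
    """Down-weight redundant sequences: w_i = 1 / (number of sequences, including i,
    that are at least `identity` identical to sequence i)."""
    length = len(msa[0])
    weights = []
    for a in msa:
        neighbors = sum(
            sum(x == y for x, y in zip(a, b)) / length >= identity for b in msa
        )
        weights.append(1.0 / neighbors)
    return weights


def weighted_mi(msa, i, j, weights):
    """Mutual information (nats) between alignment columns i and j under sequence weights."""
    total = sum(weights)
    pi, pj, pij = defaultdict(float), defaultdict(float), defaultdict(float)
    for seq, w in zip(msa, weights):
        pi[seq[i]] += w / total
        pj[seq[j]] += w / total
        pij[(seq[i], seq[j])] += w / total
    return sum(p * math.log(p / (pi[a] * pj[b])) for (a, b), p in pij.items())
```

For inter-protein analysis, each row would be a concatenation of paired orthologs from the two proteins, and MI would be computed for column pairs that span the two halves; the weighting, the identity cutoff, and the choice of statistic are exactly the kinds of variations the team proposes to benchmark on the gold-standard set.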

Team Lead: Kimberly Reynolds, PhD, Green Center for Systems Biology, reynolds-lab.net/