Research projects
A Machine Learning Approach to Prioritizing Secondary Mismatch Repair Pathway Genes Missed by Current Lynch Syndrome Multi-Gene Sequencing Panels
Lynch syndrome (LS) occupies a paradoxical position in clinical genetics: it is the most comprehensively characterized hereditary cancer syndrome, yet one in three clinically suspected cases leaves standard genetic testing without a molecular diagnosis. This is not a failure of sequencing technology; it is a failure of what we choose to sequence. Germline pathogenic variants in MLH1, MSH2, MSH6, or PMS2, or deletions in EPCAM, impair DNA mismatch repair (MMR), generating a mutator phenotype that drives colorectal and endometrial malignancy, often before age 50. Approximately 1 in 279 individuals carries such a variant. The clinical stakes are exceptional: colonoscopic surveillance reduces colorectal cancer mortality by over 60%, and pembrolizumab achieves response rates exceeding 40% in MMR-deficient metastatic disease. Diagnosis is not merely prognostic; it is therapeutically decisive.
The gap persists for compounding, addressable reasons. The Amsterdam II and Bethesda criteria, still widely used as testing triggers, are unmet in up to 63% of confirmed carriers, systematically excluding patients who lack classic familial clustering. Multigene panel testing has expanded reach, but a structural problem remains: genes harboring well-documented MMR-pathway pathogenic variants are absent from major commercial panels, leaving high-risk individuals undiagnosed after negative standard testing. Genotype-specific blind spots compound this: pseudogene interference confounding PMS2 sequencing, attenuated microsatellite instability masking MSH6 carriers, and deep intronic variants invisible to exon capture all generate false negatives even when the correct gene is tested. The cumulative consequence is measurable and avoidable: missed cascade testing, delayed surveillance, and forfeited immunotherapy eligibility across thousands of families annually. This project addresses that gap at its structural root.
By mining ClinVar, COSMIC, gnomAD, and TCGA for genes harboring ≥3 Lynch-specific pathogenic variants yet absent from >70% of commercial panels, we quantify the evidence-to-adoption gap systematically for the first time. A multi-dimensional prioritization framework, which integrates clinical, functional, population, and MMR network biology evidence and is validated through interpretable regularized logistic regression with SHAP-attributed feature importance, identifies which overlooked genes carry the strongest case for clinical inclusion. Multi-cohort validation across TCGA, NCBI GEO/ArrayExpress, and the All of Us Research Program delivers the ancestry-diverse replication, including dedicated Hispanic/Latino subgroup analyses, that prior LS research has largely omitted.
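The inclusion rule described above (at least 3 pathogenic variants, absent from more than 70% of panels) can be illustrated with a minimal sketch. This is not the project's pipeline: the `GeneEvidence` record, the function names, and every gene name and count below are hypothetical placeholders, and a real analysis would derive these counts from the curated databases.

```python
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    gene: str
    pathogenic_variants: int  # count of Lynch-specific pathogenic variants
    panels_including: int     # surveyed panels that include the gene
    panels_surveyed: int      # total number of surveyed panels


def absent_fraction(record: GeneEvidence) -> float:
    """Fraction of surveyed panels that do NOT include the gene."""
    absent = record.panels_surveyed - record.panels_including
    return absent / record.panels_surveyed


def evidence_gap_candidates(records, min_variants=3, min_absent=0.70):
    """Keep genes with >= min_variants variants that are absent from > min_absent of panels."""
    selected = [
        r for r in records
        if r.pathogenic_variants >= min_variants and absent_fraction(r) > min_absent
    ]
    # Rank by variant count first, then by how widely the gene is missing.
    return sorted(
        selected,
        key=lambda r: (r.pathogenic_variants, absent_fraction(r)),
        reverse=True,
    )


# Placeholder records: names and counts are invented for illustration only.
records = [
    GeneEvidence("GENE_A", 5, 1, 10),  # 90% absent, 5 variants: kept
    GeneEvidence("GENE_B", 2, 0, 10),  # too few variants: dropped
    GeneEvidence("GENE_C", 4, 3, 10),  # exactly 70% absent, not above 70%: dropped
    GeneEvidence("GENE_D", 3, 2, 10),  # 80% absent, 3 variants: kept
]

candidates = [r.gene for r in evidence_gap_candidates(records)]
print(candidates)
```

The thresholds are keyword arguments so that the sensitivity of the candidate list to either cutoff can be checked by rerunning with different values.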
Systematic Computational Discovery of Novel Hereditary Breast and Ovarian Cancer Syndrome Candidate Genes: A Multi-Cohort Bioinformatics Framework with Hispanic/Latino Representation
Hereditary Breast and Ovarian Cancer (HBOC) syndrome is among the most clinically actionable inherited cancer predispositions, yet a substantial fraction of familial breast and ovarian cancer clustering remains genetically unexplained even after comprehensive standard panel testing of established genes including BRCA1, BRCA2, PALB2, ATM, CHEK2, RAD51C, RAD51D, BARD1, BRIP1, and their homology-directed repair (HDR) pathway partners. This diagnostic gap carries measurable clinical consequences: missed eligibility for PARP inhibitor therapy in HDR-deficient metastatic disease, foregone risk-reducing surgical interventions, and failed cascade testing in at-risk relatives who remain unaware of their inherited risk profile. We propose that secondary HDR susceptibility genes, those harboring well-documented HBOC-specific pathogenic variants yet systematically excluded from commercial panels, account for a meaningful, quantifiable portion of this unresolved hereditary fraction. This pilot study interrogates that premise through a five-layer computational strategy: exhaustive variant-level curation from ClinVar, COSMIC, LOVD, TCGA, and gnomAD; a multi-dimensional evidence-scoring framework integrating clinical annotation, population allele frequencies, functional genomic signals, and degree-normalized BRCA1/2 interactome centrality; dual regularized machine learning models (logistic regression and random forest) trained on historical panel adoption decisions, reporting precision, recall, ROC-AUC, and SHAP-attributed feature importance; BRCA1/2-centered protein interactome network analysis with triple-filter candidate advancement; and multi-cohort validation across TCGA, NCBI GEO/ArrayExpress, and the NIH All of Us Research Program. The deliverables (20–25 ranked candidate genes, a reproducible Docker-containerized scoring pipeline, and an interpretable ML framework) constitute rigorous preliminary data for a subsequent NIH R15 Academic Research Enhancement Award application.
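The evaluation metrics named above (precision, recall, and ROC-AUC) can be sketched with the Python standard library alone. This is an illustrative sketch, not the project's code: the labels, scores, and the 0.5 decision threshold are invented placeholders, and an actual pipeline would more likely use an established machine learning library. ROC-AUC is computed here through its pairwise definition, the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = gene adopted on panels, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """Pairwise ROC-AUC: share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Placeholder labels and model scores, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1, 0.7]

# Hypothetical decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, auc)
```

Precision and recall depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds, which is one reason to report all three together.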
Our Team
Ayyappa Kumar Sista Kameshwar
Principal Investigator
Dr. Ayyappa Kumar Sista Kameshwar is an Assistant Professor of Bioinformatics in the Department of Biology at Utah Tech University. He holds bachelor’s and master’s degrees in biotechnology from Osmania University and VIT University, respectively, and a Ph.D. in biotechnology from Lakehead University. He completed postdoctoral research at Texas A&M University, the University of Calgary, and the University of Guelph, and has industry experience at Strand Life Sciences and SCIEX (Danaher Corporation). His research focuses on biotechnology, genomics, and bioinformatics, emphasizing data-driven biological analysis and translational research.
Tyler Mullins
Undergraduate Researcher
Bio: Hi, I am Tyler Mullins, an undergraduate biotechnology student at Utah Tech University with a background in bioinformatics. My previous research experience includes developing a Python script to analyze nanoparticle images from a scanning electron microscope. In 2026, I joined Dr. Kameshwar’s research group, driven by my passion for precision medicine and its global healthcare applications. Outside of academics, I enjoy rock climbing, running, biking, camping, and competing in cross country and track.
Research Project: A Machine Learning Approach to Prioritizing Secondary MMR Pathway Genes Missed by Current Lynch Syndrome Multi-Gene Sequencing Panels
Joshua Borkman
Undergraduate Researcher
Bio: My name is Joshua Borkman. I am from Utah County and am completing my bachelor’s degree at Utah Tech University. My passion for science began as a desire to understand the natural world, which evolved into a focus on medical biology and helping those affected by disease. I am now pursuing bioinformatics to bridge biological data and clinical applications, contributing to more effective treatments.
Research Project: A Machine Learning Approach to Prioritizing Secondary MMR Pathway Genes Missed by Current Lynch Syndrome Multi-Gene Sequencing Panels
Kiara Rojas Videa
Undergraduate Researcher
Bio: My name is Kiara, and I am a bioinformatics student with a strong interest in applying computational tools to understand biological data and solve real-world problems. My academic focus includes genetics, data analysis, and programming, which I use to explore questions in health, disease, and environmental science. I am passionate about developing skills that will allow me to contribute to innovative research and advancements in biotechnology.
Research Project: A Machine Learning Approach to Prioritizing Secondary MMR Pathway Genes Missed by Current Lynch Syndrome Multi-Gene Sequencing Panels
David Dorobantu
Undergraduate Researcher
Bio: My name is David Dorobantu, and I am a biotechnology student on a pre-medical track. I love biology, enjoy learning more about it, and am eager to do research in biotechnology.
Research Project: A Machine Learning Approach to Prioritizing Secondary MMR Pathway Genes Missed by Current Lynch Syndrome Multi-Gene Sequencing Panels
Courses
BIOL 2300 - Fundamentals of Bioinformatics
BIOL 3300 - Introduction to Bioinformatics
BTEC 4810 - Independent Research Project
BIOL 4310 - Advanced Bioinformatics
BIOL 5320 - Scripting for Biologists
Office Hours
Sunday
- Closed.
Monday
- 9:30 am - 12:30 pm
Tuesday
- 9:30 am - 12:30 pm
Wednesday
- 9:30 am - 12:30 pm
Thursday
- 9:00 am - 12:30 pm
Friday
- 9:30 am - 12:30 pm
Saturday
- Closed.
Contact Us
The Kameshwar Lab is always seeking new opportunities for collaboration with partners across academic institutions, clinical settings, and the industry sector. We are open to sharing bioinformatic pipelines, curated variant datasets, and gene panel resources with fellow researchers to support and advance the broader genomics community. If you are interested in working together, exploring a research partnership, or would like to request access to our resources, please do not hesitate to reach out to us via phone or email.
Ayyappa Sista Kameshwar, PhD
Assistant Professor of Bioinformatics
Email: ayyappa.sista.kameshwar@utahtech.edu
Phone: 435-879-4332
Office: SET 228