Shirish Khanal

Senior Data Engineer, Researcher, Incoming PhD Student

I am a Senior Data Engineer and researcher with six years of experience in scalable data architecture and machine learning. I am transitioning into a PhD program in Computer and Statistical Sciences at the University of Rhode Island. My research interests focus on trustworthy and explainable AI, causal reasoning, algorithmic fairness, and biomedical applications.

Research Interests

Trustworthy & Explainable AI

Developing computational methods that prioritize rigorous methodology over surface performance to ensure AI is transparent and reliable for high-stakes decision-making.

Causal Representation Learning

Building models that capture true cause-and-effect relationships rather than mere correlations to distinguish genuine predictive signals from retrospective labels.

Algorithmic Fairness

Investigating how to build equitable AI systems by exploring the intersection of machine learning and social science to prevent algorithms from perpetuating systemic flaws.

Biomedical ML Applications

Applying machine learning to clinical decision support and disease phenotyping to transform noisy, unstructured clinical data into robust, actionable evidence.

Scalable ML Systems

Optimizing large-scale distributed pipelines and high-performance infrastructure to efficiently support complex modeling and real-world AI research.

Natural Language Processing

Exploring NLP tasks within the biomedical domain to make language models more robust, interpretable, and effective at handling sparse, high-dimensional clinical text.

Research Experience

Computational Healthcare Analytics & Predictive Modeling

Master's Capstone, East Tennessee State University, 2022-2023

I developed an end-to-end predictive analytics pipeline using HCUP inpatient datasets to forecast hospital readmissions and mortality. Over the course of the project, I trained and evaluated sixteen different machine learning and neural network models using metrics such as ROC AUC, precision, and recall. A major insight came while conducting an error analysis on the mortality model. I discovered that target leakage was artificially boosting its accuracy: the model was using billing codes generated after a patient's death. This experience deeply shaped my research philosophy, highlighting the fundamental limitation of models that optimize for accuracy without the causal reasoning needed to separate genuine predictive signals from retrospective labels. (The code for this project is available in my GitHub repository.)

Disease Phenotyping from Biomedical Text

ETSU, 2022

In this project, I built a disease diagnosis classification system by mining biomedical literature. I extracted over 40,000 symptom-related clinical abstracts directly from PubMed. After implementing thorough NLP preprocessing, I trained and compared six different classifiers, ranging from traditional algorithms to deep learning models like CNNs and Bidirectional RNNs. Interestingly, I found that a Linear Support Vector Machine (SVM) outperformed the complex neural networks, achieving over 93% accuracy. This work showed me that on the sparse, high-dimensional feature spaces typical of medical text, simpler and more interpretable algorithms can often be more effective and reliable than black-box deep learning approaches. (The code for this project is available in my GitHub repository.)

Sports Analytics with Sentiment-Enhanced ML

ETSU, 2022

I led a project to predict player performance in the Fantasy Premier League by combining traditional statistics with public sentiment. Using the Twitter API, I collected a dataset of over 91,000 social media posts discussing specific players and teams. I explored different natural language processing techniques and found that the transformer-based RoBERTa model was much better at understanding Twitter-specific jargon, but significantly more computationally intensive. By integrating these sentiment-derived signals with standard time-series performance data, I was able to improve the overall prediction accuracy of models like LSTMs. This project was a great exercise in multi-modal feature integration, showing how qualitative public sentiment can be quantified to enhance predictive algorithms.

GenAI-Powered RAG Systems

Deutsche Bank, 2024-2025

I designed GenAI solutions using Google Gemini, implementing RAG pipelines with embeddings and vector search. I enabled LLM-driven text classification and contextual inference for production data pipelines.

Professional Experience

Senior Data Engineer, Fiserv

Alpharetta, GA, Jan 2026 - Present

I am leading the migration of legacy Informatica ETL pipelines to AWS Glue with Apache Airflow. I am re-architecting mappings into Spark-native transformations, orchestrating batch workflows, and loading curated datasets into Snowflake.

Senior Data Engineer, Deutsche Bank

Cary, NC, Oct 2023 - Dec 2025

I owned the design of cloud-native data pipelines on GCP using PySpark and Dataproc. I led the migration of on-prem Cloudera workloads, automated infrastructure with Terraform, built CI/CD pipelines, and deployed GenAI solutions using Google Gemini with RAG.

Software Engineer, Deerwalk Services (Cedar Gate Technologies)

Kathmandu, Nepal, Nov 2017 - Aug 2021

I built batch ETL pipelines for HIPAA-regulated healthcare data using Java, Python, Spark, and AWS. I performed large-scale data wrangling and contributed to risk prediction model development.

Teaching & Service

Graduate Teaching Assistant, Computer Science

East Tennessee State University, 2022-2023

I taught CS courses for 150+ students, led lab sessions, conducted code reviews, and mentored incoming teaching assistants.

Graduate Assistant, Access ETSU

East Tennessee State University, 2021-2022

I served as a Graduate Assistant, promoting access, inclusion, and academic support for students with diverse learning needs.

Education

PhD in Computer Science

University of Rhode Island

Computer and Statistical Sciences, Fall 2026 (Incoming)

M.S. in Computer Science

East Tennessee State University

Machine Learning & AI, Aug 2021 - May 2023
GPA: 4.0 / 4.0

B.S. in Computer Science

Kathmandu University

Aug 2013 - Oct 2017
I graduated in the top 5% of my class.

Technical Skills

Languages

Python PySpark Scala SQL Java R TypeScript

Big Data & Processing

Apache Spark Hadoop Hive AWS Glue GCP Dataproc Apache Iceberg

Cloud Platforms

GCP AWS BigQuery Snowflake S3 EMR Redshift

ML / NLP

Scikit-learn TensorFlow Keras VADER Embeddings RAG

Orchestration & DevOps

Apache Airflow Terraform Docker Kubernetes Jenkins GitHub Actions

Databases & Visualization

PostgreSQL MySQL Tableau Power BI

Beyond Research

Trekking

I love hitting the trails and have completed several high-altitude treks across Nepal. My journey so far includes crossing the Annapurna Circuit (5,416m) and reaching spots like Mardi Himal Base Camp (4,500m), Gosainkunda (4,380m), Annapurna Base Camp (4,130m), Langtang Valley (3,870m), and Shey Phoksundo Lake (3,611m).

Sports

I'm a massive football (soccer) fan and a huge Arsenal FC supporter. I'm also a big Formula 1 enthusiast, and Lewis Hamilton is my favorite driver.

Photography

I'm starting to get into visual storytelling and photography. I'm mostly focusing on developing a technical eye for composition and light. @shirishhhh

Running

I'm currently trying my hand at distance running. I'm very much a beginner, just focusing on building up my endurance and trying to stay consistent.

Political & Social Research

I occasionally read up on different political systems and social structures, mostly out of personal curiosity about how things are organized behind the scenes.

Vedic Astrology

I spend some time studying classical Jyotish. I enjoy applying an analytical and systematic approach to it, exploring personality patterns and life cycles through this ancient framework.
