About Me
I am a passionate data scientist with a strong foundation in machine learning, NLP, and data analytics. With experience spanning academia, healthcare, and government projects, I thrive on solving complex problems with innovative solutions. I love learning, adapting to new challenges, and leveraging data to drive meaningful impact. Whether it's fine-tuning AI models, automating workflows, or visualizing insights, I bring curiosity, creativity, and a problem-solving mindset to everything I do.
Skills
What I bring to the table
Education

University of Michigan
Ann Arbor, MI, USA
Master's in Data Science
Achievements: GSI with tuition waiver (SI649), research assistantships, Ross Hackathon (2024 & 2025), TAMU Healthcare Hackathon
- Probability and Distribution
- Statistical Inference I & II
- Machine Learning, Regression, Time Series
- Causal Inference, Data Analytics & Viz

Savitribai Phule Pune University
Pune, MH, India
B.Tech in Information Technology
Achievements: Event Head at CodeChef, TEDxVIIT, Finance/Sponsorship, Sports Club Rep
- Machine Learning, AI, Discrete Maths
- OS, Networking, Data Structures
- C/C++, Java, HTML/CSS, JS, PHP
- Object-Oriented Programming
Experience
Data Scientist – Michigan Medicine
Ann Arbor, MI
- Implementing the DECI model (Microsoft Causica) to generate and analyze causal graphs, and comparing its adjacency matrices with those produced by the DoWhy library to evaluate model accuracy and consistency
- Assessing the reliability of synthetic data by comparing adjacency matrices of the original and synthetic datasets using quantitative measures such as network centrality and graph similarity
- Developed a semantic search pipeline using Retrieval-Augmented Generation (RAG) on ASO guideline documents, leveraging LangChain and vector stores. Experimented with multiple text splitting strategies to optimize chunking for improved retrieval accuracy in domain-specific LLM tasks.
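As an illustration of the adjacency-matrix comparison mentioned above, here is a minimal sketch, assuming binary directed graphs over the same node ordering. The function names, the two example graphs, and the choice of structural Hamming distance and Jaccard similarity are illustrative assumptions, not the project's actual code or metrics.

```python
from typing import List, Set, Tuple

Matrix = List[List[int]]


def edge_set(adj: Matrix) -> Set[Tuple[int, int]]:
    """Return the directed edges (i, j) where adj[i][j] is nonzero."""
    return {(i, j) for i, row in enumerate(adj) for j, value in enumerate(row) if value}


def structural_hamming_distance(a: Matrix, b: Matrix) -> int:
    """Count node pairs whose edge relation differs; a reversed edge counts once.

    Self-loops on the diagonal are ignored.
    """
    n = len(a)
    return sum(
        (a[i][j], a[j][i]) != (b[i][j], b[j][i])
        for i in range(n)
        for j in range(i + 1, n)
    )


def edge_jaccard(a: Matrix, b: Matrix) -> float:
    """Jaccard similarity of the two directed edge sets (1.0 if both are empty)."""
    edges_a, edges_b = edge_set(a), edge_set(b)
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


# Hypothetical 3-node graphs: graph_a has 0->1 and 1->2; graph_b has 0->1 and 2->1.
graph_a = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
graph_b = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]

print(structural_hamming_distance(graph_a, graph_b))
print(round(edge_jaccard(graph_a, graph_b), 3))
```

In this toy case the two graphs disagree only on the direction of the edge between nodes 1 and 2, so the distance is 1 and one of three distinct edges is shared.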
Data Analyst – UM Precision Health
Ann Arbor, MI
- Increased survey participation rates by 50% by curating surveys and developing SQL queries and Python-based analytics to process and extract insights from response data, identifying key behavioral patterns and trends
- Developed real-time Power BI dashboards for survey monitoring
- Implemented reinforcement learning-based interactive games to assess patient attention span, collecting structured behavioral data for deeper analysis
Research Assistant – School of Information, UMich
Ann Arbor, MI
- Achieved 98% word retrieval accuracy by developing a web scraping model for structured data extraction from American history textbooks
- Increased classification accuracy by 10% by expanding spaCy NER annotations with 7 new historical categories
- Cut manual validation time by 2 hours per dataset by creating an algorithm for data annotation verification
- Preprocessed data and trained LLMs on OntoNotes 5 and a custom dataset, fine-tuning GPT-NER and Meta-Llama-3-8B-Instruct using LoRA and 4-bit quantization, achieving 66% NER accuracy and improving historical NER
Graduate Student Instructor – SI649
Ann Arbor, MI
- Taught data visualization using Tableau, Plotly, Altair, D3.js, and GenAI tools.
- Led interactive design labs with real-world datasets.
Data Science Intern – Ministry of Defence
Pune, India
- Enabled real-time skeletal tracking for army drills by developing an AI-based pose estimation model using OpenCV, MediaPipe, and YOLOv8, tailored for military movement analysis
- Deployed a real-time feedback system with Flask that generates automated pose correction reports and annotated video snippets, giving trainees immediate visual guidance
- Designed the UI/UX in Figma and built a user-friendly web application with Flask; paper submitted to IEEE
Data Scientist – Beyond Business Travel
TUS, Ireland (Remote)
- Built a data extraction pipeline for Enterprise and Hertz invoices using OCR, NLP, and machine learning, achieving 98% accuracy and reducing manual entry time
- Outperformed RPA tools such as Blue Prism and UiPath with improved anomaly detection and 75% faster processing
Projects

🧠 VAE Variants for Causal Effect Estimation
Nov 2024 – Dec 2024
- Extended the CEVAE model to improve causal inference under unmeasured confounders using PyTorch and TensorFlow
- Implemented VAE variations, including Correlated-VAE, Beta-VAE, HVAE, and VQ-VAE, for enhanced latent variable modeling
- Conducted experiments on real-world (IHDP, JOBS, TWINS) and synthetic datasets using Pyro and the Adamax optimizer, achieving a best PEHE of 1.47 ± 0.18 and an ATE error of 1.26 ± 0.75 on the JOBS dataset
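For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how PEHE and ATE error are commonly computed, assuming ground-truth individual treatment effects are available (as in synthetic or semi-synthetic benchmarks). The function names and numbers are illustrative assumptions, and this sketch uses the root-mean-square form of PEHE; some papers report the squared version instead.

```python
import math
from typing import Sequence


def sqrt_pehe(true_ite: Sequence[float], pred_ite: Sequence[float]) -> float:
    """Root-mean-square error between predicted and true individual effects."""
    if not true_ite or len(true_ite) != len(pred_ite):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for t, p in zip(true_ite, pred_ite)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def ate_error(true_ite: Sequence[float], pred_ite: Sequence[float]) -> float:
    """Absolute difference between predicted and true average treatment effect."""
    if not true_ite or len(true_ite) != len(pred_ite):
        raise ValueError("inputs must be non-empty and of equal length")
    true_ate = sum(true_ite) / len(true_ite)
    pred_ate = sum(pred_ite) / len(pred_ite)
    return abs(pred_ate - true_ate)


# Made-up effects for three units, for illustration only.
true_effects = [1.0, 2.0, 3.0]
predicted_effects = [1.0, 2.0, 5.0]

print(round(sqrt_pehe(true_effects, predicted_effects), 4))
print(round(ate_error(true_effects, predicted_effects), 4))
```

PEHE penalizes errors on each unit's effect, while ATE error only compares the averages, so a model can score well on ATE error and still have a large PEHE.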

⚽ Soccer Analytics Dashboard
Apr 2024
- Analyzed and processed over 3 million soccer records from 11+ datasets using SQL and Python, performing data cleaning, feature engineering, and statistical evaluation
- Developed an interactive Tableau dashboard to visualize key soccer analytics

🦠 TB Forecasting with ARIMA and POMP
Nov 2023 – Dec 2023
- Improved TB incidence forecasting accuracy by comparing ARIMA and SEIRS-based POMP models
- Implemented a stochastic POMP model with overdispersion and time-varying transmission, achieving a more realistic fit to TB decline trends over time

🦈 Shark Tank Deal Prediction
Oct 2022 – Dec 2022
- Identified key trends in startup features (industry, revenue, valuation) by performing exploratory data analysis using Python (Pandas, Matplotlib, Seaborn) and R
- Achieved an F1-score of 87.09% by building a machine learning pipeline with ANN, outperforming SVM and Random Forest in predicting startup funding offers
- Improved post-investment evaluation by implementing a fuzzy logic system in MATLAB with 22 rule sets to categorize deal quality
Publications
Offer and Deal-Quality Prediction using Machine learning and Fuzzy approach: A Shark Tank India Case Study
Proceedings of the ACM Web Conference 2023
Published, 2023
Empowering India’s Climate Action: Harnessing Blockchain for Carbon Trading
2024 IEEE International Conference on Blockchain and Distributed Systems Security (ICBDS)
Published, 2024
Contact Me
(shreyadj@umich.edu)