About Me

I am a passionate data scientist with a strong foundation in machine learning, NLP, and data analytics. With experience spanning academia, healthcare, and government projects, I thrive on solving complex problems with innovative solutions. I love learning, adapting to new challenges, and leveraging data to drive meaningful impact. Whether it's fine-tuning AI models, automating workflows, or visualizing insights, I bring curiosity, creativity, and a problem-solving mindset to everything I do.

Skills

What I bring to the table

Python, R, SQL, JavaScript, HTML/CSS, MATLAB, NumPy, Pandas, Seaborn, Power BI, Tableau, Git, OpenCV, Keras, Machine Learning, Deep Learning, Reinforcement Learning, Causal Inference, Computer Vision, NLP, Data Mining, Statistics, Scikit-learn, TensorFlow, PyTorch, Teamwork, Adaptability, Time Management, Problem Solving, Critical Thinking, Public Speaking

Education

University of Michigan

Ann Arbor, MI, USA

Master's in Data Science

Achievements: GSI with tuition waiver (SI649), research assistantships, Ross Hackathon (2024 & 2025), TAMU Healthcare Hackathon

  • Probability and Distribution
  • Statistical Inference I & II
  • Machine Learning, Regression, Time Series
  • Causal Inference, Data Analytics & Viz
Savitribai Phule Pune University

Pune, MH, India

B.Tech in Information Technology

Achievements: Event Head at CodeChef, TEDxVIIT, Finance/Sponsorship, Sports Club Rep

  • Machine Learning, AI, Discrete Maths
  • OS, Networking, Data Structures
  • C/C++, Java, HTML/CSS, JS, PHP
  • Object-Oriented Programming

Experience

Apr 2025 – Present
Data Scientist – Michigan Medicine

Ann Arbor, MI

  • Implementing the DECI model (Microsoft Causica) to generate causal graphs, and comparing its adjacency matrices against those produced with the DoWhy library to evaluate model accuracy and consistency
  • Assessing the reliability of synthetic data by analyzing and comparing adjacency matrices of original and synthetic datasets through quantitative techniques such as network centrality and graph similarity
  • Developed a semantic search pipeline using Retrieval-Augmented Generation (RAG) on ASO guideline documents, leveraging LangChain and vector stores. Experimented with multiple text splitting strategies to optimize chunking for improved retrieval accuracy in domain-specific LLM tasks.
Jun 2024 – Dec 2024
Data Analyst – UM Precision Health

Ann Arbor, MI

  • Increased survey participation rates by 50% by curating surveys and building SQL queries and Python-based analytics that extract key behavioral patterns and trends from response data
  • Developed real-time Power BI dashboards for survey monitoring
  • Implemented reinforcement learning-based interactive games to assess patient attention span, collecting structured behavioral data for deeper analysis
Jun 2024 – Nov 2024
Research Assistant – School of Information, UMich

Ann Arbor, MI

  • Achieved 98% word retrieval accuracy by developing a web scraping model for structured data extraction from American history textbooks
  • Increased classification accuracy by 10% by expanding spaCy NER annotations with 7 new historical categories
  • Cut manual validation time by 2 hours per dataset by creating an algorithm for data annotation verification
  • Preprocessed OntoNotes 5 and a custom dataset, then fine-tuned GPT-NER and Meta-Llama-3-8B-Instruct using LoRA and 4-bit quantization, achieving 66% accuracy on historical NER
Jun 2024 – Dec 2024
Graduate Student Instructor – SI649

Ann Arbor, MI

  • Taught data visualization using Tableau, Plotly, Altair, D3.js, and GenAI tools.
  • Led interactive design labs with real-world datasets.
Oct 2022 – May 2023
Data Science Intern – Ministry of Defence

Pune, India

  • Enabled real-time skeletal tracking for army drills by developing an AI-based pose estimation model using OpenCV, MediaPipe, and YOLOv8, tailored for military movement analysis
  • Deployed a real-time feedback system with Flask that generates automated pose correction reports and annotated video snippets, giving trainees immediate visual guidance
  • Designed the UI/UX in Figma and built a user-friendly web application with Flask; paper submitted to IEEE
Aug 2022 – Dec 2022
Data Scientist – Beyond Business Travel

TUS, Ireland (Remote)

  • Built a data extraction pipeline for Enterprise and Hertz invoices using OCR, NLP, and machine learning, achieving 98% accuracy and reducing manual entry time
  • Outperformed RPA tools such as Blue Prism and UiPath with improved anomaly detection and 75% faster processing

Projects

🧠 VAE Variants for Causal Effect Estimation

Nov 2024 – Dec 2024

  • Extended the CEVAE model to improve causal inference under unmeasured confounders using PyTorch and TensorFlow
  • Implemented VAE variations, including Correlated-VAE, Beta-VAE, HVAE, and VQ-VAE, for enhanced latent variable modeling
  • Conducted experiments on real-world (IHDP, JOBS, TWINS) and synthetic datasets using Pyro and the Adamax optimizer, achieving a best PEHE of 1.47 ± 0.18 and an ATE error of 1.26 ± 0.75 on the JOBS dataset
PyTorch, TensorFlow, CEVAE, Pyro, Adamax, Causal Inference
⚽ Soccer Analytics Dashboard

Apr 2024

  • Analyzed and processed over 3 million soccer records from 11+ datasets using SQL and Python, performing data cleaning, feature engineering, and statistical evaluation
  • Developed an interactive Tableau dashboard to visualize key soccer analytics
SQL, Python, Data Cleaning, Feature Engineering, Tableau
🦠 TB Forecasting with ARIMA and POMP

Nov 2023 – Dec 2023

  • Improved TB incidence forecasting accuracy by comparing ARIMA and SEIRS-based POMP models
  • Implemented a stochastic POMP model with overdispersion and time-varying transmission, achieving a more realistic fit to TB decline trends over time
R, Time Series, ARIMA, POMP, SEIRS, Data Viz
🦈 Shark Tank Deal Prediction

Oct 2022 – Dec 2022

  • Identified key trends in startup features (industry, revenue, valuation) by performing exploratory data analysis using Python (Pandas, Matplotlib, Seaborn) and R
  • Achieved an F1-score of 87.09% by building a machine learning pipeline with ANN, outperforming SVM and Random Forest in predicting startup funding offers
  • Improved post-investment evaluation by implementing a fuzzy logic system in MATLAB with 22 rule sets to categorize deal quality
Python, R, Machine Learning, ANN, Fuzzy Logic, EDA

Publications

Offer and Deal-Quality Prediction using Machine learning and Fuzzy approach: A Shark Tank India Case Study

Shreya Jain, Atharva Parikh

Proceedings of the ACM Web Conference 2023

Published, 2023

Empowering India’s Climate Action: Harnessing Blockchain for Carbon Trading

Shreya Jain, Atharva Parikh, Riddhi Pawar, Shruti Jawale

2024 IEEE International Conference on Blockchain and Distributed Systems Security (ICBDS)

Published, 2024

Contact Me

(shreyadj@umich.edu)