Technical Skills
Python
SQL
Snowflake
Spark SQL
Airflow
Power BI
Tableau
AWS
Machine Learning
Data Visualization
Education
University of Ottawa
Master of Mathematics and Statistics (Co-op Option)
Ottawa, ON
Graduating Summer 2025
University of Waterloo
Bachelor of Computer Science and Statistics (Artificial Intelligence Option)
Waterloo, ON
Work Experience
Canada Post
Junior Data Analyst Intern - Pricing and Costing Team
Ottawa, ON
Jan 2024 – Aug 2024
- Built and orchestrated an automated ETL pipeline with Apache Airflow, Spark SQL, Snowflake and Python to integrate competitor bidding-price data from multiple sources, cutting operational lead time by 20%
- Delivered actionable real-time business insights into competitor pricing strategies by developing Power BI dashboards in collaboration with cross-functional teams, earning praise during team presentations
- Designed advanced dynamic pricing models on AWS (Amazon S3) using Ridge Regression and SVM (scikit-learn), increasing the likelihood of securing high-value contracts and maximizing revenue potential in real time
Hamilton Health Sciences
Data Analyst Intern - Corporate Planning and Analysis Team
Hamilton, ON
May 2023 – Aug 2023
- Developed automated JavaScript tests to validate website database outputs, adding new modules to the codebase and tracking test results in Jira tickets
- Communicated with healthcare providers including hospitals and public health organizations, presenting monthly financial reports and automated dashboards using Excel
- Acquired, analyzed, and visualized healthcare financial data in Tableau, identifying patterns in the variance of monthly funding letters for stakeholders and reducing the month-end deficit by 30%
Experian
Data Scientist Intern - Data Team
Shanghai, China
May 2022 – Aug 2022
- Integrated pre-PBC rules targeting small-business clients and operationalized them into production-ready Python scripts, improving risk stratification that led to an estimated 12% decrease in average PCL per account
- Automated and extended waterfall reports into independent rule-impact reports across all campaigns targeting 18M clients by isolating individual rule effects in Python
- Redesigned campaign-execution workflow and migrated from SAS to Python & SQL, eliminating repetitive stacking and enabling scalable implementation following HSBC acquisition
WPP
Data Scientist Intern - Data Team
Shanghai, China
May 2019 – Aug 2019
- Scraped competitor product reviews with Python (Selenium) and applied NLP sentiment analysis to identify actionable insights, leading to a 10% increase in ROI
- Developed KNN clustering to optimize targeting strategies of advertisement campaigns, improving click-through rate by 20%
- Established automated visualization dashboards in Tableau to provide monthly P&L reports and insights, reducing operational lead time by 15%
Course Projects
Loan Approval Prediction with Machine Learning
- Cleaned and engineered the 598-record Loan Approval Prediction dataset: imputed missing values, one-hot encoded 9 categorical fields, and derived Total Income & Income-to-Loan ratio in R
- Benchmarked logistic regression, decision-tree, and 100-tree random-forest classifiers; tuned hyper-parameters to boost test-set accuracy to ≈ 82% while improving denial-class recall by 14pp
- Applied PCA plus K-Means/SOM clustering to uncover three borrower segments and confirmed credit-history as the dominant approval driver
Coding Style Learner with LSTM
- Aimed to train a code-autocompletion model that learns a user's coding style using LSTM neural networks
- Scraped data from GitHub and pre-processed it into one-hot encodings using Python Selenium, Pandas and NumPy
- Implemented a character-wise LSTM RNN and built a GPU-accelerated training pipeline using PyTorch
Conversational Medical-Literature Recommender
- Built a chat-based assistant that retrieves the 3 most relevant papers across five neuro-medical domains using SciBERT embeddings and cosine-similarity search
- Orchestrated intent handling in Google Dialogflow and fulfilled requests via Python + Flask webhook, delivering recommend/compare/refine actions with ≈ 0.85 Precision@3
- Indexed 5,000 pre-embedded abstracts with FAISS and applied K-Means clustering to diversify results, cutting manual literature-screening time by 60%
Research in Cancer Gene Search with Genetic Algorithms
with Professor Shirley Mills
- Applied GA-CFS feature selection to prostate (12,600 genes) and lung (12,534 genes) microarray datasets, pruning ≈ 52% of features while retaining signal
- Built ensemble classifiers in R (Bagged DTs, SVM, weighted stacking), boosting test accuracy to 94%, an 8pp gain over the published baseline
- Led code for bagging & stacking modules, interpreted results, and produced a slide deck summarizing biomarker insights and model performance