Lily Yuli Zheng

Data Scientist & Analytics Professional

📞 1-613-890-9803
✉️ [email protected]
🌐 Ottawa, ON

Technical Skills

Python, SQL, Snowflake, Spark SQL, Airflow, Power BI, Tableau, AWS, Machine Learning, Data Visualization

Education

University of Ottawa
Master of Mathematics and Statistics (Co-op Option)
Ottawa, ON
Graduating Summer 2025
University of Waterloo
Bachelor of Computer Science and Statistics (Artificial Intelligence Option)
Waterloo, ON

Work Experience

Canada Post
Junior Data Analyst Intern - Pricing and Costing Team
Ottawa, ON
Jan 2024 – Aug 2024
  • Built and orchestrated an automated ETL pipeline with Apache Airflow, Spark SQL, Snowflake and Python to integrate competitor bidding-price data from multiple sources, cutting operational lead time by 20%
  • Delivered actionable real-time business insights into competitor pricing strategies by developing Power BI dashboards in collaboration with cross-functional teams, earning praise during team presentations
  • Designed dynamic pricing models on AWS (Amazon S3) using Ridge Regression and SVM (scikit-learn), increasing the likelihood of securing high-value contracts and maximizing revenue potential in real time
Hamilton Health Sciences
Data Analyst Intern - Corporate Planning and Analysis Team
Hamilton, ON
May 2023 – Aug 2023
  • Developed automated JavaScript tests to validate website database outputs, adding new modules to the codebase and recording test results in Jira tickets
  • Communicated with healthcare providers including hospitals and public health organizations, presenting monthly financial reports and automated dashboards using Excel
  • Acquired, analyzed, and visualized healthcare financial data in Tableau, identifying patterns in the variance of monthly funding letters for stakeholders and reducing the month-end deficit by 30%
Experian
Data Scientist Intern - Data Team
Shanghai, China
May 2022 – Aug 2022
  • Integrated pre-PBC rules targeting small-business clients and operationalized them into production-ready Python scripts, improving risk stratification that led to an estimated 12% decrease in average PCL per account
  • Automated and extended waterfall reports into independent rule-impact reports across all campaigns targeting 18M clients by isolating individual rule effects in Python
  • Redesigned campaign-execution workflow and migrated from SAS to Python & SQL, eliminating repetitive stacking and enabling scalable implementation following HSBC acquisition
WPP
Data Scientist Intern - Data Team
Shanghai, China
May 2019 – Aug 2019
  • Scraped competitor product reviews with Python (Selenium) and applied NLP sentiment analysis to identify actionable insights, leading to a 10% increase in ROI
  • Developed KNN clustering to optimize targeting strategies of advertisement campaigns, improving click-through rate by 20%
  • Established automated visualization dashboards in Tableau to provide monthly P&L reports and insights, reducing operational lead time by 15%

Course Projects

Loan Approval Prediction with Machine Learning
  • Cleaned and engineered the 598-record Loan Approval Prediction dataset: imputed missing values, one-hot encoded 9 categorical fields, and derived Total Income & Income-to-Loan ratio in R
  • Benchmarked logistic regression, decision-tree, and 100-tree random-forest classifiers; tuned hyper-parameters to boost test-set accuracy to ≈ 82% while improving denial-class recall by 14pp
  • Applied PCA plus K-Means/SOM clustering to uncover three borrower segments and confirmed credit-history as the dominant approval driver
Coding Style Learner with LSTM
  • Aimed to train a code autocompletion model that learns a user's coding style using LSTM neural networks
  • Scraped data from GitHub with Python (Selenium) and pre-processed it into one-hot encodings using Pandas and NumPy
  • Implemented a character-wise LSTM RNN and built a GPU-accelerated training pipeline using PyTorch
Conversational Medical-Literature Recommender
  • Built a chat-based assistant that retrieves the 3 most relevant papers across five neuro-medical domains using SciBERT embeddings and cosine-similarity search
  • Orchestrated intent handling in Google Dialogflow and fulfilled requests via Python + Flask webhook, delivering recommend/compare/refine actions with ≈ 0.85 Precision@3
  • Indexed 5,000 pre-embedded abstracts with FAISS and applied K-Means clustering to diversify results, cutting manual literature-screening time by 60%
Research in Cancer Gene Search with Genetic Algorithms
with Professor Shirley Mills
  • Applied GA-CFS feature-selection to prostate (12,600 genes) and lung (12,534 genes) microarray datasets, pruning ≈ 52% of features while retaining signal
  • Built ensemble classifiers in R (Bagged DTs, SVM, weighted stacking), boosting test accuracy to 94%, an 8pp gain over the published baseline
  • Led coding of the bagging & stacking modules, interpreted results, and produced a slide deck summarizing biomarker insights and model performance