The Bike Sales Dashboard is a comprehensive Business Intelligence project designed to showcase North American bike sales data from 2016 to 2024. This dashboard centralizes data from multiple sources through a data warehouse, enabling efficient analysis and reporting. It visualizes key sales metrics and performance trends, supporting business users in making informed decisions.
The Car Sales Dashboard is a Business Intelligence project aimed at providing detailed insights into car sales data. This project centralizes and visualizes sales data for interactive analysis and reporting, supporting decision-making by offering a comprehensive view of sales KPIs and performance metrics.
This project includes a RESTful API for topic modeling: users submit text data, and the API returns the identified topics along with their associated keywords. The system processes extensive text data to uncover latent themes, providing valuable insights in fields such as news classification, social media analysis, and academic research.
Guangxi Normal University, China
Bachelor of Science in Computer Science
GPA: 3.53/4.0
Courses: Computer Networks, Operating Systems, Databases, Object-Oriented Programming
Sep 2016 - Jul 2020
University of Nottingham, UK
Master of Science in Artificial Intelligence
Graduated with: Merit
Courses: Machine Learning, Advanced Data Structures and Algorithms, Data Modeling and Analysis
Sep 2021 - Dec 2022
I have over 5 years of professional experience as a data engineer, with expertise in building data pipelines and data visualization across the Financial, Government, Education, Healthcare, and Marketing industries.
Sr. Data Engineer, Analytics
Vancouver
● Designed and built reliable data artifacts to support business overviews of the organization's Human Resources system
● Designed partition keys to optimize the ETL process and built a data lake on a GCS bucket using Delta Lake to manage HR data, including demographic, geolocation, and PII data. Ensured ACID transactions, leveraged the highly compressed Parquet format, and reduced storage costs by over $2,000 annually
Data Engineer, Analytics II
Vancouver
● Designed and implemented large-scale data pipelines and data warehousing solutions using GCP BigQuery for data storage and analysis, PySpark for data processing, and Airflow for orchestration, enabling Defend to gain insights into user behaviour
● Designed a Customer Scoring System that segments and tags customers using the RFM model, addressing data skewness through pre-processing techniques such as log transformation and applying feature engineering to enhance segmentation accuracy and improve system performance
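The RFM scoring idea above can be illustrated with a minimal sketch. This is not the production system: the customer records, the equal weighting of the three dimensions, and the segment thresholds are all hypothetical, chosen only to show how a log transformation tames skewed recency, frequency, and monetary values before scoring.

```python
import math
from statistics import mean, pstdev

# Hypothetical customers: (customer_id, recency_days, frequency, monetary)
customers = [
    ("c1", 5, 40, 5200.0),
    ("c2", 30, 8, 640.0),
    ("c3", 90, 2, 75.0),
    ("c4", 200, 1, 20.0),
]


def log_zscores(values):
    """Apply log1p to reduce right skew, then standardize to z-scores."""
    logged = [math.log1p(v) for v in values]
    mu, sigma = mean(logged), pstdev(logged)
    return [(x - mu) / sigma if sigma else 0.0 for x in logged]


def rfm_scores(rows):
    """Return {customer_id: score}; a higher score means a more valuable customer."""
    r = log_zscores([row[1] for row in rows])
    f = log_zscores([row[2] for row in rows])
    m = log_zscores([row[3] for row in rows])
    # Recency is inverted: fewer days since the last purchase is better.
    return {row[0]: -ri + fi + mi for row, ri, fi, mi in zip(rows, r, f, m)}


def tag(score):
    """Map a score to a segment tag (illustrative thresholds, not tuned values)."""
    if score > 1.0:
        return "high"
    if score < -1.0:
        return "low"
    return "mid"


scores = rfm_scores(customers)
segments = {cid: tag(s) for cid, s in scores.items()}
```

Standardizing after the log step keeps one heavy spender or one very frequent buyer from dominating the combined score, which is the usual motivation for transforming skewed RFM inputs.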
Data Engineer, Analytics I
Beijing
● Delivered a banking digital transformation and data ETL pipeline by developing stored procedures in DB2
● Optimized 30+ indexes in an existing 2GB+ DB2 database, improving system query performance by 70%
● Automated data loading and logging processes through Shell scripts, saving 1 FTE
BI Engineer
Beijing
● Conceptualized a measurement framework and maintained dashboards of key metrics on business health
● Established MySQL data warehouses ensuring data consistency and reliability, and version-controlled SQL code with Git
Proficient in Python and SQL, with expertise in cloud platforms like AWS and Azure. Experienced in working with various databases, including NoSQL options like MongoDB, and relational databases. Skilled in data analysis, visualization, and BI using tools like Power BI and Tableau. Familiar with data engineering technologies such as Hadoop and Spark, and proficient in handling data pipelines and ETL processes.