Data Engineering & Modeling
ETL pipelines, data modeling, and cloud data infrastructure at scale.
My Approach
Good ML starts with good data infrastructure. I have built ETL pipelines, data models, and cloud architectures from scratch at companies where no data foundation existed yet. The goal is always the same: make data reliable, accessible, and ready for downstream ML work.
What This Looks Like in Practice
Cloud Data Architecture
At Wavo, I designed the early cloud architecture for the Music Intelligence Platform, building ETL pipelines in Spark and AWS (S3, EMR, RDS, Glue, Athena) from the ground up.
Legacy System Migration
At Rio Tinto, I developed Spark ETL pipelines to ingest data from legacy systems into AWS, making decades of industrial data available for modern analytics. I also designed a shared data science Python library for the global analytics team.
Platform Evaluation
At Rio Tinto, I led an internal project to evaluate commercial data science platforms, helping the organization choose tools suited to its scale and needs.
Where I've Done This
Wavo.me
Oct 2018 - Jun 2019
Designed and implemented ETL pipelines in Spark and AWS (S3, EMR, RDS, Glue, Athena) for the Music Intelligence Platform.
Rio Tinto
Jun 2019 - Dec 2020
Developed Spark ETL pipelines to ingest data from legacy systems into AWS. Designed a shared data science Python library for the global analytics team.