Data Engineering & Modeling

ETL pipelines, data modeling, and cloud data infrastructure at scale.

My Approach

Good ML starts with good data infrastructure. I have built ETL pipelines, data models, and cloud architectures from scratch at companies where the data foundation did not exist yet. The goal is always the same: make data reliable, accessible, and ready for downstream ML work.

What This Looks Like in Practice

Cloud Data Architecture

At Wavo, I designed the early cloud architecture for the Music Intelligence Platform, building ETL pipelines in Spark and AWS (S3, EMR, RDS, Glue, Athena) from the ground up.

Legacy System Migration

At Rio Tinto, I developed Spark ETL pipelines to ingest data from legacy systems into AWS, making decades of industrial data available for modern analytics. I also designed a shared data science Python library for the global analytics team.

Platform Evaluation

At Rio Tinto, I led an internal project to evaluate commercial data science platforms, helping the organization choose the right tools for its scale and needs.