The Engineer Behind the Data
I don't just move data from A to B — I design the roads, traffic systems, and quality checkpoints in between.
From Curious Coder to Production Systems Builder
I started my journey in data engineering driven by a simple frustration: slow dashboards and broken reports that blocked business decisions. I saw teams waiting 45 minutes for data that should arrive in seconds. That's when I understood — data pipelines are infrastructure, not afterthoughts.
At Nasdaq, I get to solve that problem at industrial scale. I build Medallion Lakehouse architectures, real-time streaming pipelines, and distributed compute systems that process 200+ financial datasets per day — with the kind of accuracy that financial data demands.
Over 4+ years, I've learned that the hardest engineering problems aren't the algorithms — they're the people problems: aligning schemas across teams, making pipelines observable when they silently fail, and building systems that junior engineers can debug at 2am.
I care deeply about reliability, observability, and simplicity. A pipeline that engineers can't understand is a pipeline that will fail in production.
How I Think
The mental models I apply to every system I design — because engineering without principles is just guessing.
Scalability First
Every design decision starts with "what happens at 100x this volume?" Horizontal scaling, partitioning, and idempotency aren't features — they're requirements.
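One concrete face of "partitioning and idempotency as requirements" is deterministic partition routing: the same record key must land in the same partition on every run. A minimal sketch (the `partition_for` name is illustrative, not from any specific system):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a record key to a partition.

    Uses sha256 rather than Python's built-in hash(), because hash() is
    salted per process and would make replays non-deterministic.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions
```

Because the mapping is a pure function of the key, a replayed job routes every record exactly where the original run did.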
Observability > Debugging
If I can't see what a pipeline is doing, it doesn't exist yet. I instrument everything: record counts, latencies, error rates, schema drift alerts.
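What "instrument everything" can look like in miniature, with hypothetical names (`run_instrumented`, `StepMetrics`); a real pipeline would ship these counters to a metrics backend instead of a log line:

```python
import logging
import time
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

@dataclass
class StepMetrics:
    records_in: int = 0
    records_out: int = 0
    errors: int = 0
    latency_s: float = 0.0

def run_instrumented(records, transform):
    """Apply `transform` to each record, counting ins, outs, and errors."""
    m = StepMetrics()
    start = time.monotonic()
    out = []
    for rec in records:
        m.records_in += 1
        try:
            out.append(transform(rec))
            m.records_out += 1
        except Exception:
            m.errors += 1  # a real step would also capture the bad record
    m.latency_s = time.monotonic() - start
    log.info("in=%d out=%d errors=%d latency=%.3fs",
             m.records_in, m.records_out, m.errors, m.latency_s)
    return out, m
```

The point is that `records_in != records_out` becomes a visible, alertable signal rather than a silent drop.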
Trade-off Clarity
The CAP theorem is real. Batch vs. streaming. Consistency vs. availability. I make these trade-offs explicit in design docs instead of discovering them in incidents.
Root Cause, Not Symptoms
Production incidents teach more than any course. I build blameless postmortems and fix the systemic cause, not just restart the failing pod.
Lessons from Production
Never ship a pipeline without data quality checks
Shipped a pipeline that silently dropped 18% of records for 3 weeks. Now every pipeline has count assertions, null checks, and schema validation built in.
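A simplified sketch of those built-in checks (the `validate_batch` name and schema shape are illustrative):

```python
def validate_batch(rows, expected_schema, min_rows=1):
    """Fail fast on bad data instead of silently dropping records.

    `expected_schema` maps column name -> expected Python type.
    """
    # Count assertion: an empty or near-empty batch is usually a bug upstream.
    assert len(rows) >= min_rows, f"expected >= {min_rows} rows, got {len(rows)}"
    for i, row in enumerate(rows):
        # Schema validation: every expected column must be present.
        missing = expected_schema.keys() - row.keys()
        assert not missing, f"row {i} missing columns: {missing}"
        for col, typ in expected_schema.items():
            value = row[col]
            # Null check, then type check.
            assert value is not None, f"row {i}: null in required column {col!r}"
            assert isinstance(value, typ), f"row {i}: {col!r} is not {typ.__name__}"
    return True
```

A failing batch now raises loudly at the checkpoint, instead of 18% of records vanishing for three weeks.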
Idempotent, replayable jobs
Every job must be safely re-runnable. Partition-based overwrite patterns, upsert logic, and checkpoint mechanisms save you at 3am.
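A toy sketch of the partition-based overwrite pattern, using a plain dict as a stand-in for a warehouse table (all names hypothetical):

```python
def upsert_partition(table, partition_date, new_rows):
    """Idempotent write: replace the whole partition, so re-running the
    job for the same date always converges to the same state.

    `table` is a dict keyed by (partition_date, id), standing in for a
    real partitioned table.
    """
    # Drop everything previously written for this partition...
    for key in [k for k in table if k[0] == partition_date]:
        del table[key]
    # ...then write the fresh rows.
    for row in new_rows:
        table[(partition_date, row["id"])] = row
    return table
```

Running the job twice with the same input leaves the table unchanged, which is exactly the property a 3am replay relies on.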
Schema evolution is a strategy, not a patch
Upstream schema changes silently break downstream consumers. Now I own forward/backward compatibility contracts for every producer-consumer pair.
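A sketch of what a backward-compatibility check in such a contract might look like; the schema shape is illustrative, and production systems typically delegate this to a schema registry (e.g. Avro compatibility rules) rather than hand-rolled code:

```python
def is_backward_compatible(old_schema, new_schema):
    """Check that consumers of `old_schema` can still read `new_schema` data.

    Schemas are dicts of field -> {"type": str, "required": bool}
    (an illustrative shape, not a real registry format).
    """
    for name, spec in old_schema.items():
        if name not in new_schema:
            return False  # a field consumers rely on was removed
        if new_schema[name]["type"] != spec["type"]:
            return False  # retyping a field breaks existing readers
    for name, spec in new_schema.items():
        if name not in old_schema and spec.get("required", False):
            return False  # new fields must be optional or defaulted
    return True
```

Running this check in CI for every producer schema change turns "silent breakage" into a failed build.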
What I Can Do For You
Whether you need a data platform built from scratch, a crumbling pipeline rescued, or a team mentored, I bring senior, production-grade engineering judgment.