Interactive Simulation
Experience What I Do
Instead of just telling you about data pipelines, try this interactive ETL simulation: Extract → Transform → Load, in under 3 minutes.
- Extract: Database / API
- Transform: Filter / Clean
- Load: Warehouse / Cache
Pipeline Sequence (Stage 01/03)

- STEP_01: Extract
- STEP_02: Transform
- STEP_03: Load
Extraction Zone
Select 2+ data sources to initialize the pipeline.
0 sources selected
INCOMING DATA PREVIEW
[STREAM] REST_API → batch_001
  { "id": "USR_001", "name": "Alex Carter", email: null, "ts": "2024-01-15T08:22:01Z" }
  { "id": "USR_002", "name": "Sarah Miller", "email": "sarah@data.io", "ts": "2024-01-15T08:22:05Z" }
  { "id": "USR_001", "name": "alex carter", /* DUPLICATE */ "email": "alex@data.io" }

[STREAM] SQL_DB → snapshot_v2
  SELECT * FROM users WHERE region = 'APAC';
  ERROR: last_login = "undefined" -- row 412
  MISSING_FIELD: status -- row 2091

[STREAM] KAFKA → realtime_events
  { "event": "purchase", "amount": 142.50, "currency": "USD", "user_id": "USR_042" }
  { "event": "purchase", "amount": 142.50, /* DUPLICATE */ "user_id": "USR_042" }
  { "event": "login", "timestamp": "NaN", "user_id": "USR_099" }
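For the simulation, the three streams above can be stubbed as plain Python iterables funneled through one extractor. This is an illustrative sketch, not the real pipeline's API: in production these would be an HTTP client, a SQL cursor, and a Kafka consumer.

```python
# Hypothetical stand-ins for two of the sources shown above.
def rest_api():
    yield {"id": "USR_001", "name": "Alex Carter", "email": None,
           "ts": "2024-01-15T08:22:01Z"}
    yield {"id": "USR_002", "name": "Sarah Miller", "email": "sarah@data.io",
           "ts": "2024-01-15T08:22:05Z"}

def kafka_events():
    yield {"event": "purchase", "amount": 142.50, "currency": "USD",
           "user_id": "USR_042"}
    yield {"event": "login", "timestamp": "NaN", "user_id": "USR_099"}

def extract(*sources):
    """Merge every selected source into one raw stream, tagging provenance."""
    for source in sources:
        for record in source():
            yield {"_source": source.__name__, **record}
```

Tagging each record with its source makes the later error counts (per-stream duplicates, nulls) easy to attribute.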
23 errors detected · 7 duplicates · 4 null fields
Data Buffer: Stage_2.RAW
[SCHEMA_ERROR]  { "id": "USR_001", "name": "Alex Carter", email: null, "ts": "2024-01-15T08:22:01Z" }
[DUPLICATE]     { "id": "USR_001", "name": "alex carter", "email": "alex@data.io" }
[OK]            { "id": "USR_002", "name": "Sarah Miller", "email": "sarah@data.io", "ts": "2024-01-15T08:22:05Z" }
[MISSING_FIELD] { "id": "USR_003", "name": "Jason Bourne", "ts": "2024-01-15T09:00:00Z" }
[BAD_VALUE]     { "event": "login", timestamp: "NaN", "user_id": "USR_099" }
[DUPLICATE]     { "event": "purchase", "amount": 142.50, "user_id": "USR_042" }
Transform Progress

- Deduplication: remove 7 duplicate records
- Null Handling: fill/drop 4 null fields
- Filters Applied: remove bad timestamps and invalid values
- Schema Normalized: standardize fields and casing
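The four transforms above can be sketched in a few lines of Python, assuming records are plain dicts shaped like the buffer shown earlier. The field choices (drop on null, `id`/`user_id` as the dedupe key) are assumptions for illustration, not the pipeline's actual rules.

```python
def transform(records):
    """Normalize casing, drop nulls, filter bad timestamps, dedupe by key."""
    seen = set()
    cleaned = []
    for rec in records:
        rec = dict(rec)
        # Schema normalization: standardize name casing ("alex carter" -> "Alex Carter").
        if isinstance(rec.get("name"), str):
            rec["name"] = rec["name"].title()
        # Null handling: drop records carrying an explicit null field.
        if any(value is None for value in rec.values()):
            continue
        # Filters: reject bad or missing timestamps such as "NaN" or "undefined".
        if rec.get("ts") in (None, "NaN", "undefined"):
            continue
        # Deduplication: keep only the first record per primary key.
        key = rec.get("id") or rec.get("user_id")
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned
```

Order matters here: normalizing before deduplication means "alex carter" and "Alex Carter" cannot slip through as two distinct users.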
Data Quality Score: 0% (apply all transforms for 100%)
Live Record Count: —
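The quality meter maps each applied transform to an equal share of 100%. A minimal sketch, assuming transforms are tracked by name (the names below are illustrative):

```python
TRANSFORMS = ("deduplication", "null_handling", "filters", "schema_normalization")

def quality_score(applied):
    """Each of the four transforms contributes an equal 25% to the score."""
    done = set(applied) & set(TRANSFORMS)
    return round(100 * len(done) / len(TRANSFORMS))
```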
Pipeline Operation
STAGE 03: THE LOAD
Current Throughput: 840 GB/s
Cleaned Stream → Select Target...
Terminal: Ingestion_Log_v2.04
[INFO] Cleaned dataset ready: 1,892 records
[INFO] Schema validation: PASSED
[READY] Awaiting target warehouse selection...
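Once a target is selected, the load step is a batch insert. Here is a hedged sketch using SQLite as a stand-in warehouse; the table name and columns are assumptions, not the real target schema:

```python
import sqlite3

def load(records, conn):
    """Batch-insert cleaned user records; idempotent on the primary key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id TEXT PRIMARY KEY, name TEXT, email TEXT, ts TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO users VALUES (:id, :name, :email, :ts)",
        records,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

`INSERT OR REPLACE` keyed on `id` makes replays safe: re-running the same batch after a partial failure does not double-count records.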
Final Rank: Senior Data Engineer
Operation Successful
Pipeline Performance Report
You just built a data pipeline. This is what I do at scale — processing billions of records with real-time fault tolerance.
- Score: —
- Accuracy: —
- Speed: —
- Efficiency: —