Kalash Jindal · Senior Data Engineer

I Turn Raw Data
Into Scalable
Systems

Senior Data Engineer at Nasdaq building distributed data platforms, ETL pipelines, and real-time processing systems with Python, PySpark, SQL, and AWS.

Extract → Transform → Load
Datasets Scaled
Cost Reduction
Engineering XP
Cloud Certifications
Status: Live_Environment

// IDENTITY_MODULE:
[DATA_ENGINEER]

The Architect of Unstructured Chaos

Senior Data Engineer with 4+ years in FinTech building scalable data platforms. I architect and scale distributed data systems, build resilient ETL/ELT pipelines, and run batch and real-time processing side by side to power critical analytics.

Core_Directives: Efficiency · Scale

My Philosophy

"Turning noise into signal is not a process; it is an act of digital refinement."

I prioritize fault-tolerance over convenience and idempotency over speed. Every pipeline I design is a living entity, optimized for the unpredictable flow of massive enterprise data ecosystems.

Portrait Placeholder
SCAN COMPLETE
IDENTITY VERIFIED

Access_Granted

KALASH_JINDAL

System_Vitals: Optimal
Pipeline Efficiency: 94%
Data Quality Score: 99.2%
UPTIME: 99.9% · COMPUTE: 8.4 TFLOPS
Senior-Level Differentiator

How I Approach
Data Engineering


Scalable Platforms

Architecting robust distributed data platforms that securely support hundreds of complex datasets and scale rapidly across enterprise environments.

Architecture · Data Platforms

ETL/ELT Pipelines

Designing end-to-end extraction, transformation, and loading pipelines using PySpark and Databricks. Migrating legacy systems to modern, reliable frameworks.

PySpark · Delta Lake
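The extract → transform → load shape described above can be sketched in plain Python. The real pipelines use PySpark and Databricks, which need a live Spark session, so this is a minimal stand-in; all names and data here are illustrative.

```python
# Minimal ETL sketch in plain Python (illustrative, not the production code).

def extract(rows):
    """Pull raw records from a source (here: an in-memory list)."""
    return list(rows)

def transform(rows):
    """Clean and reshape: drop records missing an id, normalise amounts."""
    return [
        {"id": r["id"], "amount": round(float(r["amount"]), 2)}
        for r in rows
        if r.get("id") is not None
    ]

def load(rows, sink):
    """Write transformed records to a sink keyed by id (an idempotent upsert)."""
    for r in rows:
        sink[r["id"]] = r
    return sink

raw = [
    {"id": 1, "amount": "19.991"},
    {"id": None, "amount": "5.00"},   # dropped: no id
    {"id": 2, "amount": "7.5"},
]
warehouse = load(transform(extract(raw)), {})
```

The same three-stage shape holds whether the source is a list, a JDBC table, or a Kafka topic; only the connectors change.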

Distributed Systems

Building and maintaining fault-tolerant distributed data ecosystems. Leveraging partitioning, idempotency, and cluster optimization for massive data workloads.

Distributed Compute · Scaling
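A minimal sketch of the idempotency idea mentioned above, assuming dedup by event id; real systems might rely on Delta Lake MERGE or exactly-once sinks instead. Names are illustrative.

```python
# Idempotent ingestion sketch: replaying the same events must not change state.
# Dedup here is by event id (illustrative; production might use MERGE semantics).

def ingest(events, store, seen):
    """Apply each event exactly once; replays of already-seen ids are skipped."""
    for e in events:
        if e["id"] in seen:
            continue            # replay: already applied
        seen.add(e["id"])
        store[e["key"]] = store.get(e["key"], 0) + e["delta"]
    return store

store, seen = {}, set()
batch = [{"id": "e1", "key": "clicks", "delta": 3},
         {"id": "e2", "key": "clicks", "delta": 2}]
ingest(batch, store, seen)
ingest(batch, store, seen)   # replay the whole batch: state is unchanged
```

This is why retries and backfills are safe in a well-designed pipeline: reprocessing is a no-op, not a double-count.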

Batch & Real-Time

Handling both extremes of the velocity spectrum. Delivering comprehensive batch processing while architecting low-latency streaming solutions with Kafka.

Kafka · Spark Streaming
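The micro-batch half of that velocity spectrum can be sketched with nothing but the standard library; Spark Structured Streaming applies the same grouping per trigger, at cluster scale.

```python
# Micro-batch sketch: group an unbounded event stream into fixed-size batches.
from itertools import islice

def micro_batches(stream, size):
    """Yield lists of up to `size` events from any iterator."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

events = range(10)                       # stand-in for a Kafka topic
batches = list(micro_batches(events, 4))
```

Swapping the fixed size for a time-based trigger gives the latency/throughput trade-off that streaming engines expose as a tuning knob.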

Cloud & Orchestration

Executing data strategies on AWS and Azure. Automating workflows with Airflow and ensuring infrastructure consistency via Terraform and strict CI/CD pipelines.

Airflow · Terraform
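Under the hood, an orchestration DAG is just tasks plus dependencies. A sketch of how a valid run order falls out, using Kahn's algorithm; task names are invented, and Airflow layers retries, sensors, and scheduling on top of this core idea.

```python
# DAG scheduling sketch: derive one valid execution order from dependencies.
from collections import deque

def run_order(deps):
    """deps maps task -> set of upstream tasks; returns one valid order."""
    pending = {t: set(up) for t, up in deps.items()}
    ready = deque(sorted(t for t, up in pending.items() if not up))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for other, up in pending.items():
            if t in up:
                up.remove(t)
                if not up:          # all upstreams done: task is runnable
                    ready.append(other)
    if len(order) != len(pending):
        raise ValueError("cycle detected in DAG")
    return order

dag = {"extract": set(), "transform": {"extract"},
       "load": {"transform"}, "notify": {"load"}}
```

The cycle check matters: a circular dependency is a config bug an orchestrator must refuse to run rather than hang on.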

Optimize & Monitor

Implementing deep observability with alerting to proactively catch failures. Profiling queries and jobs to achieve dramatic cost reduction and performance gains.

Cost Savings · Observability
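A toy version of the threshold-based alerting behind "proactively catch failures", assuming two made-up metrics; production setups would push these rules into a monitoring service such as CloudWatch or Datadog.

```python
# Observability sketch: check pipeline metrics against alert thresholds.
# Metric names and bounds are illustrative.

THRESHOLDS = {
    "latency_ms":   ("max", 100.0),   # alert if above
    "success_rate": ("min", 0.99),    # alert if below
}

def check(metrics):
    """Return a list of alert strings for any metric that is out of bounds."""
    alerts = []
    for name, (kind, bound) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            alerts.append(f"{name}: missing metric")
        elif kind == "max" and value > bound:
            alerts.append(f"{name}: {value} > {bound}")
        elif kind == "min" and value < bound:
            alerts.append(f"{name}: {value} < {bound}")
    return alerts

alerts = check({"latency_ms": 240.0, "success_rate": 0.995})
```

Treating a missing metric as an alert, not a pass, is what turns silent failures into pages.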
Capabilities_Matrix

Technical DNA

v4.0.2 Stable
PYTHON
Expert 95%

Use Case

Building core ETL logic and API architectures.


Complex algorithmic scripting & ML model deployment.

SPARK
Advanced 90%

Use Case

Used for terabyte-scale data transformations.


Large-scale distributed compute & stream processing.

SQL_CORE
Expert 92%

Use Case

Designing analytical data warehouses.


Optimized window functions & schema design.
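A small, runnable illustration of the window-function pattern mentioned above, using Python's bundled sqlite3 (window functions need SQLite 3.25+); the table and data are invented. The same SQL shape, top row per partition, is a staple in warehouse models.

```python
# Window-function sketch: rank rows per region, keep the top seller in each.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 100), ('east', 300), ('west', 250), ('west', 50);
""")
top_per_region = con.execute("""
    SELECT region, amount FROM (
        SELECT region, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY region ORDER BY amount DESC
               ) AS rn
        FROM sales
    ) WHERE rn = 1
    ORDER BY region
""").fetchall()
```

The subquery-plus-filter idiom is how engines without QUALIFY express "top-N per group"; Snowflake and BigQuery offer QUALIFY as sugar for the same thing.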

AWS_CLOUD
Advanced 85%

Use Case

S3, Glue, Athena, and EMR orchestrations.


Serverless architectures.

DATABRICKS
Advanced 90%

Use Case

Delta Live Tables and collaborative ML workspaces.


Unified analytics platform.

AIRFLOW
Advanced 88%

Use Case

Scheduling and monitoring complex data workflows.


DAG orchestration & recovery.

DELTA LAKE
Advanced 85%

Use Case

Ensuring reliability for data lake storage.


ACID transactions for data lakes.

KAFKA
Intermediate 80%

Use Case

Processing millions of events concurrently.


Real-time streaming pipelines.

DBT
Advanced 85%

Use Case

Modularizing and version-controlling SQL.


Data transformation workflows.

DOCKER
Advanced 88%

Use Case

Ensuring uniform environments across stages.


Containerization.

SNOWFLAKE
Advanced 85%

Use Case

Querying with compute decoupled from storage.


Cloud data warehousing.

TERRAFORM
Intermediate 82%

Use Case

Automating reproducible cloud deployments.


Infrastructure as Code (IaC).

AZURE
Intermediate 80%

Use Case

Data Factory and Synapse Analytics.


Enterprise cloud ecosystems.

BIGQUERY
Intermediate 80%

Use Case

Machine learning and analytics via SQL.


Serverless data warehouse.

REDSHIFT
Intermediate 82%

Use Case

Handling complex analytical workloads.


Petabyte-scale warehousing.

FASTAPI
Advanced 85%

Use Case

Deploying data science models as REST APIs.


Async Python web framework.

REDIS
Intermediate 80%

Use Case

Caching layers for high-throughput reads.


In-memory data structure store.

JENKINS
Intermediate 80%

Use Case

Automating test and build pipelines.


Continuous integration server.

GH ACTIONS
Advanced 85%

Use Case

Automating CI/CD pipelines tightly integrated with GitHub.


Workflow automation.

TENSORFLOW
Intermediate 82%

Use Case

Building custom computer vision and NLP models.


Deep learning and neural networks.

SCIKIT-LEARN
Advanced 90%

Use Case

Predictive modeling and feature engineering.


Machine learning algorithms.

POWER BI
Advanced 85%

Use Case

Interactive data visualization.


Business intelligence dashboards.

STREAMLIT
Advanced 88%

Use Case

Prototyping ML models as web applications.


Interactive data web apps.

Impact Showcase

Before vs After

Before

Legacy Batch System

✗ 4-hour data latency window
✗ Manual cron jobs, silent failures
✗ $280k/yr in legacy infrastructure
✗ No observability or alerting
✗ 2–3 outages per week
System Availability
72% — Critical
After

Real-Time Event Pipeline

✓ <100ms end-to-end data latency
✓ Automated Airflow DAGs, full alerting
✓ $130k/yr — 54% cost reduction
✓ Full observability: metrics, logs, traces
✓ Zero unplanned downtime in 11 months
System Availability
99.9% — Optimal
Architecture Visualization

Typical Pipeline Stack

Click any node to understand its role in the system.

Kafka
Apache Kafka
Distributed message broker. Ingests events from 50+ producers at 2M msgs/second. Provides durable, replayable event streams.
Spark
Apache Spark
Distributed compute engine. Transforms and aggregates data using Spark Structured Streaming with micro-batch processing at <100ms latency.
S3 Lake
AWS S3 Data Lake
Durable, cost-effective object storage. Stores raw + processed data as Parquet files partitioned by date/region for optimal query performance.
Warehouse
Snowflake / BigQuery
Cloud data warehouse for analytical queries. dbt models organize data into clean dimensional schemas. Connects to BI tools with <2s query response.
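The date/region partitioning described for the S3 lake follows the Hive-style key=value prefix layout, which query engines use to prune partitions. A sketch; the bucket and prefix names are invented.

```python
# Partitioned-layout sketch: build Hive-style S3 prefixes for date/region.
from datetime import date

def partition_key(d: date, region: str) -> str:
    """Build the S3 prefix for one date/region partition (names illustrative)."""
    return f"s3://my-lake/events/date={d.isoformat()}/region={region}/"

key = partition_key(date(2024, 3, 1), "us-east-1")
```

Because the partition values are encoded in the path, a query filtered on date and region only reads the matching prefixes instead of scanning the whole lake.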
Technical Arsenal

Core Infrastructure

Big Data & DBs

PySpark / Databricks / Delta Lake
Snowflake / Redshift
PostgreSQL / MySQL
Kafka / Real-time Processing

Programming

Python
SQL
Scala
Bash / APIs

Cloud & DevOps

AWS (S3, EMR, Glue, Lambda)
Airflow & Orchestration
Azure / Cloud Workflows
Terraform, Jenkins, GitHub Actions

Establish Connection

Send a message to the engineering core

LinkedIn
GitHub
TRANSMISSION_CONSOLE_v2.1
$ ./debug_mode --enable
Initializing diagnostic scan...
[KERNEL] Loading pipeline telemetry
[INFO] Latency spikes detected in zone 7
[WARN] Null pointer at transform_step_3
[DEBUG] Rolling back to checkpoint v2.4...
█ Pipeline recovered. All systems nominal.

You found the debug console. This is the kind of problem-solving I do daily.