Kalash Jindal · Senior Data Engineer

I Turn Raw Data
Into Scalable
Systems

Senior Data Engineer at Nasdaq building distributed data platforms, ETL pipelines, and real-time processing systems with Python, PySpark, SQL, and AWS.

Extract → Transform → Load
Datasets Scaled
Cost Reduction
Engineering XP
Cloud Certifications
Status: Live_Environment

// IDENTITY_MODULE:
[DATA_ENGINEER]

The Architect of Unstructured Chaos

Senior Data Engineer with 4+ years in FinTech building scalable data platforms. I architect and scale distributed data systems, build resilient ETL/ELT pipelines, and run batch and real-time processing side by side to power critical analytics.

Core_Directives: Efficiency · Scale

My Philosophy

"Turning noise into signal is not a process; it is an act of digital refinement."

I prioritize fault-tolerance over convenience and idempotency over speed. Every pipeline I design is a living entity, optimized for the unpredictable flow of massive enterprise data ecosystems.

Portrait Placeholder
SCAN COMPLETE
IDENTITY VERIFIED

Access_Granted

KALASH_JINDAL

System_Vitals: Optimal
Pipeline Efficiency: 94%
Data Quality Score: 99.2%
UPTIME: 99.9% · COMPUTE: 8.4 TFLOPS
Senior-Level Differentiator

How I Approach
Data Engineering


Scalable Platforms

Architecting robust distributed data platforms that securely support hundreds of complex datasets and scale rapidly across enterprise environments.

Architecture · Data Platforms

ETL/ELT Pipelines

Designing end-to-end extraction, transformation, and loading pipelines using PySpark and Databricks. Migrating legacy systems to modern, reliable frameworks.

PySpark · Delta Lake
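The extract → transform → load shape described above can be sketched in plain Python. The real pipelines use PySpark and Databricks, which need a live Spark session, so this is a minimal stand-in; all names and data here are illustrative.

```python
# Minimal ETL sketch in plain Python (illustrative, not the production code).

def extract(rows):
    """Pull raw records from a source (here: an in-memory list)."""
    return list(rows)

def transform(rows):
    """Clean and reshape: drop records missing an id, normalise amounts."""
    return [
        {"id": r["id"], "amount": round(float(r["amount"]), 2)}
        for r in rows
        if r.get("id") is not None
    ]

def load(rows, sink):
    """Write transformed records to a sink keyed by id (an idempotent upsert)."""
    for r in rows:
        sink[r["id"]] = r
    return sink

raw = [
    {"id": 1, "amount": "19.991"},
    {"id": None, "amount": "5.00"},   # dropped: no id
    {"id": 2, "amount": "7.5"},
]
warehouse = load(transform(extract(raw)), {})
```

The same three-stage shape holds whether the source is a list, a JDBC table, or a Kafka topic; only the connectors change.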

Distributed Systems

Building and maintaining fault-tolerant distributed data ecosystems. Leveraging partitioning, idempotency, and cluster optimization for massive data workloads.

Distributed Compute · Scaling
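A minimal sketch of the idempotency idea mentioned above, assuming dedup by event id; real systems might rely on Delta Lake MERGE or exactly-once sinks instead. Names are illustrative.

```python
# Idempotent ingestion sketch: replaying the same events must not change state.
# Dedup here is by event id (illustrative; production might use MERGE semantics).

def ingest(events, store, seen):
    """Apply each event exactly once; replays of already-seen ids are skipped."""
    for e in events:
        if e["id"] in seen:
            continue            # replay: already applied
        seen.add(e["id"])
        store[e["key"]] = store.get(e["key"], 0) + e["delta"]
    return store

store, seen = {}, set()
batch = [{"id": "e1", "key": "clicks", "delta": 3},
         {"id": "e2", "key": "clicks", "delta": 2}]
ingest(batch, store, seen)
ingest(batch, store, seen)   # replay the whole batch: state is unchanged
```

This is why retries and backfills are safe in a well-designed pipeline: reprocessing is a no-op, not a double-count.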

Batch & Real-Time

Handling both extremes of the velocity spectrum. Delivering comprehensive batch processing while architecting low-latency streaming solutions with Kafka.

Kafka · Spark Streaming
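The micro-batch half of that velocity spectrum can be sketched with nothing but the standard library; Spark Structured Streaming applies the same grouping per trigger, at cluster scale.

```python
# Micro-batch sketch: group an unbounded event stream into fixed-size batches.
from itertools import islice

def micro_batches(stream, size):
    """Yield lists of up to `size` events from any iterator."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

events = range(10)                       # stand-in for a Kafka topic
batches = list(micro_batches(events, 4))
```

Swapping the fixed size for a time-based trigger gives the latency/throughput trade-off that streaming engines expose as a tuning knob.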

Cloud & Orchestration

Executing data strategies on AWS and Azure. Automating workflows with Airflow and ensuring infrastructure consistency via Terraform and strict CI/CD pipelines.

Airflow · Terraform
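Under the hood, an orchestration DAG is just tasks plus dependencies. A sketch of how a valid run order falls out, using Kahn's algorithm; task names are invented, and Airflow layers retries, sensors, and scheduling on top of this core idea.

```python
# DAG scheduling sketch: derive one valid execution order from dependencies.
from collections import deque

def run_order(deps):
    """deps maps task -> set of upstream tasks; returns one valid order."""
    pending = {t: set(up) for t, up in deps.items()}
    ready = deque(sorted(t for t, up in pending.items() if not up))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for other, up in pending.items():
            if t in up:
                up.remove(t)
                if not up:          # all upstreams done: task is runnable
                    ready.append(other)
    if len(order) != len(pending):
        raise ValueError("cycle detected in DAG")
    return order

dag = {"extract": set(), "transform": {"extract"},
       "load": {"transform"}, "notify": {"load"}}
```

The cycle check matters: a circular dependency is a config bug an orchestrator must refuse to run rather than hang on.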

Optimize & Monitor

Implementing deep observability with alerting to proactively catch failures. Profiling queries and jobs to achieve dramatic cost reduction and performance gains.

Cost Savings · Observability
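A toy version of the threshold-based alerting behind "proactively catch failures", assuming two made-up metrics; production setups would push these rules into a monitoring service such as CloudWatch or Datadog.

```python
# Observability sketch: check pipeline metrics against alert thresholds.
# Metric names and bounds are illustrative.

THRESHOLDS = {
    "latency_ms":   ("max", 100.0),   # alert if above
    "success_rate": ("min", 0.99),    # alert if below
}

def check(metrics):
    """Return a list of alert strings for any metric that is out of bounds."""
    alerts = []
    for name, (kind, bound) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            alerts.append(f"{name}: missing metric")
        elif kind == "max" and value > bound:
            alerts.append(f"{name}: {value} > {bound}")
        elif kind == "min" and value < bound:
            alerts.append(f"{name}: {value} < {bound}")
    return alerts

alerts = check({"latency_ms": 240.0, "success_rate": 0.995})
```

Treating a missing metric as an alert, not a pass, is what turns silent failures into pages.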
Capabilities_Matrix

Technical DNA

v4.0.2 Stable
PYTHON
Expert 95%

Use Case

Building core ETL logic and API architectures.


Complex algorithmic scripting & ML model deployment.

SPARK
Advanced 90%

Use Case

Used for terabyte-scale data transformations.


Large-scale distributed compute & stream processing.

SQL_CORE
Expert 92%

Use Case

Designing analytical data warehouses.


Optimized window functions & schema design.
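A small, runnable illustration of the window-function pattern mentioned above, using Python's bundled sqlite3 (window functions need SQLite 3.25+); the table and data are invented. The same SQL shape, top row per partition, is a staple in warehouse models.

```python
# Window-function sketch: rank rows per region, keep the top seller in each.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 100), ('east', 300), ('west', 250), ('west', 50);
""")
top_per_region = con.execute("""
    SELECT region, amount FROM (
        SELECT region, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY region ORDER BY amount DESC
               ) AS rn
        FROM sales
    ) WHERE rn = 1
    ORDER BY region
""").fetchall()
```

The subquery-plus-filter idiom is how engines without QUALIFY express "top-N per group"; Snowflake and BigQuery offer QUALIFY as sugar for the same thing.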

AWS_CLOUD
Advanced 85%

Use Case

S3, Glue, Athena, and EMR orchestrations.


Serverless architectures.

DATABRICKS
Advanced 90%

Use Case

Delta Live Tables and collaborative ML workspaces.


Unified analytics platform.

AIRFLOW
Advanced 88%

Use Case

Scheduling and monitoring complex data workflows.


DAG orchestration & recovery.

DELTA LAKE
Advanced 85%

Use Case

Ensuring reliability for data lake storage.


ACID transactions for data lakes.

KAFKA
Intermediate 80%

Use Case

Processing millions of events concurrently.


Real-time streaming pipelines.

DBT
Advanced 85%

Use Case

Modularizing and version-controlling SQL.


Data transformation workflows.

DOCKER
Advanced 88%

Use Case

Ensuring uniform environments across stages.


Containerization.

SNOWFLAKE
Advanced 85%

Use Case

Querying with compute decoupled from storage.


Cloud data warehousing.

TERRAFORM
Intermediate 82%

Use Case

Automating reproducible cloud deployments.


Infrastructure as Code (IaC).

AZURE
Intermediate 80%

Use Case

Data Factory and Synapse Analytics.


Enterprise cloud ecosystems.

BIGQUERY
Intermediate 80%

Use Case

Machine learning and analytics via SQL.


Serverless data warehouse.

REDSHIFT
Intermediate 82%

Use Case

Handling complex analytical workloads.


Petabyte-scale warehousing.

FASTAPI
Advanced 85%

Use Case

Deploying data science models as REST APIs.


Async Python web framework.

REDIS
Intermediate 80%

Use Case

Caching layers for high-throughput reads.


In-memory data structure store.

JENKINS
Intermediate 80%

Use Case

Automating test and build pipelines.


Continuous integration server.

GH ACTIONS
Advanced 85%

Use Case

Automating CI/CD pipelines tightly integrated with GitHub.


Workflow automation.

TENSORFLOW
Intermediate 82%

Use Case

Building custom computer vision and NLP models.


Deep learning and neural networks.

SCIKIT-LEARN
Advanced 90%

Use Case

Predictive modeling and feature engineering.


Machine learning algorithms.

POWER BI
Advanced 85%

Use Case

Interactive data visualization.


Business intelligence dashboards.

STREAMLIT
Advanced 88%

Use Case

Prototyping ML models as web applications.


Interactive data web apps.

Impact Showcase

Before vs After

Before

Legacy Batch System

✗ 4-hour data latency window
✗ Manual cron jobs, silent failures
✗ $280k/yr in legacy infrastructure
✗ No observability or alerting
✗ 2–3 outages per week
System Availability
72% — Critical
After

Real-Time Event Pipeline

✓ <100ms end-to-end data latency
✓ Automated Airflow DAGs, full alerting
✓ $130k/yr — 54% cost reduction
✓ Full observability: metrics, logs, traces
✓ Zero unplanned downtime in 11 months
System Availability
99.9% — Optimal
Architecture Visualization

Typical Pipeline Stack

Click any node to understand its role in the system.

Kafka
Apache Kafka
Distributed message broker. Ingests events from 50+ producers at 2M msgs/second. Provides durable, replayable event streams.
Spark
Apache Spark
Distributed compute engine. Transforms and aggregates data using Spark Structured Streaming with micro-batch processing at <100ms latency.
S3 Lake
AWS S3 Data Lake
Durable, cost-effective object storage. Stores raw + processed data as Parquet files partitioned by date/region for optimal query performance.
Warehouse
Snowflake / BigQuery
Cloud data warehouse for analytical queries. dbt models organize data into clean dimensional schemas. Connects to BI tools with <2s query response.
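The date/region partitioning described for the S3 lake follows the Hive-style key=value prefix layout, which query engines use to prune partitions. A sketch; the bucket and prefix names are invented.

```python
# Partitioned-layout sketch: build Hive-style S3 prefixes for date/region.
from datetime import date

def partition_key(d: date, region: str) -> str:
    """Build the S3 prefix for one date/region partition (names illustrative)."""
    return f"s3://my-lake/events/date={d.isoformat()}/region={region}/"

key = partition_key(date(2024, 3, 1), "us-east-1")
```

Because the partition values are encoded in the path, a query filtered on date and region only reads the matching prefixes instead of scanning the whole lake.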
Technical Arsenal

Core Infrastructure

Big Data & DBs

PySpark / Databricks / Delta Lake
Snowflake / Redshift
PostgreSQL / MySQL
Kafka / Real-time Processing

Programming

Python
SQL
Scala
Bash / APIs

Cloud & DevOps

AWS (S3, EMR, Glue, Lambda)
Airflow & Orchestration
Azure / Cloud Workflows
Terraform, Jenkins, GitHub Actions

Establish Connection

Send a message to the engineering core

LinkedIn
GitHub
TRANSMISSION_CONSOLE_v2.1
$ ./debug_mode --enable
Initializing diagnostic scan...
[KERNEL] Loading pipeline telemetry
[INFO] Latency spikes detected in zone 7
[WARN] Null pointer at transform_step_3
[DEBUG] Rolling back to checkpoint v2.4...
█ Pipeline recovered. All systems nominal.

You found the debug console. This is the kind of problem-solving I do daily.