Note: This is a remote role with occasional office visits. Candidates from Mumbai or Pune will be preferred.
About The Company
Operating at the forefront of cloud analytics, big-data platform engineering, and enterprise AI, our teams design mission-critical data infrastructure for global clients across finance, retail, telecom, and emerging tech. We build distributed ingestion pipelines on Azure & Databricks, unlock real-time insights with Spark/Kafka, and automate delivery through modern DevOps so businesses can act on high-fidelity data, fast.
Role & Responsibilities
- Engineer robust data pipelines: build scalable batch & streaming workflows with Apache Spark, Kafka, and Azure Data Factory/Databricks.
- Implement Delta Lakehouse layers: design the bronze-silver-gold medallion architecture to guarantee data quality and lineage (see the sketch after this list).
- Automate CI/CD for ingestion: create Git-based workflows, containerized builds, and automated testing to ship reliable code.
- Craft clean, test-driven Python: develop modular PySpark/Pandas services, enforce SOLID principles, and maintain Git-versioned repos.
- Optimize performance & reliability: profile jobs, tune clusters, and ensure SLAs for throughput, latency, and cost.
- Collaborate in Agile squads: partner with engineers, analysts, and consultants to translate business questions into data solutions.
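To illustrate the medallion pattern referenced above, here is a minimal sketch in PySpark, assuming Delta Lake is available (as on Databricks); the paths, table names, and `orders` schema are hypothetical, invented purely for illustration:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical sketch: paths, table names, and the orders schema are
# illustrative only, and Delta Lake is assumed to be configured.
spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: land raw events as-is, stamped for lineage.
bronze = (spark.read.json("/landing/orders/")  # assumed raw drop zone
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.format("delta").mode("append").saveAsTable("bronze.orders")

# A streaming variant of the bronze step would read from Kafka instead,
# e.g. spark.readStream.format("kafka").option("subscribe", "orders").load()

# Silver: deduplicate and validate to enforce data quality.
silver = (spark.table("bronze.orders")
          .dropDuplicates(["order_id"])
          .filter(F.col("order_total") > 0))
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: business-level aggregates for analytics consumers.
gold = (spark.table("silver.orders")
        .groupBy("customer_id")
        .agg(F.sum("order_total").alias("lifetime_value")))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.customer_ltv")
```

The append-only bronze layer preserves raw history, while silver and gold are rebuilt from it, which is what makes lineage and reprocessing straightforward.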
Skills & Qualifications
- Must-Have
- 1-2 years of hands-on experience with Apache Spark or Kafka and Python (PySpark/Pandas/Polars).
- Experience building Delta Lake / medallion architectures on Azure or Databricks.
- Proven ability to design event-driven pipelines and write unit/integration tests (see the test sketch below).
- Git-centric workflow knowledge plus CI/CD tooling (GitHub Actions, Azure DevOps).
- Preferred
- Exposure to SQL/relational & NoSQL stores and hybrid lakehouse integrations.
- STEM or computer science degree, or an equivalent foundation in algorithms and OOP.
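As a pointer to the testing expectation in the must-have list, a minimal sketch of a PySpark unit test using pytest and a local SparkSession; `dedupe_orders` is a hypothetical transform invented for illustration:

```python
import pytest
from pyspark.sql import SparkSession

def dedupe_orders(df):
    """Hypothetical transform under test: drop duplicate order rows."""
    return df.dropDuplicates(["order_id"])

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession so the test runs without a cluster.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

def test_dedupe_orders_removes_duplicates(spark):
    df = spark.createDataFrame(
        [(1, 100.0), (1, 100.0), (2, 50.0)],
        ["order_id", "order_total"],
    )
    assert dedupe_orders(df).count() == 2
```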
Benefits & Culture Highlights
- Flexible, remote-first teams: outcome-driven culture with quarterly hackathons and dedicated learning budgets.
- Growth runway: clear promotion paths from Associate to Senior Engineer, backed by Azure & Databricks certification training.
- Inclusive collaboration: small, empowered Agile squads that value knowledge-sharing, mentorship, and transparent feedback.
Skills: modern JavaScript, cloud, vector databases, Angular, pipelines, CI/CD, containerization, Apache Spark, AWS, ML, LangChain, shell scripting, Kafka, performance testing, MLOps, Pandas, knowledge-graph design (RDF/OWL/SPARQL), SQL, data, feature engineering, NoSQL, Delta Lake, Python, AWS services (SageMaker, Bedrock, Lambda), PySpark, synthetic-data augmentation, generative AI, data cataloging, metadata management, Databricks, Git, lineage, data governance, Azure