Why would you like to join us?
TransOrg Analytics specializes in Data Science, Data Engineering, and Generative AI, providing advanced analytics solutions to industry leaders and Fortune 500 companies across India, the US, APAC, and the Middle East. We leverage data science to streamline, optimize, and accelerate our clients' businesses. Visit www.transorg.com to learn more about us.
Function:
Role Overview - Design, build, and operate secure, scalable data ingestion/ETL pipelines and a layered data platform, with strong attention to India-specific and global data protection regulations.
Key Responsibilities
1. Design & Implement ETL / Ingestion Pipelines
- Architect and build scalable batch and streaming ETL/ELT pipelines from heterogeneous sources (RDBMS, APIs, files, event streams).
- Define and implement schema management, incremental loads, SCD patterns, and robust error handling (see the sketch below).
- Optimize for throughput, latency, and cost, balancing performance with maintainability.
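Purely as an illustration of the incremental-load pattern above, here is a minimal watermark-based ingestion sketch, assuming PySpark reading from a JDBC source; the connection details, table and column names, paths, and the hard-coded watermark are hypothetical placeholders (a real pipeline would track the watermark in a control table and handle errors/retries explicitly).

```python
# Minimal sketch: watermark-based incremental load into a landing zone.
# Assumptions: PySpark is available; source, table, column names, and paths are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental_load_sketch").getOrCreate()

# In practice the last successful watermark comes from a control table / metadata store.
last_watermark = "2024-01-01 00:00:00"

# Pull only rows changed since the last run (incremental load).
incremental = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://source-db:5432/sales")   # hypothetical source
    .option("dbtable", f"(SELECT * FROM orders WHERE updated_at > '{last_watermark}') AS q")
    .option("user", "etl_user")
    .option("password", "***")                                  # fetched from a secrets manager in practice
    .load()
)

# Land the raw rows as-is, tagged with ingestion metadata; SCD handling (e.g. type-2 history)
# and validation happen in the downstream layers.
(
    incremental
    .withColumn("_ingested_at", F.current_timestamp())
    .write.mode("append")
    .parquet("s3://datalake/landing/orders/")                   # hypothetical path
)
```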
2. Data Platform & Layered Architecture
Landing Layer (Raw / Ingestion):
- Immutable, timestamped storage of all incoming data as-is.
- Enforced controls on file formats, size, naming conventions, and PII boundaries.
Bronze Layer (Standardized / Validated):
- Basic quality checks (schema conformity, null checks, referential integrity).
- Standardization of formats (time zones, encodings, enums).
Silver Layer (Cleansed / Conformed):
- Business rule validations, de-duplication, master data alignment.
- Join/conform data from multiple systems into unified domain models.
Gold Layer (Curated / Analytics & Product Marts):
- Domain-specific marts, aggregates, and feature tables optimized for BI, ML, and APIs.
- Semantic layer for business users and clear data contracts for consuming systems.
Quarantine / Reject Layer:
- Dedicated zone to capture failed/rejected records with full context (error codes, source metadata, timestamps).
- Design re-processing flows once issues are fixed.
Observability & Governance Layer:
- Centralized logging, metrics, lineage, data catalog, and quality dashboards.
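As a rough illustration of how the layers above could hang together, here is a minimal PySpark sketch; the paths, dataset names, and validation rules (a non-null key check, de-duplication, a single gold aggregate) are hypothetical stand-ins for the real quality and conformance logic.

```python
# Minimal sketch: landing -> bronze (+ quarantine) -> silver -> gold, assuming PySpark
# and parquet storage; all paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("layered_platform_sketch").getOrCreate()

# Landing: raw, immutable data tagged with ingestion metadata.
landing = (
    spark.read.json("s3://datalake/landing/orders/")
    .withColumn("_ingested_at", F.current_timestamp())
)

# Bronze: basic schema/quality checks; failures go to quarantine with an error code.
valid = landing.filter(F.col("order_id").isNotNull())
rejects = landing.filter(F.col("order_id").isNull()).withColumn("_error_code", F.lit("NULL_KEY"))
valid.write.mode("append").parquet("s3://datalake/bronze/orders/")
rejects.write.mode("append").parquet("s3://datalake/quarantine/orders/")

# Silver: de-duplicate and conform to a unified domain model.
silver = (
    spark.read.parquet("s3://datalake/bronze/orders/")
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
)
silver.write.mode("overwrite").parquet("s3://datalake/silver/orders/")

# Gold: curated aggregate optimized for BI/ML consumers.
gold = silver.groupBy("customer_id").agg(F.sum("amount").alias("lifetime_value"))
gold.write.mode("overwrite").parquet("s3://datalake/gold/customer_ltv/")
```

The quarantine write keeps rejected rows with their error context so they can be corrected and re-processed, mirroring the Quarantine / Reject layer described above.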
3. Performance, Reliability & Turnaround for Rejects
- Define SLOs/SLAs for latency, throughput, and error budgets across ingestion jobs.
- Implement partitioning, indexing, compression, and parallelism strategies for high-volume data.
- Build automated workflows for reject remediation (see the sketch below):
- Classify and route rejects (data quality, schema drift, business rule failures).
- Provide clear visibility and tools for operations/owners to correct data.
- Enable fast reprocessing with minimal impact on upstream/downstream systems.
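A minimal sketch of the reject classification and routing idea, assuming a Python-based remediation service; the error codes, owner mapping, and Reject structure are illustrative assumptions rather than a prescribed design.

```python
# Minimal sketch: classify rejected records and route them to an owner for remediation.
# Error categories, the routing table, and the notification step are illustrative.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Reject:
    record: dict
    source: str
    error_code: str      # e.g. "SCHEMA_DRIFT", "DQ_NULL_KEY", "BIZ_RULE_FAILED"
    error_detail: str
    rejected_at: str

ROUTING = {
    "SCHEMA_DRIFT": "platform-team",      # contract/schema issues -> platform team
    "DQ_NULL_KEY": "source-owner",        # data quality issues -> source system owner
    "BIZ_RULE_FAILED": "domain-steward",  # business rule failures -> domain steward
}

def classify_and_route(record: dict, source: str, error_code: str, detail: str) -> Reject:
    """Wrap a failed record with full context and pick an owner for follow-up."""
    reject = Reject(
        record=record,
        source=source,
        error_code=error_code,
        error_detail=detail,
        rejected_at=datetime.now(timezone.utc).isoformat(),
    )
    owner = ROUTING.get(error_code, "data-engineering")
    # In a real pipeline this would land in the quarantine layer and open an alert/ticket;
    # printing stands in for that side effect here.
    print(f"routed {error_code} from {source} to {owner}")
    return reject

classify_and_route({"order_id": None}, "orders_api", "DQ_NULL_KEY", "order_id is null")
```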
4. Security, Privacy & Compliance
- Implement end-to-end security across the pipeline: encryption at rest and in transit, secure secrets management, network isolation, and fine-grained access control (RBAC/ABAC).
- Ensure PII/PHI is appropriately handled via masking, tokenization, anonymization, and purpose-bound access (see the sketch below).
- Collaborate with InfoSec, Legal, and Compliance teams to align with Indian and global compliance and governance requirements.
- Maintain robust audit trails, access logs, and change history across the platform.
- Enforce data retention, archival, and deletion policies in line with regulatory and business requirements.
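A minimal sketch of column-level PII handling (salted hashing as a pseudonymous token plus masking for display), in plain Python; the salt handling, field names, and masking rule are illustrative, and in practice keys/salts would come from a secrets manager under the agreed privacy controls.

```python
# Minimal sketch: tokenize an identifier and mask an email before data leaves a trusted zone.
# The salt, field names, and masking rule are illustrative assumptions.
import hashlib

SALT = "fetched-from-secrets-manager"  # assumption: never hard-coded in a real pipeline

def tokenize(value: str) -> str:
    """Deterministic pseudonymous token so records can still be joined without exposing the raw ID."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

def mask_email(email: str) -> str:
    """Keep just enough of the value for support and debugging purposes."""
    local, _, domain = email.partition("@")
    return f"{local[:2]}***@{domain}"

record = {"customer_id": "C123", "email": "jane.doe@example.com"}
safe_record = {
    "customer_token": tokenize(record["customer_id"]),
    "email_masked": mask_email(record["email"]),
}
print(safe_record)
```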
5. Data Governance & Best Practices
- Partner with Data Governance to implement data cataloging, lineage, business glossaries, and data classification.
- Define and enforce data quality SLAs, validation rules, and monitoring.
- Promote engineering best practices:
- Modular, testable ETL code with unit/integration tests.
- CI/CD for data pipelines and infrastructure-as-code.
- Version control for pipelines, schemas, and transformations.
- Drive adoption of data contracts between source teams and consumers to reduce breakages (see the sketch below).
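A minimal sketch of what a data contract check between a source team and its consumers might look like when run in CI, using a plain Python representation; the contract format, field names, and SLA values are illustrative assumptions.

```python
# Minimal sketch: a data contract plus a schema check that CI can run on every source change.
# Dataset, owner, field names, and SLA values are illustrative assumptions.
ORDERS_CONTRACT = {
    "dataset": "sales.orders",
    "owner": "orders-service-team",
    "fields": {
        "order_id": {"type": "string", "nullable": False},
        "customer_id": {"type": "string", "nullable": False},
        "amount": {"type": "double", "nullable": False},
        "updated_at": {"type": "timestamp", "nullable": False},
    },
    "sla": {"freshness_minutes": 60, "max_null_pct": 0.0},
}

def check_schema_against_contract(actual_schema: dict, contract: dict) -> list:
    """Return breaking changes (missing or re-typed fields) so CI can fail before consumers do."""
    issues = []
    for name, spec in contract["fields"].items():
        if name not in actual_schema:
            issues.append(f"missing field: {name}")
        elif actual_schema[name] != spec["type"]:
            issues.append(f"type change on {name}: {actual_schema[name]} != {spec['type']}")
    return issues

# Example: the source team renamed customer_id; the contract check surfaces the breakage.
print(check_schema_against_contract(
    {"order_id": "string", "cust_id": "string", "amount": "double", "updated_at": "timestamp"},
    ORDERS_CONTRACT,
))
```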
6. User Experience & Internal Developer Experience
- Provide a clean, intuitive experience for analysts, data scientists, and product teams consuming data:
- Clear documentation and discovery via catalogs and semantic layers.
- Self-service access patterns (e.g., governed SQL, APIs, or data sharing mechanisms).
- Reduce friction for source system owners by defining simple onboarding patterns for new data sources.
7. Collaboration & Leadership
- Work closely with product, engineering, security, and compliance stakeholders to translate business needs into robust data solutions.
- Mentor junior engineers on data engineering, secure design, and responsible data handling.
- Contribute to the overall data platform roadmap and technical standards.
Required Skills & Experience
- Bachelor's degree in Computer Science, Software Engineering, Data Science, or a related field.
- 5-8 years of experience in Data Engineering / ETL / Data Platform roles.
- Strong expertise in SQL and at least one language such as Python or Scala.
- Hands-on experience with:
- Modern ETL/ELT tools and orchestrators (e.g., Airflow, dbt, etc.).
- Big data / distributed processing (e.g., Spark, Flink) and/or streaming platforms (e.g., Kafka, Kinesis, Pub/Sub).
- Cloud data warehouses or data lakes (e.g., Snowflake, BigQuery, Redshift, Databricks, Synapse, S3/Lakehouse patterns).
- Proven track record of building secure, high-volume ingestion pipelines in production.
- Practical understanding of data governance, privacy, and compliance (DPDP, GDPR, etc.) and how they translate into technical controls.
- Experience setting up monitoring, alerting, and observability for data workflows.