9 Apache Iceberg Jobs

JobPe aggregates listings for easy access; applications are submitted directly on the original job portal.

5.0 - 10.0 years

20 - 25 Lacs

Bengaluru

Work from Office

Source: Naukri

The Platform Data Engineer will be responsible for designing and implementing robust data platform architectures, integrating diverse data technologies, and ensuring scalability, reliability, performance, and security across the platform. The role involves setting up and managing infrastructure for data pipelines, storage, and processing, developing internal tools to enhance platform usability, implementing monitoring and observability, collaborating with software engineering teams for seamless integration, and driving capacity planning and cost optimization initiatives.
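
To give a concrete flavour of the pipeline-orchestration work described above, here is a minimal, illustrative Airflow DAG sketch (it assumes a recent Airflow 2.x; the DAG id, schedule, and task callables are invented for this sketch and are not part of the posting):

```python
# Minimal sketch of an orchestrated data-platform pipeline in Apache Airflow.
# DAG id, schedule, and the extract/load callables are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull data from a source system")


def load():
    print("write data to the platform's storage layer")


with DAG(
    dag_id="platform_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",          # hourly or event-driven schedules are also common
    catchup=False,
    tags=["data-platform"],
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task   # simple linear dependency
```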

Posted 1 day ago

Apply

5.0 - 10.0 years

3 - 14 Lacs

Bengaluru / Bangalore, Karnataka, India

On-site

Source: Foundit

Key Responsibilities:
- Design & Implement Data Architecture: Design, implement, and maintain the overall data platform architecture, ensuring the scalability, security, and performance of the platform.
- Data Technologies Integration: Select, integrate, and configure data technologies (cloud platforms such as AWS, Azure, and GCP; data lakes; data warehouses; streaming platforms such as Kafka; containerization technologies).
- Infrastructure Management: Set up and manage the infrastructure for data pipelines, data storage, and data processing across platforms such as Kubernetes and Airflow.
- Develop Frameworks & Tools: Develop internal frameworks to improve the efficiency and usability of the platform for other teams such as Data Engineers and Data Scientists.
- Data Platform Monitoring & Observability: Implement and manage monitoring and observability for the data platform, ensuring high availability and fault tolerance.
- Collaboration: Work closely with software engineering teams to integrate the data platform with other business systems and applications.
- Capacity & Cost Optimization: Drive capacity planning and cost optimization for data infrastructure, ensuring efficient utilization of resources.

Tech Stack Requirements:
- Apache Iceberg (version 0.13.2): Experience managing table formats for scalable data storage.
- Apache Spark (version 3.4 and above): Expertise in building and maintaining batch and streaming data processing capabilities.
- Apache Kafka (version 3.9 and above): Proficiency in managing messaging platforms for real-time data streaming.
- Role-Based Access Control (RBAC): Experience with Apache Ranger (version 2.6.0) for implementing and administering security and access controls.
- RDBMS: Experience with near-real-time data storage solutions, specifically Oracle (version 19c).
- Great Expectations (version 1.3.4): Familiarity with implementing Data Quality (DQ) frameworks to ensure data integrity and consistency.
- Data Lineage & Cataloging: Experience with OpenLineage and DataHub (version 0.15.0) for data lineage and catalog solutions.
- Trino (version 4.7.0): Proficiency with query engines for batch processing.
- Container Platforms: Hands-on experience managing container platforms such as SKE (version 1.29 on AKS).
- Airflow (version 2.10.4): Experience with workflow and scheduling tools for orchestrating and managing data pipelines.
- DBT (Data Build Tool): Proficiency with ETL/ELT frameworks such as DBT for data transformation and automation.
- Data Tokenization: Experience with data tokenization technologies such as Protegrity (version 9.2) for ensuring data security.

Desired Skills:
- Domain Expertise: Familiarity with the Banking domain is a plus, including working with financial data and regulatory requirements.
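
For a concrete sense of the Iceberg-on-Spark work this stack implies, here is a minimal PySpark sketch; the catalog name, warehouse path, and table schema are illustrative, and it assumes the Iceberg Spark runtime jar is on the classpath:

```python
# Minimal sketch: creating and querying an Apache Iceberg table from PySpark.
# Catalog name, warehouse path, and schema are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-sketch")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS demo.db")

# Create a partitioned Iceberg table and append a row.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        event_id BIGINT,
        event_ts TIMESTAMP,
        payload  STRING
    ) USING iceberg
    PARTITIONED BY (days(event_ts))
""")
spark.sql("INSERT INTO demo.db.events VALUES (1, current_timestamp(), 'hello')")

# Iceberg exposes commit metadata as tables, which underpins time travel.
spark.sql("SELECT snapshot_id, committed_at FROM demo.db.events.snapshots").show()
```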

Posted 1 week ago

Apply

8.0 - 13.0 years

25 - 40 Lacs

Chennai

Work from Office

Source: Naukri

Architect & Build Scalable Systems: Design and implement a petabyte-scale lakehouse Architectures to unify data lakes and warehouses. Real-Time Data Engineering: Develop and optimize streaming pipelines using Kafka, Pulsar, and Flink. Required Candidate profile Data engineering experience with large-scale systems• Expert proficiency in Java for data-intensive applications. Handson experience with lakehouse architectures, stream processing, & event streaming

Posted 1 week ago

Apply

8.0 - 10.0 years

0 Lacs

Bengaluru / Bangalore, Karnataka, India

On-site

Source: Foundit

NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking a Cloud Solution Delivery Lead Consultant to join our team in Bengaluru, Karnataka (IN-KA), India (IN).

Data Engineer Lead
- Robust hands-on experience with industry-standard tooling and techniques, including SQL, Git and CI/CD pipelines (mandatory)
- Management, administration, and maintenance of data streaming tools such as Kafka/Confluent Kafka and Flink
- Experience with software support for applications written in Python and SQL
- Administration, configuration and maintenance of Snowflake and DBT
- Experience with data product environments that use tools such as Kafka Connect, Snyk, Confluent Schema Registry, Atlan, IBM MQ, SonarQube, Apache Airflow, Apache Iceberg, DynamoDB, Terraform and GitHub
- Debugging issues, root cause analysis, and applying fixes
- Management and maintenance of ETL processes (bug fixing and batch job monitoring)

Training & Certification:
- Apache Kafka Administration
- Snowflake Fundamentals/Advanced Training

Experience:
- 8 years of experience in a technical role working with AWS
- At least 2 years in a leadership or management role

About NTT DATA
NTT DATA is a $30 billion trusted global innovator of business and technology services. We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long-term success. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure and connectivity. We are one of the leading providers of digital and AI infrastructure in the world. NTT DATA is part of NTT Group, which invests over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future.

NTT DATA endeavors to make its website accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us. This contact information is for accommodation requests only and cannot be used to inquire about the status of applications. NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.
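
To illustrate the Kafka administration portion of this role, here is a small sketch using the confluent-kafka Python client; the broker address, topic name, and partition/replication settings are placeholders.

```python
# Minimal Kafka administration sketch with the confluent-kafka client.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Create a topic and wait for the broker to confirm it.
futures = admin.create_topics(
    [NewTopic("orders.raw", num_partitions=6, replication_factor=3)]
)
for topic, future in futures.items():
    try:
        future.result()  # raises on failure (e.g. topic already exists)
        print(f"created topic {topic}")
    except Exception as exc:
        print(f"could not create {topic}: {exc}")

# Basic health check: list topics and their partition counts.
metadata = admin.list_topics(timeout=10)
for name, topic_meta in metadata.topics.items():
    print(name, len(topic_meta.partitions), "partitions")
```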

Posted 2 weeks ago

Apply

1.0 - 3.0 years

3 - 5 Lacs

New Delhi, Chennai, Bengaluru

Hybrid

Source: Naukri

Your day at NTT DATA
We are seeking an experienced Data Engineer to join our team in delivering cutting-edge Generative AI (GenAI) solutions to clients. The successful candidate will be responsible for designing, developing, and deploying data pipelines and architectures that support the training, fine-tuning, and deployment of LLMs for various industries. This role requires strong technical expertise in data engineering, problem-solving skills, and the ability to work effectively with clients and internal teams.

What you'll be doing

Key Responsibilities:
- Data Pipelines and Architecture: Design, develop, and manage data pipelines and architectures to support GenAI model training, fine-tuning, and deployment.
- Data Ingestion and Integration: Develop data ingestion frameworks to collect data from various sources, transform it, and integrate it into a unified data platform for GenAI model training and deployment.
- GenAI Model Integration: Collaborate with data scientists to integrate GenAI models into production-ready applications, ensuring seamless model deployment, monitoring, and maintenance.
- Cloud Infrastructure Management: Design, implement, and manage cloud-based data infrastructure (e.g., AWS, GCP, Azure) to support large-scale GenAI workloads, ensuring cost-effectiveness, security, and compliance.
- Code Quality: Write scalable, readable, and maintainable code using object-oriented programming concepts in languages like Python, and utilize libraries like Hugging Face Transformers, PyTorch, or TensorFlow.
- Performance Optimization: Optimize data pipelines, GenAI model performance, and infrastructure for scalability, efficiency, and cost-effectiveness.
- Data Security and Compliance: Ensure data security, privacy, and compliance with regulatory requirements (e.g., GDPR, HIPAA) across data pipelines and GenAI applications.
- Client Collaboration: Collaborate with clients to understand their GenAI needs, design solutions, and deliver high-quality data engineering services.
- Innovation and R&D: Stay up to date with the latest GenAI trends, technologies, and innovations, applying research and development skills to improve data engineering services.
- Knowledge Sharing: Share knowledge, best practices, and expertise with team members, contributing to the growth and development of the team.

Requirements:
- Bachelor's degree in computer science, engineering, or related fields (Master's recommended)
- Experience with vector databases (e.g., Pinecone, Weaviate, Faiss, Annoy) for efficient similarity search and storage of dense vectors in GenAI applications (a minimal sketch appears after this listing)
- 5+ years of experience in data engineering, with a strong emphasis on cloud environments (AWS, GCP, Azure, or cloud-native platforms)
- Proficiency in programming languages like SQL, Python, and PySpark
- Strong data architecture, data modeling, and data governance skills
- Experience with big data platforms (Hadoop, Databricks, Hive, Kafka, Apache Iceberg), data warehouses (Teradata, Snowflake, BigQuery), and lakehouses (Delta Lake, Apache Hudi)
- Knowledge of DevOps practices, including Git workflows and CI/CD pipelines (Azure DevOps, Jenkins, GitHub Actions)
- Experience with GenAI frameworks and tools (e.g., TensorFlow, PyTorch, Keras)

Nice to have:
- Experience with containerization and orchestration tools like Docker and Kubernetes
- Experience integrating vector databases and implementing similarity search techniques, with a focus on GraphRAG
- Familiarity with API gateway and service mesh architectures
- Experience with low-latency/streaming, batch, and micro-batch processing
- Familiarity with Linux-based operating systems and REST APIs
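
As referenced in the requirements above, here is a minimal Faiss similarity-search sketch; the dimensions and vectors are synthetic placeholders, and in practice the embeddings would come from a GenAI model such as a sentence encoder.

```python
# Minimal exact nearest-neighbour search over dense vectors with Faiss.
import faiss
import numpy as np

dim = 384                                  # e.g. a typical sentence-embedding size
rng = np.random.default_rng(0)
corpus_vectors = rng.random((10_000, dim), dtype=np.float32)  # stand-in embeddings

index = faiss.IndexFlatL2(dim)             # brute-force L2 index
index.add(corpus_vectors)                  # store the corpus embeddings

query = rng.random((1, dim), dtype=np.float32)
distances, ids = index.search(query, 5)    # top-5 most similar vectors
print(ids[0], distances[0])
```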

Posted 2 weeks ago

Apply

8.0 - 12.0 years

0 Lacs

Mumbai, Maharashtra, India

On-site

Source: Foundit

Introduction
A career in IBM Consulting is rooted in long-term relationships and close collaboration with clients across the globe. You'll work with visionaries across multiple industries to improve the hybrid cloud and AI journey for the most innovative and valuable companies in the world. Your ability to accelerate impact and make meaningful change for your clients is enabled by our strategic partner ecosystem and our robust technology platforms across the IBM portfolio, including Software and Red Hat. Curiosity and a constant quest for knowledge serve as the foundation to success in IBM Consulting. In your role, you'll be encouraged to challenge the norm, investigate ideas outside of your role, and come up with creative solutions resulting in groundbreaking impact for a wide network of clients. Our culture of evolution and empathy centers on long-term career growth and development opportunities in an environment that embraces your unique skills and experience.

Your role and responsibilities

Role Overview:
We are looking for an experienced Denodo SME to design, implement, and optimize data virtualization solutions using Denodo as the enterprise semantic and access layer over a Cloudera-based data lakehouse. The ideal candidate will lead the integration of structured and semi-structured data across systems, enabling unified access for analytics, BI, and operational use cases.

Key Responsibilities:
- Design and deploy the Denodo Platform for data virtualization over Cloudera, RDBMS, APIs, and external data sources.
- Define logical data models, derived views, and metadata mappings across layers (integration, business, presentation).
- Connect to Cloudera Hive, Impala, Apache Iceberg, Oracle, and other on-prem/cloud sources.
- Publish REST/SOAP APIs and JDBC/ODBC endpoints for downstream analytics and applications.
- Tune virtual views, caching strategies, and federation techniques to meet performance SLAs for high-volume data access.
- Implement Denodo smart query acceleration, usage monitoring, and access governance.
- Configure role-based access control (RBAC) and row/column-level security, and integrate with enterprise identity providers (LDAP, Kerberos, SSO).
- Work with data governance teams to align Denodo with enterprise metadata catalogs (e.g., Apache Atlas, Talend).

Required education: Bachelor's Degree
Preferred education: Master's Degree

Required technical and professional expertise:
- 8-12 years in data engineering, with 4+ years of hands-on experience in the Denodo Platform.
- Strong experience integrating RDBMS (Oracle, SQL Server), Cloudera CDP (Hive, Iceberg), and REST/SOAP APIs.
- Denodo Admin Tool, VQL, Scheduler, Data Catalog.
- SQL, shell scripting, basic Python (preferred).
- Deep understanding of query optimization, caching, memory management, and federation principles.
- Experience implementing data security, masking, and user access control in Denodo.

Posted 3 weeks ago

Apply

4.0 - 9.0 years

10 - 20 Lacs

Hyderabad, Chennai, Bengaluru

Work from Office

Source: Naukri

JD:
- Good experience in Apache Iceberg, Apache Spark, and Trino
- Proficiency in SQL and data modeling
- Experience with an open data lakehouse using Apache Iceberg
- Experience with data lakehouse architecture built on Apache Iceberg and Trino
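
To make the Iceberg-plus-Trino requirement concrete, here is a small sketch using the trino Python client (DB-API); the host, catalog, schema, and table names are invented for this example.

```python
# Minimal sketch of querying an Iceberg table through Trino from Python.
import trino

conn = trino.dbapi.connect(
    host="trino.example.internal",   # hypothetical coordinator host
    port=8080,
    user="analyst",
    catalog="iceberg",
    schema="analytics",
)
cur = conn.cursor()

# Ordinary SQL over the Iceberg table...
cur.execute("SELECT status, count(*) FROM orders GROUP BY status")
for row in cur.fetchall():
    print(row)

# ...and Iceberg commit metadata via Trino's hidden $snapshots table.
cur.execute('SELECT snapshot_id, committed_at FROM "orders$snapshots"')
print(cur.fetchall())
```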

Posted 4 weeks ago

Apply

7.0 - 12.0 years

10 - 20 Lacs

Hyderabad

Remote

Source: Naukri

Job Title: Senior Data Engineer
Location: Remote
Job Type: Full-time
Experience Level: 7+ years

About the Role:
We are seeking a highly skilled Senior Data Engineer to join our team in building a modern data platform on AWS. You will play a key role in transitioning from legacy systems to a scalable, cloud-native architecture using technologies like Apache Iceberg, AWS Glue, Redshift, and Atlan for governance. This role requires hands-on experience across both legacy (e.g., Siebel, Talend, Informatica) and modern data stacks.

Responsibilities:
- Design, develop, and optimize data pipelines and ETL/ELT workflows on AWS.
- Migrate legacy data solutions (Siebel, Talend, Informatica) to modern AWS-native services.
- Implement and manage a data lake architecture using Apache Iceberg and AWS Glue.
- Work with Redshift for data warehousing solutions, including performance tuning and modeling.
- Apply data quality and observability practices using Soda or similar tools.
- Ensure data governance and metadata management using Atlan (or other tools like Collibra or Alation).
- Collaborate with data architects, analysts, and business stakeholders to deliver robust data solutions.
- Build scalable, secure, and high-performing data platforms supporting both batch and real-time use cases.
- Participate in defining and enforcing data engineering best practices.

Required Qualifications:
- 7+ years of experience in data engineering and data pipeline development.
- Strong expertise with AWS services, especially Redshift, Glue, S3, and Athena.
- Proven experience with Apache Iceberg or similar open table formats (such as Delta Lake or Hudi).
- Experience with legacy tools like Siebel, Talend, and Informatica.
- Knowledge of data governance tools like Atlan, Collibra, or Alation.
- Experience implementing data quality checks using Soda or equivalent.
- Strong SQL and Python skills; familiarity with Spark is a plus.
- Solid understanding of data modeling, data warehousing, and big data architectures.
- Strong problem-solving skills and the ability to work in an Agile environment.
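
As one illustration of the AWS-side querying this platform implies, here is a hedged boto3/Athena sketch; the region, database, table, and S3 output location are placeholders, and it assumes the table is registered in the Glue catalog (Athena can query Glue-catalogued Iceberg tables directly).

```python
# Minimal sketch: running a query against the lake through Amazon Athena.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

execution = athena.start_query_execution(
    QueryString="SELECT order_id, amount FROM orders WHERE order_date = DATE '2024-01-01'",
    QueryExecutionContext={"Database": "lakehouse"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then fetch the first page of results.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:  # note: the first row returned is the column headers
        print([col.get("VarCharValue") for col in row["Data"]])
```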

Posted 1 month ago

Apply

4.0 - 7.0 years

10 - 14 Lacs

Noida

Work from Office

Source: Naukri

Location: Noida (in-office/hybrid; client site if required)
Type: Full-Time | Immediate Joiners Preferred

Must-Have Skills:
- GCP (BigQuery, Dataflow, Dataproc, Cloud Storage)
- PySpark / Spark
- Distributed computing expertise
- Apache Iceberg (preferred), Hudi, or Delta Lake

Role Overview:
Be part of a high-impact Data Engineering team focused on building scalable, cloud-native data pipelines. You'll support and enhance EMR platforms using DevOps principles, helping deliver real-time health alerts and diagnostics for platform performance.

Key Responsibilities:
- Provide data engineering support to EMR platforms
- Design and implement cloud-native, automated data solutions
- Collaborate with internal teams to deliver scalable systems
- Continuously improve infrastructure reliability and observability

Technical Environment:
- Databases: Oracle, MySQL, MSSQL, MongoDB
- Distributed Engines: Spark/PySpark, Presto, Flink/Beam
- Cloud Infra: GCP (preferred), AWS (nice-to-have), Terraform
- Big Data Formats: Iceberg, Hudi, Delta
- Tools: SQL, Data Modeling, Palantir Foundry, Jenkins, Confluence
- Bonus: Stats/math tools (NumPy, PyMC3), Linux scripting

Ideal for engineers with cloud-native, real-time data platform experience, especially those who have worked with EMR and modern lakehouse stacks.
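
For a concrete flavour of the GCP stack listed above, here is a short google-cloud-bigquery sketch; the project, dataset, table, and column names are invented, and credentials are assumed to come from the default application environment.

```python
# Minimal sketch of querying BigQuery for platform health-alert counts.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

query = """
    SELECT device_id, COUNT(*) AS alert_count
    FROM `example-project.monitoring.health_alerts`
    WHERE DATE(event_ts) = CURRENT_DATE()
    GROUP BY device_id
    ORDER BY alert_count DESC
    LIMIT 10
"""

# BigQuery runs the query server-side; result() blocks until it completes.
for row in client.query(query).result():
    print(row.device_id, row.alert_count)
```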

Posted 1 month ago

Apply