Job Description
Role: Senior Data Engineer – Databricks
Experience: 5+ years
Job Type: Contract
Contract Duration: 6 months
Budget: 1.0 lakh per month
Location: Remote

JOB DESCRIPTION:
We are looking for a dynamic and experienced Senior Data Engineer – Databricks to design, build, and optimize robust data pipelines on the Databricks Lakehouse platform. The ideal candidate has strong hands-on skills in Apache Spark, PySpark, and cloud data services, along with a good grasp of Python and Java. This role involves close collaboration with architects, analysts, and developers to deliver scalable, high-performing data solutions across AWS, Azure, and GCP.

ESSENTIAL JOB FUNCTIONS

1. Data Pipeline Development
• Build scalable and efficient ETL/ELT workflows using Databricks and Spark for both batch and streaming data.
• Leverage Delta Lake and Unity Catalog for structured data management and governance.
• Optimize Spark jobs by tuning configurations, caching, partitioning, and serialization.

2. Cloud-Based Implementation
• Develop and deploy data workflows on AWS (S3, EMR, Glue), Azure (ADLS, ADF, Synapse), and/or GCP (GCS, Dataflow, BigQuery).
• Manage and optimize data storage, access control, and pipeline orchestration using native cloud tools.
• Use tools such as Databricks Auto Loader and SQL warehouses for efficient data ingestion and querying.

3. Programming & Automation
• Write clean, reusable, production-grade code in Python and Java.
• Automate workflows using orchestration tools (e.g., Airflow, ADF, or Cloud Composer).
• Implement robust testing, logging, and monitoring for data pipelines.

4. Collaboration & Support
• Collaborate with data analysts, data scientists, and business users to meet evolving data needs.
• Support production workflows, troubleshoot failures, and resolve performance bottlenecks.
• Document solutions, maintain version control, and follow Agile/Scrum processes.

Required Skills

Technical Skills:
• Databricks: Hands-on experience with notebooks, cluster management, Delta Lake, Unity Catalog, and job orchestration.
• Spark: Expertise in Spark transformations, joins, window functions, and performance tuning.
• Programming: Strong in PySpark and Java, with experience in data validation and error handling.
• Cloud Services: Good understanding of AWS, Azure, or GCP data services and security models.
• DevOps/Tools: Familiarity with Git, CI/CD, Docker (preferred), and data monitoring tools.

Experience:
• 5–8 years of data engineering or backend development experience.
• Minimum 1–2 years of hands-on work in Databricks with Spark.
• Exposure to large-scale data migration, processing, or analytics projects.

Certifications (nice to have):
• Databricks Certified Data Engineer Associate

Working Conditions
Hours of work - Full-time hours; remote work flexibility, with availability required during US business hours.
Overtime expectations - Overtime is generally not required, provided commitments are met.
Work environment - Primarily remote; occasional on-site work may be needed during client visits.
Travel requirements - No travel required.
On-call responsibilities - On-call duties during deployment phases.
Special conditions or requirements - Not applicable.

Workplace Policies and Agreements
Confidentiality Agreement: Required to safeguard sensitive client data.
Non-Compete Agreement: Must be signed to protect proprietary models.
Non-Disclosure Agreement: Must be signed to ensure client confidentiality and security.