SRE - 2 (Big Data)

3 - 5 years

15 - 19 Lacs

Posted: 3 weeks ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description


 Job Overview: 
As a Site Reliability Engineer (SRE) specializing in on-premises data platforms, you will play a critical role in deploying our Cloudera Data Platform (CDP) infrastructure and ensuring its reliability, scalability, and performance. You will collaborate closely with cross-functional teams to design, implement, and maintain robust systems that support our data-driven initiatives. The ideal candidate will have a deep understanding of data platforms, strong troubleshooting skills, and a proactive mindset towards automation and optimization. You will play a pivotal role in ensuring the smooth functioning, operation, performance, and security of large, high-density Cloudera-based infrastructure.

 Roles and Responsibilities: 
  • Work on the implementation of Cloudera Data Platform (CDP) on-premises, taking part in planning, installation, configuration, and integration with existing systems.
  • Infrastructure Management: Manage and maintain the Cloudera-based infrastructure, ensuring optimal performance, high availability, and scalability. This includes monitoring system health and performing routine maintenance tasks.
  • Strong troubleshooting skills and operational expertise in areas such as system capacity, bottlenecks, memory, CPU, OS, storage, and networking.
  • Create runbooks and automate them using scripting tools such as shell scripting and Python.
  • Working knowledge of any of the configuration management tools such as Terraform, Ansible, or Salt.
  • Data Security and Compliance: Implement and enforce security best practices to safeguard data integrity and confidentiality within the Cloudera environment. Ensure compliance with relevant regulations and standards (e.g., GDPR, HIPAA, DPR).
  • Performance Optimization: Continuously optimize the Cloudera infrastructure to enhance performance, efficiency, and cost-effectiveness. Identify and resolve bottlenecks, tune configurations, and implement best practices for resource utilization.
  • Capacity Planning: Plan and tune the performance of Hadoop clusters. Monitor resource utilization trends and plan for future capacity needs. Proactively identify potential capacity constraints and propose solutions to address them.
  • Collaborate effectively with infrastructure, network, database, application, and business intelligence teams to ensure high data quality and availability.
  • Work closely with teams to optimize the overall performance of the PhonePe Hadoop ecosystem.
  • Backup and Disaster Recovery: Implement robust backup and disaster recovery strategies to ensure data protection and business continuity. Test and maintain backup and recovery procedures regularly.
  • Develop tools and services to enhance debuggability and supportability.
  • Patches & Upgrades: Routinely apply recommended patches and perform rolling upgrades of the platform in accordance with advisories from Cloudera, InfoSec, and Compliance.
  • Documentation and Knowledge Sharing: Create comprehensive documentation for configurations, processes, and procedures related to the Cloudera Data Platform. Share knowledge and best practices with team members to foster continuous learning and improvement.
  • Collaboration and Communication: Collaborate effectively with cross-functional teams including data engineers, developers, and IT operations personnel. Communicate project status, issues, and resolutions clearly and promptly.

 Skills Required: 
  • Bachelor's degree in Computer Science, Engineering, or related field.
  • Proficiency in Linux system administration, shell scripting, and networking concepts, including iptables and IPsec.
  • Strong understanding of networking, open-source technologies, and tools.
  • 3-5 years of experience in the design, setup, and management of large-scale Hadoop clusters, ensuring high availability, fault tolerance, and performance optimization.
  • Strong understanding of distributed computing principles and experience with Hadoop ecosystem technologies (HDFS, MapReduce, YARN, Hive, Spark, etc.).
  • Experience with Kerberos and LDAP.
  • Strong knowledge of databases such as MySQL, NoSQL stores, and SQL Server.
  • Hands-on experience with configuration management tools (e.g., Salt, Ansible, Puppet, Chef).
  • Strong scripting skills (e.g., Perl, Python, Bash) for automation and troubleshooting.
  • Experience with monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack).
  • Knowledge of networking principles and protocols (TCP/IP, UDP, DNS, DHCP, etc.).
  • Experience managing *nix-based machines (e.g., Ubuntu, Fedora, Red Hat) and strong working knowledge of essential Unix programs and tools.
  • Excellent communication skills and the ability to collaborate effectively with cross-functional teams.
  • Excellent analytical, problem-solving, and troubleshooting skills.
  • Proven ability to work well under pressure and manage multiple priorities simultaneously.

 Good To Have: 
  • Cloudera Certified Administrator (CCA) or Cloudera Certified Professional (CCP) certification preferred.
  • Minimum 2 years of experience managing and administering medium or large Hadoop-based environments (>100 machines); Cloudera Data Platform (CDP) experience is highly desirable.
  • Familiarity with Open Data Lake components such as Ozone, Iceberg, Spark, Flink, etc.
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes, OpenShift) is a plus.
  • Design, develop, and maintain Airflow DAGs and tasks to automate business-as-usual (BAU) processes, ensuring they are robust, scalable, and efficient.

 PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles): 
  • Insurance Benefits: Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
  • Wellness Program: Employee Assistance Program, Onsite Medical Center, Emergency Support System
  • Parental Support: Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
  • Mobility Benefits: Relocation benefits, Transfer Support Policy, Travel Policy
  • Retirement Benefits: Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
  • Other Benefits: Higher Education Assistance, Car Lease, Salary Advance Policy

Working at PhonePe is a rewarding experience! Great people, a work environment that thrives on creativity, and the opportunity to take on roles beyond a defined job description are just some of the reasons you should work with us. Read more about PhonePe on our blog.
     Life at PhonePe 
     PhonePe in the news 

    PhonePe

    Financial Technology

    Bangalore