Site Reliability Engineer (7 To 11 years) - Big Data

Experience: 6 - 11 years

Salary: 12 - 16 Lacs

Posted: 1 week ago | Platform: Naukri


Work Mode: Work from Office

Job Type: Full Time

Job Description


About the Role:
This role is responsible for managing and maintaining complex, distributed big data ecosystems. It ensures the reliability, scalability, and security of large-scale production infrastructure. Key responsibilities include automating processes, optimizing workflows, troubleshooting production issues, and driving system improvements across multiple business verticals.

Roles and Responsibilities:
  • Manage, maintain, and support incremental changes to Linux/Unix environments.
  • Lead on-call rotations and incident response, conducting root cause analysis and driving postmortem processes.
  • Design and implement automation systems for managing big data infrastructure, including provisioning, scaling, upgrading, and patching clusters.
  • Troubleshoot and resolve complex production issues, identifying root causes and implementing mitigation strategies.
  • Design and review scalable and reliable system architectures.
  • Collaborate with teams to optimize overall system performance.
  • Enforce security standards across systems and infrastructure.
  • Set technical direction, drive standardization, and operate independently.
  • Ensure availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
  • Respond to, analyze, and resolve system outages and disruptions, and implement measures to prevent similar incidents from recurring.
  • Develop tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
  • Monitor and optimize system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning.
  • Collaborate with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle.
  • Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities.
  • Develop and enforce SRE best practices and principles.
  • Align across functional teams on priorities and deliverables.
  • Drive automation to enhance operational efficiency.

Skills Required:
  • Over 6 years of experience managing and maintaining distributed big data ecosystems.
  • Strong Linux expertise, including IP networking, iptables, and IPsec.
  • Proficiency in scripting/programming languages such as Perl, Golang, or Python.
  • Hands-on experience with the Hadoop stack (HDFS, HBase, Airflow, YARN, Ranger, Kafka, Pinot).
  • Familiarity with open-source configuration management and deployment tools such as Puppet, Salt, Chef, or Ansible.
  • Solid understanding of networking, open-source technologies, and related tools.
  • Excellent communication and collaboration skills.
  • DevOps tools: SaltStack, Ansible, Docker, Git.
  • SRE logging and monitoring tools: ELK Stack, Grafana, Prometheus, OpenTSDB, OpenTelemetry.

Good to Have:
  • Experience managing infrastructure on public cloud platforms (AWS, Azure, GCP).
  • Experience in designing and reviewing system architectures for scalability and reliability.
  • Experience with observability tools to visualize and alert on system performance.

PhonePe Full-Time Employee Benefits (not applicable to intern or contract roles):
  • Insurance Benefits: Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
  • Wellness Program: Employee Assistance Program, Onsite Medical Center, Emergency Support System
  • Parental Support: Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
  • Mobility Benefits: Relocation Benefits, Transfer Support Policy, Travel Policy
  • Retirement Benefits: Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
  • Other Benefits: Higher Education Assistance, Car Lease, Salary Advance Policy

Working at PhonePe is a rewarding experience! Great people, a work environment that thrives on creativity, and the opportunity to take on roles beyond a defined job description are just some of the reasons you should work with us. Read more about PhonePe on our blog.

PhonePe

    Financial Technology

    Bangalore
