Site Reliability Engineer

8 - 13 years

8 - 12 Lacs

Posted: 20 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description


The IBM Cloud Privileged Access Gateway team is growing and looking to add a Cloud Developer to its team of skilled architects and developers. As a Senior Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and support services in the IBM Cloud. Your responsibilities will encompass designing and implementing innovative features and automation, fine-tuning and sustaining existing code for optimal performance, uncovering efficiencies, supporting adopters globally, and driving the delivery of a highly available cloud offering within IBM Cloud Security Services. As a senior member of the team, you will also drive technical strategy, mentor peers, and partner closely with engineering teams to ensure that services meet the highest standards of reliability and performance.
Key Responsibilities:
  • Reliability Engineering: Design, implement, and maintain highly available and fault-tolerant systems across distributed environments.
  • Systems Thinking: Architect infrastructure with scalability, resilience, and disaster recovery in mind.
  • Automation First: Eliminate toil by developing automation for provisioning, deployments, scaling, and monitoring.
  • 24x7 Observability: Be part of a worldwide team that monitors the health of production systems and services around the clock, ensuring continuous reliability and optimal customer experience. Build and maintain robust monitoring, logging, and alerting systems (Prometheus, Grafana, ELK, OpenTelemetry).
  • Performance Optimization: Identify bottlenecks and optimize system performance for latency, throughput, and efficiency.
  • Incident Management & Cross-Functional Troubleshooting: Lead on-call rotations, root cause analysis, and post-mortems to improve incident response. Collaborate with engineering teams to provide initial assessments and possible workarounds for production issues. Troubleshoot and resolve production issues effectively.
  • Deployment and Configuration: Leverage Continuous Integration/Continuous Delivery (CI/CD) tools to deploy services and configuration changes at enterprise scale.
  • Security and Compliance Implementation: Implement security measures that meet or exceed industry standards for regulations such as GDPR, SOC2, ISO 27001, PCI, HIPAA, and FBA.
  • Maintenance and Support: Apply security patches and upgrades to infrastructure such as OpenShift, Kubernetes, and databases, and support PagerDuty rotations.
  • Collaboration: Partner with software engineering teams to ensure systems are reliable, testable, and observable from day one.
  • Mentorship: Guide and coach junior engineers in SRE practices, automation, and operational excellence.
  • Innovation: Continuously evaluate and introduce new tools, frameworks, and approaches to improve reliability and developer productivity.
  • Accountability: Take ownership of systems end-to-end, from architecture through operations and long-term maintenance.
  • Agile Methodology: Collaborate with cross-functional teams in an Agile environment, contributing to sprint planning, daily stand-ups, and retrospectives.
  • Continuous Learning: Stay up-to-date with the latest technologies, trends, and best practices in DevOps/SRE, AI, and cloud computing.
  • Team Collaboration: Work closely with international teams, ensuring effective communication and delivering high-quality features on time.
Required education: Bachelor's Degree
Preferred education: Master's Degree
Required technical and professional expertise
  • 5–8+ years of experience as an infrastructure engineer, with a proven record of delivering high-quality, large-scale solutions.
  • Working knowledge of one or more Linux-based operating systems such as RHEL (preferred), Ubuntu, or CentOS Linux.
  • Strong experience in working with production Kubernetes/OpenShift environments.
  • Strong programming/scripting skills (Bash, Go, Python, or similar).
  • Knowledge of security concepts (including identity management, authentication, authorization, firewalls, auditing, secure communication, certificate management, and password management).
  • Working knowledge of one or more key infrastructure tools/products: Active Directory, Terraform, Ansible, Chef, etc.
  • Working knowledge of one or more virtualization technologies: Citrix Hypervisor (preferred), VMware vSphere, Ubuntu KVM, etc.
  • Strong experience with CI/CD pipelines, automation, and DevOps practices.
  • Working knowledge of SQL or NoSQL databases such as DB2, Oracle, PostgreSQL, MySQL, or MongoDB.
  • Working knowledge of observability stacks (Sysdig, Prometheus, Grafana, ELK, OpenTelemetry).
  • Experience leading technical initiatives, mentoring developers, and influencing architectural direction.
  • Experience with Continuous Integration/Continuous Delivery (CI/CD) methodologies.
  • Experience with Agile development methodologies, such as Scrum or Kanban.
  • Ability to learn quickly and contribute in a fast-paced environment.
  • Ability to approach complex issues with a logical, solutions-oriented mindset, attention to detail and multitasking skills.
  • Strong communication skills and the ability to work effectively within a team-oriented environment.
  • Strong interpersonal skills, with a collaborative, adaptable, and proactive attitude.

Preferred technical and professional experience
  • Experience with tools such as GitHub and ServiceNow.
  • Experience with microservice architectures and RESTful API development.
  • Familiarity with container security tools such as Prisma Cloud and AquaSec.
  • Experience with DevSecOps pipelines (Jenkins, Tekton toolchains).

    IBM

    Information Technology

    Armonk
