Site Reliability Engineer (SRE)

8 - 10 years

18 - 25 Lacs

Posted: 1 week ago | Platform: Naukri

Work Mode: Remote

Job Type: Full Time

Job Description

Support role; immediate joiners only.

We're looking for a Site Reliability Engineer (SRE) to join our growing team. In this role, you'll be responsible for ensuring the reliability, availability, and performance of our systems and services. You'll bridge the gap between development and operations, with a strong focus on technical support, automation, monitoring, and incident response. In short: you will keep systems healthy, respond fast when they're not, fix problems at the root, prevent future issues, and communicate clearly.

Responsibilities:

Monitoring and Alerting
  • Maintain and improve system monitoring tools (Grafana, New Relic).
  • Set up smart, actionable alerts to detect outages or performance issues early.
  • Monitor live systems for signs of security breaches or vulnerabilities.

Incident Response
  • Be on call to respond to live incidents.
  • Quickly triage and mitigate outages or system degradation.
  • Communicate status updates clearly to internal teams.

Troubleshooting and Root Cause Analysis
  • Debug live systems under pressure.
  • Collect logs, metrics, and traces to understand issues.
  • Lead or contribute to postmortem analysis and documentation after incidents.

Capacity Planning and Performance Management
  • Monitor and predict system capacity and scaling needs.
  • Ensure that resources are properly allocated and scaled up when necessary.

Maintaining Operational Runbooks
  • Keep detailed, up-to-date playbooks and runbooks for common incidents and tasks.

Cloud & Infrastructure
  • Manage cloud infrastructure (AWS).
  • Manage environment configurations for development, staging, and production.

CI/CD Pipelines
  • Design, implement, and maintain robust CI/CD pipelines to automate the build, test, and deployment processes.

Release & Operations
  • Coordinate with the development team on production releases, patches, and live updates.
  • Work closely with development teams to understand application architecture and deployment needs.
Qualifications:

  • Proven experience with Linux and cloud computing technologies, preferably AWS.
  • Proficiency in at least one programming or scripting language (Java, Python, Bash).
  • Understanding of containerization and orchestration (Docker, Kubernetes).
  • Familiarity with networking fundamentals (TCP/IP, DNS, load balancing, firewalls).
  • Experience with database administration and queries, both NoSQL and SQL (Redis, PostgreSQL, MongoDB).
  • Experience with observability tools (Grafana, New Relic).
  • Skill in infrastructure as code (Terraform, CloudFormation, Ansible).
  • Experience in a continuous integration / continuous delivery environment.
  • Experience with HTTP-based services and networking concepts (e.g., TCP/IP, DNS).
  • Strong problem-solving, troubleshooting, debugging, and communication skills.
  • Collaboration mindset: work closely with developers, product managers, and support teams.
  • Attention to detail and an ownership mentality.

Apex Systems

Information Technology and Services

Atlanta

1000-5000 Employees

Key People

  • Mike McCauley, CEO
  • Alice McGowan, CFO
