
Site Reliability Engineer

5 - 7 years

10 - 19 Lacs

Posted: 13 hours ago | Platform: Naukri


Work Mode

Hybrid

Job Type

Full Time

Job Description

Looking for immediate joiners.

Role & responsibilities

  • Perform incident management and change management to maintain the continuous availability of all cloud infrastructure services.
  • Ensure all SRE and operating procedures are maintained and executed.
  • Maintain a 24x7 production environment with a high level of service availability, perform quality reviews, and manage operational issues.
  • Perform root cause analysis for major incidents and drive the process by involving the required stakeholders.
  • Perform problem management by analyzing metrics, alarms, and dashboards to troubleshoot problem areas, and report issues to assist in performance tuning and fault finding.
  • Implement proactive monitoring, alerting, trend analysis, and self-healing solutions.
  • Explore and innovate with new technologies, features, and tools to improve the platform, and automate operational tasks using Bash, Python, or another programming language.
  • Manage and maintain runbooks and standard operating procedures.
  • Manage, coordinate, and document all types of maintenance activities and outages.
  • Perform patching and upgrades for vulnerability management.
  • Work closely with the teams to develop new ideas into internal tools.
  • Understand the existing architecture and work with various engineering teams to develop and execute strategies that provide a high-quality production service.
  • Be able to work a flexible schedule in a 24x7 environment with rotational shifts.

Qualifications

  • Bachelor's degree in computer science, electrical engineering, or a related area, with 7+ years of SRE experience in a large enterprise organization.
  • System administration experience in Linux environments.
  • Experience with end-to-end monitoring setup for infrastructure and applications.
  • Experience with Prometheus, Grafana, ELK, OpenSearch, CloudWatch, PagerDuty, and other monitoring tools.
  • Solid experience with cloud technologies such as AWS and OCI.
  • Good experience with tools for containerized workloads, such as Kubernetes.
  • Network knowledge (TCP/IP, UDP, DNS, load balancing); prior network administration experience is required.
  • Experience with BGP, iBGP, NAT, TCP/IP, proxies, and cross connects.
  • Experience with L2/L3 switching; knowledge of Juniper and Cisco routing devices.
  • Experience understanding and managing web servers (Apache, Tomcat, Nginx).
  • Ability to script or program in one or more high-level languages, such as Python or Go.
  • Experience with a configuration management tool such as Salt, Puppet, Ansible, or similar.
  • Experience with source control tools such as GitHub and SVN.
  • Experience with deployment tools such as Jenkins and Harness.
  • Experience with SQL and NoSQL databases such as Redis, Couchbase, Cassandra, Crate, and Elasticsearch.
  • Experience performing root cause analysis and writing RCA documents.
  • Strong communication and analytical/problem-solving skills.
  • A systematic approach to driving problems to resolution.

Preferred candidate profile

  • Experience with or knowledge of GCP and Azure is good to have.
  • Experience in the security domain is an added advantage.


Encora

Book and Periodical Publishing

Santo Domingo Distrito Nacional

2-10 Employees

49 Jobs

Key People

  • Mark M. Johnson, CEO
  • Dino M. Gika, President
