Site Reliability Engineer (NOC)

5 - 9 years

0 Lacs

Posted:5 days ago| Platform: Shine logo

Apply

Work Mode

On-site

Job Type

Full Time

Job Description

You will be joining our client's team as a Site Reliability Engineer, where your main responsibility will be to ensure the reliability and uptime of critical services. Your focus will include Kubernetes administration, CentOS servers, Java application support, incident management, and change management. The ideal candidate for this role will have strong experience with ArgoCD for Kubernetes management, Linux skills, basic scripting knowledge, and familiarity with modern monitoring, alerting, and automation tools. We are looking for a self-motivated individual with excellent communication skills, both oral and written, who can work effectively both independently and collaboratively. Your responsibilities will include monitoring, maintaining, and managing applications on CentOS servers to ensure high availability and performance. You will be conducting routine tasks for system and application maintenance and following SOPs to correct or prevent issues. Responding to and managing running incidents, including post-mortem meetings, root cause analysis, and timely resolution will also be part of your responsibilities. Additionally, you will be monitoring production systems, applications, and overall performance, using tools to detect abnormal behaviors in the software and collecting information to help developers understand the issues. Security checks, running meetings with business partners, writing and maintaining policy and procedure documents, writing scripts or code as necessary, and learning from post-mortems to prevent new incidents are also key aspects of the role. Technical skills required for this position include: - 5+ years of experience in a SaaS and Cloud environment - Administration of Kubernetes clusters, including management of applications using ArgoCD - Linux scripting to automate routine tasks and improve operational efficiency - Experience with database systems like MySQL and DB2 - Experience as a Linux (CentOS / RHEL) administrator - Understanding of change management procedures and enforcement of safe and compliant changes to production environments - Knowledge of on-call responsibilities and maintaining on-call management tools - Experience with managing deployments using Jenkins - Prior experience with monitoring tools like New Relic, Splunk, and Nagios - Experience with log aggregation tools such as Splunk, Loki, or Grafana - Strong scripting knowledge in one of Python, Ruby, Bash, Java, or GoLang - Experience with API programming and integrating tools like Jira, Slack, xMatters, or PagerDuty If you are a dedicated professional who thrives in a high-pressure environment and enjoys working on critical services, this opportunity could be a great fit for you.,

Mock Interview

Practice Video Interview with JobPe AI

Start Python Interview
cta

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

coding practice

Enhance Your Python Skills

Practice Python coding challenges to boost your skills

Start Practicing Python Now

RecommendedJobs for You