SRE-Production Support Engineering Manager

0 years

0 Lacs

Posted: 1 day ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Please see below.

  • We need a strong profile with solid experience in stakeholder management and SRE team management.
  • A good understanding of production engineering and production support projects is a must, including handling teams that work in a 24/7 model.
  • Incident, change, and service request management are part of the daily routine, so the candidate should know how to manage the workload and rotate FTEs as and when required.
  • Awareness of how to manage ad hoc activities, such as vulnerability fixes and patching, is required.
  • Should be able to lead BAU governance activities on a daily, weekly, and monthly cadence with the necessary reporting data.
  • GCP cloud infrastructure management knowledge, basic Postgres DB knowledge, and banking domain experience are a big advantage for the role.
==================================================================================================
  • Mandatory experience in SRE (not traditional production support) covering integration platforms on cloud-based deployments.
  • Knowledge of applying SRE practices to daily operations is key.
  • Ability to manage teams working in shifts from the office is mandatory; this is a 24x7 on-desk operation.
  • A Computer Science and/or Engineering degree is preferred.
  • Domain experience in banking is a great advantage.

Working Experience/ Awareness

  • 24x7 operations support model for mission-critical applications and infrastructure, using ServiceNow as the ITSM ticketing tool.
  • GCP and private-cloud operational support and administration activities such as provisioning, capacity management, reliability management, monitoring, and restoration.
  • Working knowledge of AppDynamics and Splunk for monitoring and setting up observability is key.
  • CI/CD toolchains: setting up and running deployment pipelines and propagating changes across environments.
  • Maintaining middleware such as Kafka (open source) and MQ, as well as application servers (Tomcat).
  • Maintaining Hazelcast data storage platform clusters and Control-M job schedulers.
  • Kubernetes cluster management, monitoring, and remediation. Knowledge of Docker is important.
  • Automating deployments and scripting self-healing workflows based on telemetry.
  • Work closely with the team to define SLIs and configure SLOs, respond to threshold alerts, and optimize monitoring capability.
  • Work closely with the team to understand the code as well as configuration artifacts to debug and fix issues that may arise.
  • Must be inclined to work on proof-of-concept solutions that optimize reliability, such as those incorporating AI models for event correlation and assisted triaging.
  • Able to lead and drive the SRE team to work in parallel on service or change requests, the defect management board, and backlog management in an agile manner.

Good To Have

  • SRE Foundation certification from DevOps Institute, or an equivalent SRE certification from a recognized body, is mandatory.
  • CKA certification.
  • GCP Cloud Digital Leader certification is mandatory at a minimum; Cloud Engineer level is a bonus.
  • Hazelcast Platform Operations certification badge.

Virtusa

Information Technology and Services

Southborough

20,000+ Employees

4423 Jobs

Key People

  • Kris Canekeratne, Chairman and CEO
  • Sanjay Singh, President and COO