Site Reliability Engineer – Technical Lead

Veryon

Experience: 6+ years

Salary: Not disclosed

Location: Chennai, Tamil Nadu, India

Posted: 1 week ago

Skills Required

reliability, software, technology, efficiency, maintenance, engineering, DevOps, support, Datadog, AWS, architecture, deployment, onboarding, monitoring, service, coordination, drive, automate, GitLab, automation, rollback, risk, controls, metrics, resolve, stack, analysis, documentation, remediation, design, strategy, IAM, strategies, Jenkins, logging, troubleshooting, scripting, code, Python, Terraform, orchestration, Docker, communication, empathy, collaboration

Work Mode: On-site

Job Type: Full Time

Job Description

Why We Need You – The Mission & Our Vision

Veryon is a leading software and technology company that enables aviation teams around the world to improve efficiency and safety. Our products maximize uptime for aircraft maintenance teams through customer-driven innovation and world-class service.

Fueled by Customers

As a hands-on Technical Lead in Site Reliability Engineering, you will be directly responsible for designing, building, and implementing modern reliability practices to ensure uptime, resilience, and production excellence across Veryon’s systems. You’ll work closely with Engineering, DevOps, and Support teams to streamline software delivery to both internal and client environments, troubleshoot production issues, and build observability using Datadog, Dynatrace, and AWS-native tools. You will also mentor engineers on best practices and be a key contributor to reliability-focused architecture and deployment design.

What You’ll Accomplish – Your Performance Objectives

Objective #1 – First 30 Days

Complete onboarding and gain a deep understanding of Veryon’s systems, release processes, and deployment environment on AWS.
Review existing application architecture, CI/CD flows, and monitoring implementations.
Begin implementing improvements to observability using Datadog and Dynatrace.
Collaborate with engineers and DevOps to identify bottlenecks in production releases and issue resolution.

Objective #2 – First 90 Days

Build or enhance monitoring dashboards and alerts for critical infrastructure and applications.
Define and begin implementing Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
Own and improve release workflows and ensure reliable software delivery to customer environments.
Take ownership of investigating production issues, ensuring timely resolution and coordination across teams.
Begin documenting Root Cause Analyses (RCAs) for production incidents and drive preventive improvements.
Partner with DevOps to optimize and automate CI/CD pipelines using GitLab or equivalent.

Objective #3 – First 12 Months

Deliver measurable improvements in system uptime, MTTR, and deployment success rate.
Build self-healing automation and rollback mechanisms for high-risk services.
Standardize and own the RCA process for production incidents to ensure continuous learning.
Implement robust controls and metrics to monitor software delivery health.
Support production readiness of new services through performance baselining and fault testing.
Establish and track health KPIs that inform operational decisions and product improvements.

Key Job Responsibilities

Implement and manage observability, alerting, and dashboards using Datadog, Dynatrace, and AWS tools.
Take ownership of production deployments, ensuring successful delivery to client environments with minimal disruption.
Troubleshoot and resolve production issues across the stack (infrastructure, application, integration).
Lead Root Cause Analysis (RCA) documentation, follow-ups, and remediation planning.
Define and maintain service SLOs, SLIs, and error budgets with product and engineering teams.
Build automation for deployment, monitoring, incident response, and recovery.
Design CI/CD workflows that support safe and reliable delivery across distributed environments.
Partner with developers to ensure observability and reliability are part of the application design.
Mentor engineers in SRE principles, monitoring strategy, and scalable operations.

Experience and Skills We Seek

6+ years of experience in SRE, DevOps, or platform engineering roles.
Strong hands-on experience with AWS services (e.g., EC2, ECS/EKS, RDS, IAM, CloudWatch, Route 53, ELB) is required.
Deep familiarity with CI/CD pipelines and deployment strategies using GitLab CI, Jenkins, or equivalent.
Expertise in observability tools such as Datadog and Dynatrace for APM, logging, and alerting.
Solid experience troubleshooting distributed systems in production environments.
Proficiency in scripting and infrastructure as code (e.g., Python, Bash, Terraform, Ansible).
Working knowledge of containers and orchestration (Docker, Kubernetes).
Understanding of SRE principles (SLIs, SLOs, MTTR, incident response, etc.).
Excellent communication and documentation skills, especially for RCA and runbook creation.
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.

How We Work – The Core Values That We Live By

Fueled By Customers

Win Together

Make It Happen

Innovate to Elevate
