Senior Site Reliability Engineers

Posted: 4 days ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full-time

Job Description

A Site Reliability Engineer (SRE) is an IT expert who ensures the reliability and efficiency of an IT system's infrastructure and applications. SREs combine software engineering and systems administration principles to detect issues, handle failures automatically, prepare for disaster recovery, maintain security, conduct post-incident reviews, and more, all to improve system stability. We are searching for someone who brings fresh ideas, offers a distinctive and informed viewpoint, and enjoys collaborating with a cross-functional team to build real-world solutions and positive user experiences at every interaction.


Objectives of this role

● Understand and document the performance and scalability non-functional requirements, including SLIs/SLOs, and validate them with business stakeholders

● Manage SLIs/SLOs for customer-facing interfaces and backend services, and provide improvement plans when they are not met

● Develop custom dashboards in observability platforms (New Relic, Dynatrace, Datadog, Splunk, Grafana, SigNoz, etc.) that give a holistic view of system operational health

● Improve the reliability, quality, and time-to-market of our suite of software solutions

● Support release engineering by providing automation and by pushing changes to production when manual intervention is needed

● Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve

● Provide primary operational support and engineering for multiple large, distributed software applications

Responsibilities

● Gather and analyse metrics from both operating systems and applications to assist in performance tuning and fault finding

● Partner with engineering teams to improve services through rigorous testing and release procedures

● Participate in system design consulting, platform management, and capacity planning

● Model areas of risk to estimate latency characteristics and capacity requirements. Typically, this means either refining the workload model and analysing how it applies to a set of components, or working with component suppliers to estimate capacity requirements.

● Create sustainable systems and services through automation and uplifts

● Balance feature development speed and reliability with well-defined service level objectives


Requirements and qualifications

● Bachelor’s degree in computer science or another highly technical or scientific discipline

● Ability to program (structured and object-oriented) in one or more high-level languages, such as Go, Java, Python, C/C++, Ruby, or JavaScript (including React)

● A proactive approach to spotting problems, areas for improvement, and performance bottlenecks

● Ability to drive a collaborative approach across business functions and external partners

● Strong communication and interpersonal skills

● Reliable high-speed internet connection

● Accountable, with a strong desire to learn and a drive to achieve

Preferred Qualifications

● Previous success in technical engineering

● Coding experience beyond simple scripts
