Senior Site Reliability Engineer

5 years

3 - 9 Lacs

Posted: 1 day ago | Platform: Glassdoor

Work Mode

On-site

Job Type

Part Time

Job Description

Senior Site Reliability Engineer - JD
As a Senior Site Reliability Engineer (SRE), you will collaborate closely with our Development and IT teams to ensure the reliability, scalability, and performance of our applications. You will take ownership of setting and maintaining service-level objectives (SLOs), building robust monitoring and alerting, and continually improving our infrastructure and processes to maximize uptime and deliver an exceptional customer experience. This role operates at the intersection of development and operations, reinforcing best practices, automating solutions, and reducing toil across systems and platforms.
About QualMinds:
QualMinds is a global technology company dedicated to empowering clients on their digital transformation journey. We help our clients design and develop world-class digital products, custom software, and platforms. Our primary focus is delivering enterprise-grade interactive software applications across web, desktop, mobile, and embedded platforms.
Responsibilities:
1. Ensure Reliability & Performance: Own the observability of our systems, ensuring they meet established service-level objectives (SLOs) and maintain high availability.
2. Cloud & Container Orchestration: Deploy, configure, and manage resources on Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE), focusing on secure and scalable infrastructures.
3. Infrastructure Automation & Tooling: Set up and maintain automated build and deployment pipelines; drive continuous improvements to reduce manual work and risks.
4. Monitoring & Alerting: Develop and refine comprehensive monitoring solutions (performance, uptime, error rates, etc.) to detect issues early and minimize downtime.
5. Incident Management & Troubleshooting: Participate in on-call rotations; manage incidents through resolution, investigate root causes, and create blameless postmortems to prevent recurrences.
6. Collaboration with Development: Partner with development teams to design and release services that are production-ready from day one, emphasizing reliability, scalability, and performance.
7. Security & Compliance: Integrate security best practices into system design and operations; maintain compliance with SOC 2 and other relevant standards.
8. Performance & Capacity Planning: Continuously assess system performance and capacity; propose and implement improvements to meet current and future demands.
9. Technical Evangelism: Contribute to cultivating a culture of reliability through training, documentation, and mentorship across the organization.

Requirements:
  • Bachelor’s degree in Computer Science or Business Administration, or relevant work experience.
  • A minimum of 5 years in an SRE, DevOps, or similar role in an IT environment (required).
  • Hands-on experience with Microsoft SQL Server clusters, Elasticsearch, and Kubernetes (required).
  • Deep familiarity with Windows or Linux environments and .NET or PHP stack applications, including IIS/Apache, SQL Server/MySQL, etc.
  • Strong understanding of networking, firewalls, intrusion detection, and security best practices.
  • Proven administrative experience with tools like Git, TFS, Bitbucket, and Bamboo for continuous integration, delivery, and deployment.
  • Knowledge of automated testing and code-quality tools such as SonarQube, Selenium, or comparable technologies.
  • Experience with performance profiling, logging, metrics collection, and alerting tools.
  • Competence in debugging solutions across diverse environments.
  • Hands-on experience with GCP, AWS, or Azure, container orchestration (Kubernetes), and microservices-based architectures.
  • Understanding of authentication, authorization, OAuth, SAML, encryption (public/private key, symmetric, asymmetric), token validation, and SSO.
  • Familiarity with security strategies to optimize performance while maintaining compliance (e.g., SOC 2).
  • Willingness to participate in an on-call rotation and respond to system emergencies 24/7 when necessary.
  • Availability for a monthly weekend rotation for production patching.
  • A+, MCP, and Dell certifications and Microsoft Office expertise are a plus.
