Site Reliability Engineer II

Name: Jobpe
Address: T-Hub, Plot No 1/C, Sy No 83/1, Raidurgam panmaktha, Knowledge City Rd, Hyderabad, Telangana, 500081, IN
Telephone: +91-83339-09630
Price range: Free

Zuora

1 - 6 years

20 - 25 Lacs

Chennai

Posted: 5 days ago

Skills Required

Performance tuning, Automation, Tomcat, Linux, MySQL, Incident management, Customer support, Oracle, Monitoring, Python

Work Mode

Work from Office

Job Type

Full Time

Job Description

Join Zuora's high-impact Operations team, where you'll be instrumental in maintaining the reliability, scalability, and performance of our SaaS platform. This role involves proactive service monitoring, incident response, infrastructure service management, and ownership of internal and external shared services to ensure optimal system availability and performance. You will work alongside a team of skilled engineers dedicated to operational excellence through automation, observability, and continuous improvement.

In this cross-functional role, you'll collaborate daily with Product Engineering & Management, Customer Support, Deal Desk, Global Services, and Sales teams to ensure a seamless and customer-centric service delivery model. As a core member of the team, you'll have the opportunity to design and implement operational best practices, contribute to service provisioning strategies, and drive innovations that enhance the overall platform experience.

If you're driven by solving complex problems in a fast-paced environment and are passionate about operational resilience and service reliability, we'd love to hear from you.

Our Tech Stack: Linux Administration, Python, Docker, Kubernetes, MySQL, Kafka, ActiveMQ, Tomcat App & Web, Oracle, Load Balancers, Redis Cache, Debezium, AWS, WAF, LBs, Jenkins, GitOps, Terraform, Ansible, Puppet, Prometheus, Grafana, OpenTelemetry

In this role you'll get to

Architect and implement intelligent automation workflows for infrastructure lifecycle management, including self-healing systems, automated incident remediation, and configuration anomaly detection using Infrastructure as Code (IaC) and AI-driven tooling.
Leverage predictive monitoring and anomaly detection techniques powered by AI/ML to proactively assess system health, optimize performance, and preempt service degradation or outages.
Lead complex incident response efforts, applying deep root cause analysis (RCA) and postmortem practices to drive long-term stability, while integrating automated detection and remediation capabilities.
Partner with development and platform engineering teams to build resilient CI/CD pipelines, enforce infrastructure standards, and embed observability and reliability into application deployments.
Identify and eliminate reliability bottlenecks through automated performance tuning, dynamic scaling policies, and advanced telemetry instrumentation.
Maintain and continuously evolve operational runbooks by incorporating machine learning insights, updating playbooks with AI-suggested resolutions, and identifying automation opportunities for manual steps.
Stay abreast of emerging trends in AI for IT operations (AIOps), distributed systems, and cloud-native technologies to influence strategic reliability engineering decisions and tool adoption.

Who we're looking for

Has hands-on experience with Linux server administration and Python programming.
Has deep experience with containerization and orchestration using Docker and Kubernetes, managing highly available services at scale.
Has worked with messaging systems like Kafka and ActiveMQ, databases like MySQL and Oracle, and caching solutions like Redis.
Understands and applies AI/ML techniques in operations, including anomaly detection, predictive monitoring, and self-healing systems.
Has a solid track record in incident management, root cause analysis, and building systems that prevent recurrence through automation.
Is proficient in developing and maintaining CI/CD pipelines with a strong emphasis on observability, performance, and reliability.
Has experience with monitoring and observability using Prometheus, Grafana, and OpenTelemetry, with a focus on real-time anomaly detection and proactive alerting.
Is comfortable writing and maintaining runbooks and enjoys enhancing them with automation and machine learning insights.
Keeps up to date with industry trends such as AIOps, distributed systems, SRE best practices, and emerging cloud technologies.
Brings a collaborative mindset, working cross-functionally with engineering, product, and operations teams to align system design with business objectives.
Has 1+ years of experience working in a SaaS environment.

Nice to Have

Red Hat Certified System Administrator (RHCSA) - Red Hat
AWS Certification
Certified Associate in Python Programming (PCAP) - Python Institute
Docker Certified Associate (DCA) or Certified Kubernetes Administrator (CKA)
Good knowledge of Jenkins
Advanced certifications in SRE or related fields

As part of our commitment to building an inclusive, high-performance culture where ZEOs feel inspired, connected and valued, we support ZEOs with:

Competitive compensation, corporate bonus program, performance rewards and retirement programs
Medical insurance
Generous, flexible time off
Paid holidays, wellness days and a company-wide end-of-year break
6 months fully paid parental leave
Learning & Development stipend
Opportunities to volunteer and give back, including charitable donation match
Free resources and support for your mental wellbeing

More Jobs at Zuora

Software Engineer - Fullstack

Bengaluru, Karnataka, India

Experience: Not specified

Salary: Not disclosed

Machine Learning III

Chennai, Tamil Nadu, India

3.0 yrs

Salary: Not disclosed

Application Support Engineer

Chennai

4.0 - 5.0 yrs

INR 8 - 9 Lacs

Technical Consultant

Bengaluru

3.0 - 8.0 yrs

INR 11 - 12 Lacs

Technical Consultant

Chennai

3.0 - 8.0 yrs

INR 11 - 12 Lacs
