5 - 10 years

5 - 10 Lacs

Posted: 2 days ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

What will you do at Fynd

  • Lead, mentor, and grow a team of 2-5 Site Reliability Engineers.
  • Define, implement, and advocate SRE best practices like SLAs, SLOs, SLIs, error budgets, and chaos engineering.
  • Build and maintain automated CI/CD pipelines and infrastructure using tools like Terraform, Jenkins, or GitHub Actions.
  • Own the observability stack: monitoring, alerting, logging, and tracing across microservices and platforms.
  • Improve reliability and scalability of services by proactively identifying bottlenecks and automating manual ops tasks.
  • Drive incident response practices including on-call rotations, runbooks, and blameless postmortems.
  • Ensure high availability and uptime across distributed systems hosted on AWS.
  • Collaborate with cross-functional teams to ensure the architecture is cloud-native, secure, and fault-tolerant.
  • Implement and optimize systems for cost-efficiency, auto-scaling, and performance.
  • Contribute to open source or write technical blogs to share insights and practices with the broader tech community.
  • This is a startup, so expect rapid change and plenty of opportunities to take ownership and drive new initiatives.

Some Specific Requirements

  • At least 3 years of experience leading SRE/DevOps/Infrastructure teams, with 5+ years overall in backend, systems, or infrastructure roles.
  • Strong experience managing distributed systems and microservices at scale.
  • Good understanding of Linux, Networking, Load Balancing, and Security concepts.
  • Hands-on experience with AWS services like EC2, ELB, AutoScaling, CloudFront, S3, CloudWatch.
  • Experience with container technologies and orchestration: Docker and Kubernetes are a must.
  • Strong proficiency with Infrastructure-as-Code tools like Terraform, CloudFormation, or Pulumi.
  • Familiarity with observability tools like Prometheus, Grafana, ELK, or Datadog.
  • Programming/scripting skills in Python, Go, Bash, or similar for automation and tooling.
  • Understanding of message queues and event-driven architectures using Kafka or RabbitMQ.
  • Ability to manage incidents, write detailed postmortems, and improve reliability across teams and services.
  • Comfortable working in a fast-paced environment with a strong culture of ownership and continuous improvement.

Fynd

E-commerce, Fashion

Mumbai
