Member of Technical Staff (Site Reliability Engineer)

3 - 8 years

11 - 15 Lacs

Posted: 2 weeks ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

CockroachDB provides the backbone for storing data on a global scale. As a Site Reliability Engineer, you'll help manage and scale our CockroachCloud service. You will oversee our production system, ensuring that we can provide stable and scalable infrastructure as we deliver CockroachDB to our customers. CockroachCloud is a global service spanning multiple cloud providers. Roughly half of your time will be spent on greenfield development work, with an emphasis on developing tooling and driving automation. In this role you will work across multiple teams within CockroachCloud as well as development and product teams working on CockroachDB.

You Will

  • Manage the infrastructure for cloud services, including running internal production systems and hosting CockroachDB for our external customers.
  • Design, write and deliver software and systems to increase product reliability and operational efficiency.
  • Develop custom tools as necessary.
  • Keep a complex system running and solve problems relating to mission-critical services.
  • Design, implement, operate, and troubleshoot the automation and monitoring of production clusters to maximize performance and availability.
  • Drive the company through disaster recovery tests, where we manually turn down pieces of CockroachDB to test its overall resilience to failures.
  • Participate in an on-call rotation for our production systems and hosted services.

The Expectations

In your first 30 days, you will be onboarded and exposed to our current internal and customer-facing production systems. Working with our existing SRE and engineering teams, you will pair on production operations and build out runbooks for the operation of different systems. It's essential for you to take this first month to become familiar with our technology and our company.
After 3 months, you'll be fully integrated into the team. You will develop and own tooling for reliability, automation, and other issues related to CockroachCloud's stability and scalability. You will identify new opportunities for automating processes, streamlining delivery, deploying new core functionality, and building great tools. By bringing your expertise to our database, you will help make CockroachCloud the best platform to host CockroachDB.

 

You Have

  • Expertise in analyzing, monitoring, and troubleshooting large-scale distributed systems.
  • 3+ years of experience in software development using one or more of the following: Go, C, C++, Python, Java.
  • Proficiency working with algorithms, data structures, and production troubleshooting.
  • Expertise in working with major cloud providers (AWS, Azure, GCP, etc.) and cloud APIs.
  • Experience debugging and optimizing code and automating routine tasks.
  • Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.).
  • Previous on-call experience, with a sense of urgency.
  • Experience building collaborative relationships with your colleagues. You enjoy being part of the code review process and partnering with your teammates on challenging problems.

Benefits

  • Medical Insurance
  • Flexible Time Off
  • Mental Wellbeing Benefits
  • And more!

Cockroach Labs

Software Development

New York NY
