Senior AI Engineer (Site Reliability)

0 years

5 - 6 Lacs

Posted: 4 days ago | Platform: GlassDoor

Work Mode

On-site

Job Type

Part Time

Job Description

About SRE Team:

The Site Reliability Engineering (SRE) team is responsible for keeping all production systems running efficiently, including some bug fixing. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our operating environments and the P&G codebase. SREs specialize in systems (operating systems, storage subsystems, networking) while implementing standard processes for availability, reliability, and scalability, with multifaceted interests in algorithms and distributed systems.

In this role, you'll be constantly learning, staying up to date with industry trends and new technologies in data solutions. You'll have the chance to work with a variety of tools and technologies, including big data platforms, machine learning frameworks, and data visualization tools, to build innovative and effective solutions.

So, if you're passionate about the possibilities of data and eager to make a real impact in the world of business, a career on the SRE team might be just what you're looking for. Join us and become a part of the future of digital transformation.

About P&G IT:

Digital is at the core of P&G’s accelerated growth strategy. With this vision, IT in P&G is deeply embedded into every critical process across business organizations comprising 11+ category units globally, creating impactful value through Transformation, Simplification & Innovation. IT in P&G is subdivided into teams that engage strongly to revolutionize business processes and deliver outstanding value & growth: Digital GTM, Digital Manufacturing, Marketing Technologist, Ecommerce, Data Sciences & Analytics, Data Solutions & Engineering, Product Supply.

Responsibilities:

As a Site Reliability Engineer (SRE) at P&G, you will play a crucial role in ensuring the reliability, availability, and performance of our production systems. Your role will blend software engineering principles with operational discipline to build scalable and highly available systems. You will collaborate with development and operations teams to implement automation, optimize costs, and troubleshoot issues as they arise.

  • Oversee and maintain the smooth operation of production systems, ensuring high availability and reliability.

  • Lead post-incident reviews to identify improvements in processes and systems.

  • Develop monitoring and observability dashboards and alerts to provide actionable insights into system health.

  • Design and implement automation solutions for routine operational tasks to improve efficiency and reduce manual intervention.

  • Develop and maintain automated tests to ensure the quality and reliability of production systems.

  • Analyze system performance and resource utilization to identify opportunities for cost optimization.

  • Work with teams to implement best practices for prioritization and cost-efficient architecture.

  • Participate in the change management process to facilitate flawless production deployments.

  • Plan, execute, and supervise production deployments to ensure minimal downtime and service disruption.

  • Collaborate with other teams to ensure accurate deployment strategies and rollback mechanisms are in place.
