37 PagerDuty Jobs - Page 2

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

10.0 - 19.0 years

13 - 22 Lacs

Hyderabad, India

Hybrid

Department: Information Technology Employment Type: Full Time Location: India Description V3locity, Vitech’s cloud-native administration, engagement, and analytics platform, is a transformative suite of complementary applications that offers full life cycle business functionality and robust enterprise capabilities. It marries core administration with superior digital experience and augmented analytics. Its modular design enables flexible, agile deployment strategies. V3locity employs an advanced, cloud-native architecture that leverages the unique capabilities of AWS to deliver a solution with unparalleled security, scalability, and resiliency. Senior Manager– IT Service Management (ITSM) Location: Hyderabad - Hybrid We are seeking a dynamic and experienced IT Service Management (ITSM) leader to lead and enhance our global IT and Cloud operations. The ideal candidate will oversee core ITSM functions, including Service Desk, Incident Management, Problem Management, Change Management, and Service Request Fulfillment in a 24/7, fast-paced software product environment. This leader will play a strategic role in driving continuous improvement, implementing best practices in ITSM, and maturing overall service delivery practices. What you will do: ITSM: Define and drive the ITSM strategy aligned with organizational goals and customer satisfaction. Lead and develop the ITSM function, including Service Desk, Incident, Problem, and Change Management teams based out of our Hyderabad Office. Drive adoption and maturity of ITIL practices across the IT organization. Service Desk Operations: Oversee global service desk operations, ensuring high-quality and timely technical support. Establish and monitor SLAs, KPIs, and customer satisfaction metrics. Ensure timely delivery of customer monthly SLA reporting, leveraging tools like New Relic. Manage on-call rotation for all Service Teams using tools like PagerDuty. 
Incident & Problem Management: Lead major incident response and communication processes, ensuring minimal impact and quick resolution. Drive root cause analysis, problem identification, and long-term resolution strategies. Maintain high availability and performance of business-critical services. Change & Release Management: Establish and govern change control procedures ensuring safe, secure, and timely releases. Collaborate with DevOps and engineering teams to align change processes with agile product development/deployment/releases. ITSM Tools & Reporting: Own and optimize the ITSM platform (e.g., ServiceNow, Jira Service Management). Own and deliver our monthly client SLA reporting cadence to customers. Deliver regular operational reports, dashboards, and executive summaries leveraging Jira Service Management. Identify and implement continuous improvement opportunities based on data insights. Governance & Compliance: Ensure compliance with internal policies, external regulations (e.g., ISO, SOC2), and audit requirements. Maintain clear documentation and process alignment with industry standards (ITIL v4, COBIT). Team Development & Leadership: Lead, mentor, and develop a high-performing team of ITSM professionals. Foster a culture of accountability, collaboration, and service excellence. Manage vendor relationships and third-party service providers as needed. What We're Looking For: 12–15+ years of ITSM experience, with 5+ years in a Service Management role. Proven experience managing global service desk operations and ITIL processes in a product or SaaS environment. ITIL v4 certification; certifications in Agile/Scrum, COBIT, or PMP are a plus. High-level technical knowledge of, or certification in, AWS Cloud or other clouds. Hands-on experience with ITSM tools like ServiceNow, Jira Service Management, or similar. Working experience with tools in the Monitoring and Service Management space like New Relic, PagerDuty, Honeycomb, Splunk, etc.
Proven experience managing the incident lifecycle, problem, and change processes. Excellent communication, stakeholder management, and crisis management skills. Experience working with global teams across time zones. Prior experience in a software product or SaaS company is highly desirable. Strong business acumen and ability to align IT services with organizational goals. Able to work in shifts and lead the team technically to manage the tasks/issues that arise in the shift. Join Us at Vitech! At Vitech, you’ll be part of a forward-thinking team that values collaboration, innovation, and continuous improvement. We provide a supportive and inclusive environment where you can grow as a leader while helping shape the future of our organization.

Posted 1 month ago

6.0 - 9.0 years

18 - 20 Lacs

Pune

Work from Office

Notice Period: Immediate Joiner Only. Duration: 6 Months (Possible Extension). Shift Timing: 11:30 AM - 9:30 PM IST. About the Role: We are looking for a highly skilled and experienced DevOps / Site Reliability Engineer to join on a contract basis. The ideal candidate will be hands-on with Kubernetes (preferably GKE), Infrastructure as Code (Terraform/Helm), and cloud-based deployment pipelines. This role demands deep system understanding, proactive monitoring, and infrastructure optimization skills. Key Responsibilities: Design and implement resilient deployment strategies (Blue-Green, Canary, GitOps). Configure and maintain observability tools (logs, metrics, traces, alerts). Optimize backend service performance through code and infra reviews (Node.js, Django, Go, Java). Tune and troubleshoot GKE workloads, HPA configs, ingress setups, and node pools. Build and manage Terraform modules for infrastructure (VPC, CloudSQL, Pub/Sub, Secrets). Lead or participate in incident response and root cause analysis using logs, traces, and dashboards. Reduce configuration drift and standardize secrets, tagging, and infra consistency across environments. Collaborate with engineering teams to enhance CI/CD pipelines and rollout practices. Required Skills & Experience: 5-10 years in DevOps, SRE, Platform, or Backend Infrastructure roles. Strong coding/scripting skills and ability to review production-grade backend code. Hands-on experience with Kubernetes in production, preferably on GKE. Proficient in Terraform, Helm, GitHub Actions, and GitOps tools (ArgoCD or Flux). Deep knowledge of Cloud architecture (IAM, VPCs, Workload Identity, CloudSQL, Secret Management). Systems thinking: understands failure domains, cascading issues, timeout limits, and recovery strategies. Strong communication and documentation skills; capable of driving improvements through PRs and design reviews.
Tech Stack & Tools: Cloud & Orchestration: GKE, Kubernetes. IaC & CI/CD: Terraform, Helm, GitHub Actions, ArgoCD/Flux. Monitoring & Alerting: Datadog, PagerDuty. Databases & Networking: CloudSQL, Cloudflare. Security & Access Control: Secret Management, IAM. Driving Results: A good single contributor and a good team player. Flexible attitude towards work, as per the needs. Proactively identify & communicate issues and risks. Other Personal Characteristics: Dynamic, engaging, self-reliant developer. Ability to deal with ambiguity. Manage a collaborative and analytical approach. Self-confident and humble. Open to continuous learning. Intelligent, rigorous thinker who can operate successfully amongst bright people.

Posted 1 month ago

5.0 - 10.0 years

19 - 22 Lacs

Pune

Work from Office

Job Description: We are looking for an ambitious and highly skilled Go Developer who is passionate about building high-performance, scalable backend systems. This role is perfect for someone who thrives on solving complex engineering challenges, enjoys working with modern development practices, and takes ownership of delivering impactful solutions. You will be part of a dynamic team where innovation, collaboration, and continuous improvement are not just encouraged; they are expected. The role suits someone eager to make a meaningful contribution to real-world systems in a fast-paced environment. Skills / Qualifications: Bachelor's degree in Computer Science, Engineering, or related technical field. 5+ years of hands-on backend development experience. Strong programming expertise in Golang. Hands-on experience with MongoDB, OracleDB, and Snowflake. Proficiency in using Logstash, Elasticsearch, and Splunk (Queries, Alerts, Dashboards). Experience in writing and maintaining scripts for automation and monitoring. Familiarity with containerization and orchestration using Docker and Kubernetes. Proficient in using Kafka for messaging and stream processing. Comfortable working with GitLab for version control and CI/CD pipelines. Experience handling incident alerts and escalations via PagerDuty. Job Responsibilities: Participate in daily stand-ups, code reviews, and sprint planning. Review code and tickets to ensure high-quality development practices. Design technical specifications for databases and APIs. Plan and execute production deployments reliably and efficiently. Provide Level 2 on-call support via PagerDuty for escalated incidents. Collaborate with cross-functional teams including QA, DevOps, and product stakeholders. Ensure effective incident response and root cause analysis for production issues. Benefits: Competitive Market Rate (Depending on Experience).

Posted 1 month ago

5.0 - 8.0 years

7 - 12 Lacs

Hyderabad

Work from Office

Job Description The role of the Lead Site Reliability Engineer is to be hands-on and provide mentorship to other team members on core SRE principles and tools. The lead SRE will participate in end to end operational aspects of Production environment. The individual concerned will be able to work on cloud systems, networks, databases and help drive incident lifecycle management. As a member of the SRE team, you will also be working closely with the Architects, DevOps, Product and development teams to ensure we get the most out of the software on AWS platform. This role requires a highly skilled technology professional with excellent communication skills, strategic mindset, strong analytical and troubleshooting skills on AWS Cloud Platform. Other responsibilities include working with internal business partners to gather requirements, prototyping, architecting, implementing/updating solutions, building and executing test plans, performing quality reviews, managing operations, and triaging and fixing operational issues. Site Reliability Engineers must be able to adjust to constant business change; common types of changes include new requirements, evolving goals and strategies, and emerging technologies. About the Role: Be hands-on and provide mentorship to a growing SRE team on core SRE principles and tools. 
Foster a sense of automation in issue resolution; everything possible should be automated, and only when automation can't resolve an issue should people get involved in the resolution. Lead efforts for updating production with new versions/infrastructures as they are available. Lead capacity planning efforts in collaboration with Architects and DevOps engineers to determine changes to infrastructure that are needed to support new load and performance characteristics. Lead engagement with software developers, DevOps and other infrastructure engineers to integrate software development and delivery from inception to full operation, ensuring robust released software and systems. Ensure highest level of uptime to meet the customer SLA by implementing system-wide corrections to prevent recurrence of issues. Mentor other SRE team members to further develop their soft and hard skills. Triage, troubleshoot and resolve issues using golden signals. Go past golden signals with additional principles such as chaos engineering to detect failure points, and lead Game days for testing resiliency of team when it comes to incident response and remediations and synthetic monitoring. Lead SRE team members to create and maintain Recovery Procedures and RCAs in collaboration with other engineering teams. Ensure Incidents assigned to the team are being managed within agreed SLAs. Ensure alarms are documented in up-to-date Knowledge Base Articles. Ensure Production infrastructure is up to date with server/security patches and certificates. Continuously improve system and application monitoring and automation. Identify and automate manual workarounds and process improvements. Proactively monitor the availability, latency, scalability and efficiency of all services. Perform periodic on-call duty as part of the SRE team. About You: Skilled with cloud operations/administration in Amazon AWS.
Tax/Accounting domain experience. Bachelor's or Master's in a Computer Science discipline. 5+ years of experience focused on Site Reliability Engineering or a related position on the AWS Cloud Platform. At least 2 AWS Certifications are a must (AWS SysOps Admin and Architect certifications preferred). Experience working with SQL, Windows Servers, Load balancers, Linux. Deep experience with AWS, Docker and Kubernetes, CloudFormation, CloudWatch, CodeDeploy, DynamoDB, Lambda, SQS, Amazon FSx, Elasticsearch and networking concepts is a must. Program at a high level in at least one language such as Java, C#, JavaScript, Python or Ruby. Integration experience with PagerDuty, ServiceNow, Datadog, CloudWatch. Good understanding of Site Reliability Engineering (SRE) philosophies, technologies, platforms and tools, SLO management, incident resolution, and automation. Ability to explain technical concepts in clear, non-technical language. Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks). Knowledge of security and compliance standards such as SOC/PCI is a plus.

Posted 1 month ago

3.0 - 6.0 years

11 - 15 Lacs

Bengaluru

Work from Office

Associate Lead - Kubernetes Platform. Is your passion for Cloud Native Platform? That is, envisioning and building the core services that underpin all Thomson Reuters’ products? Then we want you on our India-based team! This role is in the Platform Engineering organization where we build the foundational services that power Thomson Reuters’ products. We focus on the subset of capabilities that help Thomson Reuters deliver digital products to our customers. Our mission is to build a durable competitive advantage for TR by providing “building blocks” that get value-to-market faster. About the Role: This role is within Platform Engineering’s Service Mesh team, a dedicated group which engineers and operates our Service Mesh capability, which is a microservice platform based on Kubernetes and Istio. Primarily work with AWS and Azure public cloud, especially Kubernetes (AWS EKS and Azure AKS), Service Mesh technology like Istio, Terraform, Datadog, PagerDuty and Python, Golang, Java and/or .NET Core. Programming: Golang; Other: Java, C#; Primary Skill: Golang, Kubernetes. Work closely with an architect to establish and entrench the architectural design & principles for Service Mesh. Participate in all aspects of the development lifecycle: Ideation, Design, Build, Test and Operate.
We embrace a DevOps culture (“you build it, you run it”); while we have dedicated 24x7 level-1 support engineers, you may be called on to assist with level-2 support. About You: 6+ years of software development experience. 2+ years of experience building cloud native infrastructure, applications and services on AWS, Azure or GCP. Hands-on experience with Kubernetes, ideally AWS EKS and/or Azure AKS. Experience with Istio or other Service Mesh technologies. Experience with container security and supply chain security. Experience with declarative infrastructure-as-code, CI/CD automation and GitOps. Experience with Kubernetes operators written in Golang. A bachelor's degree in Computer Science, Computer Engineering or similar. What’s in it For You: Hybrid Work Model: We’ve adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected. Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance. Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow’s challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
Industry Competitive Benefits: We offer comprehensive benefit plans to include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing. Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together. Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives. Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together, with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world. About Us: Thomson Reuters informs the way forward by bringing together the trusted content and technology that people and organizations need to make the right decisions. We serve professionals across legal, tax, accounting, compliance, government, and media. Our products combine highly specialized software and insights to empower professionals with the data, intelligence, and solutions needed to make informed decisions, and to help institutions in their pursuit of justice, truth, and transparency. Reuters, part of Thomson Reuters, is a world leading provider of trusted journalism and news. We are powered by the talents of 26,000 employees across more than 70 countries, where everyone has a chance to contribute and grow professionally in flexible work environments.
At a time when objectivity, accuracy, fairness, and transparency are under attack, we consider it our duty to pursue them. Sound exciting? Join us and help shape the industries that move society forward. As a global business, we rely on the unique backgrounds, perspectives, and experiences of all employees to deliver on our business goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace. We also make reasonable accommodations for qualified individuals with disabilities and for sincerely held religious beliefs in accordance with applicable law. More information on requesting an accommodation here. Learn more on how to protect yourself from fraudulent job postings here. More information about Thomson Reuters can be found on thomsonreuters.com.

Posted 1 month ago

2.0 - 5.0 years

2 - 6 Lacs

Coimbatore

Work from Office

The Opportunity: Avantor is looking for a dynamic, forward-thinking, and experienced Engineer - Command Center, who will be responsible for delivering results against some of the most complex business and technology initiatives. This role will be a full-time position based out of IND - Coimbatore. If you are passionate about solving complex challenges and driving innovation, let's talk! Our organization is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. JOB DESCRIPTION: As a member of the IT Service Management monitoring team, reporting to the Senior Manager of IT Services, you will be responsible for monitoring servers, networks, databases, storage and backup devices for proactive identification of incidents. In this well-respected IT group, you will enjoy a wide variety of self-directed work within a supportive team environment. MAJOR JOB DUTIES AND RESPONSIBILITIES (List in order of importance): Monitor event alerts, acknowledge and, when appropriate, escalate to the next level support team(s). Perform in-depth monitoring for P1 and P2 critical applications and basic monitoring for P3, P4 applications. Notify Outage Management Team as the first point of contact for critical P1 and P2 alerts to ensure timely escalation and resolution. Schedule jobs in SAP tool for different systems, ensure successful runs and restart when required. Clean up NAS backup server files. Prepare weekly error report and ensure tickets are created for all failed jobs. Prepare weekly & monthly Task performance/Aging reports, drive aging calls with wider team and ensure tickets are closed on time/record justification if required. Support IT changes, prioritizing change requests, assessing impact, and accepting changes which meet requirements. Maintain internal knowledge repository. Manage ticketed query system and ensure queries and resolutions are tracked and kept up to date.
QUALIFICATIONS (Education/Training, Experience and Certifications): Bachelor's degree or equivalent experience within an enterprise level corporate IT environment is required. Experience in IT monitoring is highly desirable. Direct experience with Jenkins, NPrinting, CloudWatch, QlikView, SolarWinds, Redwood, OpManager and/or PagerDuty is highly desirable. Certification in AWS or ITIL is a plus. KNOWLEDGE SKILLS AND ABILITIES (Those necessary to perform the job competently): Knowledge of ITIL based Incident, Problem and Change Management processes. Strong problem solving and analytical skills. Ability to self-start and to effectively participate in a team environment. Ability to be an on-call escalation point for production support and scheduled off-hours/weekend work if/when required. Ability to focus on the customer and to adhere to processes defined for customer issue handling. Ability to examine, summarize, and effectively present data when required. Commitment to high professional and ethical standards in a diverse workplace.

Posted 2 months ago

4 - 9 years

10 - 14 Lacs

Hyderabad

Work from Office

Senior Manager Information Systems – Observability Operations. What you will do: Let’s do this. Let’s change the world. In this vital role you will be responsible for leading and overseeing the day-to-day operations of the organization's global observability service. The person in this position should be able to implement and maintain observability standard methodologies, including tagging, metrics, and logging to provide comprehensive access to system performance. Use tools like Dynatrace, PagerDuty, and other solutions to monitor the health and performance of infrastructure and applications in real time. The ideal candidate will have a consistent record of leadership in technology-driven on-prem and cloud environments and a passion for fostering innovation and excellence in the biotechnology industry. Work closely with multi-functional teams including product managers, Application owners, and Infrastructure engineers to define requirements and implement monitoring solutions. This role demands the ability to drive and deliver against key organizational critical initiatives, develop a collaborative environment, and deliver high-quality results in a matrixed organizational structure. Please note, this is an onsite role based in Hyderabad. Roles & Responsibilities: Lead and develop a successful team of Monitoring engineers through recruitment, performance management, and career development. Establish and maintain operational metrics, SLAs, and performance standards. Experience with observability tools and monitoring large ecosystems. Monitor and manage global Observability infrastructure. Promote automation technologies and self-healing capabilities. Lead incident response and problem management for critical observability issues. Oversee implementation and maintenance of security policies and patching and agent upgrade procedures. Ensure compliance with regulatory and security requirements. Generate regular reports on license usage, agent upgrades and incident/problem creations.
Deliver continuous improvement initiatives in observability operations. Optimize resource allocation and shift coverage for 24/7 operations. Partner with business collaborators to understand and support organizational needs. Lead incident response and problem management for critical issues. Ensure compliance with regulatory requirements. What we expect of you: We are all different, yet we all use our unique contributions to serve patients. Basic Qualifications: Master’s degree and 8 to 10 years of experience in Observability Operations, with at least 3 years in management; OR Bachelor’s degree and 10 to 14 years of experience in Observability Operations, with at least 4 years in management; OR Diploma and 14 to 18 years of experience in Observability Operations, with at least 5 years in management. Deep understanding of monitoring and notification technologies and observability concepts using Dynatrace and PagerDuty. Knowledge of Infrastructure and Application monitoring. Knowledge of Logs/Traces. Solid background in OpenTelemetry and integration. Knowledge of AWS and Azure services. Knowledge of TypeScript, React and Python scripting. Knowledge of container and Kubernetes (K8s) environments. Preferred Qualifications: Experience in a leadership role within a pharmaceutical or technology organization. Strong analytic/critical-thinking and decision-making abilities. Experience with cloud platforms (AWS, Azure, or Google Cloud). Knowledge of automation tools like Ansible and Terraform. Understanding of Agile practices. Ability to work effectively in a fast-paced, dynamic environment. Professional Certifications: Management certifications (Scrum/Agile) (preferred). Associate or Specialist Certification from Dynatrace. Soft Skills: Excellent leadership and team management skills. Strong transformation and organizational change experience. Exceptional collaboration and communication skills. High degree of initiative and self-motivation. Ability to manage multiple priorities successfully.
Team-oriented with a focus on achieving team goals. Strong presentation and public speaking skills. Excellent analytical and troubleshooting skills. Strong verbal and written communication skills. Ability to work optimally with global, virtual teams. Shift Information: This position is an onsite role and may require working during later hours to align with business hours. Candidates must be willing and able to work outside of standard hours as required to meet business needs. What you can expect of us: As we work to develop treatments that take care of others, we also work to care for your professional and personal growth and well-being. From our competitive benefits to our collaborative culture, we’ll support your journey every step of the way. In addition to the base salary, Amgen offers competitive and comprehensive Total Rewards Plans that are aligned with local industry standards. Apply now for a career that defies imagination. Objects in your future are closer than they appear. Join us. careers.amgen.com As an organization dedicated to improving the quality of life for people around the world, Amgen fosters an inclusive environment of diverse, ethical, committed and highly accomplished people who respect each other and live the Amgen values to continue advancing science to serve patients. Together, we compete in the fight against serious disease. Amgen is an Equal Opportunity employer and will consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, or any other basis protected by applicable law. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Posted 2 months ago

2 - 4 years

8 - 12 Lacs

Bengaluru

Work from Office

Location: India, Bangalore. Time type: Full time. Posted: 2 days ago. Job requisition ID: JR0035199. Job Title: Site Reliability Engineer. About Trellix: Trellix, the trusted CISO ally, is redefining the future of cybersecurity and soulful work. Our comprehensive, GenAI-powered platform helps organizations confronted by today's most advanced threats gain confidence in the protection and resilience of their operations. Along with an extensive partner ecosystem, we accelerate technology innovation through artificial intelligence, automation, and analytics to empower over 53,000 customers with responsibly architected security solutions. We also recognize the importance of closing the 4-million-person cybersecurity talent gap. We aim to create a home for anyone seeking a meaningful future in cybersecurity and look for candidates across industries to join us in soulful work. More at . Role Overview: The Site Reliability Engineer team is responsible for design, implementation and end-to-end ownership of the infrastructure platform and services that protect the Trellix Security's Consumer. The services provide continuous protection to our customers with a very strong focus on quality and an extendible services platform to internal partners & product teams. This role is a Site Reliability Engineer for commercial cloud-native solutions, deployed and managed in public cloud environments like AWS and GCP. You will be part of a team that is responsible for Trellix Cloud Services that enable protection at the endpoint products on a continuous basis. Responsibilities of this role include supporting Cloud service measurement, monitoring, and reporting, deployments and security. You will provide input into improving overall operational quality through common practices and by working with the Engineering, QA, and product DevOps teams. You will also be responsible for supporting efforts that improve Operational Excellence and Availability of Trellix Production environments.
You will have access to the latest tools and technology, and an incredible career path with the world's cybersecurity leader. You will have the opportunity to immerse yourself within complex and demanding deployment architectures and see the big picture, all while helping to drive continuous improvement in all aspects of a dynamic and high-performing engineering organization. If you are passionate about running and continuously improving as a world-class Site Reliability Engineer Team, we are offering you a unique and great opportunity to build your career with us and gain experience working with high-performance Cloud systems. About the Role: Being part of a global 24x7x365 team providing the operational coverage including event response and recovery efforts of critical services. Periodic deployment of features, patches and hotfixes to maintain the Security posture of our Cloud Services. Ability to work in shifts on a rotational basis and participate in On-Call duties. Have ownership and responsibility for high availability of Production environments. Provide input into the monitoring of systems applications and supporting data. Report on system uptime and availability. Collaborate with other team members on best practices. Assist with creating and updating runbooks & SOPs. Build a strong relationship with the Cloud DevOps, Dev & QA teams and become a domain expert for the cloud services in your remit. You will be provided the required support for growth and development in this role. About you: 2 to 4 years of hands-on working experience in supporting production of large-scale cloud services. Strong production support background and experience of in-depth troubleshooting. Experience working with solutions in both Linux and Windows environments. Experience using modern Monitoring and Alerting tools (Prometheus, Grafana, PagerDuty, etc.). Excellent written and verbal communication skills.
Experience with Python or other scripting languages. Proven ability to work independently in deploying, testing, and troubleshooting systems. Experience supporting high-availability systems and scalable solutions hosted on AWS or GCP. Familiarity with security tools and practices (Wiz, Tenable). Familiarity with containerization and associated management tools (Docker, Kubernetes). Significant experience of developing and maintaining relationships with a wide range of customers at all levels. Understanding of Incident, Change, Problem, and Vulnerability Management processes. Desired: Awareness of ITIL best practices. AWS Certification and/or Kubernetes Certification. Experience with Snowflake. Automation and CI/CD experience: Jenkins, Ansible, GitHub Actions, Argo CD. Company Benefits and Perks: We believe that the best solutions are developed by teams who embrace each other's unique experiences, skills, and abilities. We work hard to create a dynamic workforce where we encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours, and family-friendly benefits to all of our employees: Retirement Plans; Medical, Dental and Vision Coverage; Paid Time Off; Paid Parental Leave; Support for Community Involvement. We're serious about our commitment to a workplace where everyone can thrive and contribute to our industry-leading products and customer support, which is why we prohibit discrimination and harassment based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.

Posted 2 months ago

Apply

1 - 6 years

8 - 13 Lacs

Pune

Work from Office

Cloud Observability Administrator. Pune, India. Enterprise IT - 22685. about our diversity, equity, and inclusion efforts and the networks ZS supports to assist our ZSers in cultivating community spaces, obtaining the resources they need to thrive, and sharing the messages they are passionate about. Cloud Observability Administrator: ZS is looking for a Cloud Observability Administrator to join our team in Pune. As a Cloud Observability Administrator, you will work on the configuration of various Observability tools and create solutions to address business problems across multiple client engagements. You will leverage information from the requirements-gathering phase and draw on past experience to design a flexible and scalable solution, and collaborate with other team members (involved in the requirements gathering, testing, roll-out, and operations phases) to ensure seamless transitions. What You'll Do: Deploy, manage, and operate scalable, highly available, and fault-tolerant Splunk architecture. Onboard various kinds of log sources, such as Windows/Linux/Firewalls/Network, into Splunk. Develop alerts, dashboards, and reports in Splunk. Write complex SPL queries. Manage and administer a distributed Splunk architecture. Very good knowledge of the configuration files used in Splunk for data ingestion and field extraction. Perform regular upgrades of Splunk and relevant Apps/add-ons. Possess a comprehensive understanding of AWS infrastructure, including EC2, EKS, VPC, CloudTrail, Lambda, etc. Automate manual tasks using Shell/PowerShell scripting. Knowledge of Python scripting is a plus. Good knowledge of Linux commands for server administration.
What You'll Bring: 1+ years of experience in Splunk Development & Administration. Bachelor's Degree in CS, EE, or a related discipline. Strong analytic, problem-solving, and programming ability. 1-1.5 years of relevant consulting-industry experience working on medium-to-large-scale technology solution delivery engagements. Strong verbal, written, and team presentation communication skills. Strong verbal and written communication skills, with the ability to articulate results and issues to internal and client teams. Proven ability to work creatively and analytically in a problem-solving environment. Ability to work within a virtual global team environment and contribute to the overall timely delivery of multiple projects. Knowledge of Observability tools such as Cribl, Datadog, and PagerDuty is a plus. Knowledge of AWS Prometheus and Grafana is a plus. Knowledge of APM concepts is a plus. Knowledge of Linux/Python scripting is a plus. Splunk Certification is a plus. Perks & Benefits: ZS offers a comprehensive total rewards package including health and well-being, financial planning, annual leave, personal growth, and professional development. Our robust skills development programs, multiple career progression options, internal mobility paths, and collaborative culture empower you to thrive as an individual and global team member. We are committed to giving our employees a flexible and connected way of working. A flexible and connected ZS allows us to combine work from home and on-site presence at clients/ZS offices for the majority of our week. The magic of ZS culture and innovation thrives in both planned and spontaneous face-to-face connections. Travel: Travel is a requirement at ZS for client-facing ZSers; the business needs of your project and client are the priority. While some projects may be local, all client-facing ZSers should be prepared to travel as needed.
Travel provides opportunities to strengthen client relationships, gain diverse experiences, and enhance professional growth by working in different environments and cultures. Considering applying? At ZS, we're building a diverse and inclusive company where people bring their passions to inspire life-changing impact and deliver better outcomes for all. We are most interested in finding the best candidate for the job and recognize the value that candidates with all backgrounds, including non-traditional ones, bring. If you are interested in joining us, we encourage you to apply even if you don't meet 100% of the requirements listed above. ZS is an equal opportunity employer and is committed to providing equal employment and advancement opportunities without regard to any class protected by applicable law. To Complete Your Application: Candidates must possess or be able to obtain work authorization for their intended country of employment. An online application, including a full set of transcripts (official or unofficial), is required to be considered. NO AGENCY CALLS, PLEASE.

Posted 2 months ago

Apply

6 - 10 years

8 - 12 Lacs

Noida

Work from Office

Job Description: We are looking for a highly skilled and experienced Senior DevOps Engineer to join our team. The ideal candidate will have 5-7 years of experience in a DevOps role and a proven track record of implementing and maintaining complex systems with a focus on automation, scalability, and security. The Senior DevOps Engineer will work closely with our development, operations, and security teams to ensure that our software is released quickly and reliably, with a focus on continuous integration and delivery. Requirements: Bachelor's/Master's degree in Computer Science, Information Technology, or a related field. 5-7 years of experience in a DevOps role. Strong understanding of the SDLC and experience working on fully Agile teams. Proven experience in coding and scripting for DevOps, with Ant/Maven, Groovy, Terraform, shell scripting, and Helm chart skills. Working experience with IaC tools like Terraform, CloudFormation, or ARM templates. Strong experience with cloud computing platforms (e.g. Oracle Cloud (OCI), AWS, Azure, Google Cloud). Experience with containerization technologies (e.g. Docker, Kubernetes/EKS/AKS). Experience with continuous integration and delivery tools (e.g. Jenkins, GitLab CI/CD). Kubernetes: experience managing Kubernetes clusters and using kubectl to manage Helm chart deployments and ingress services and to troubleshoot pods. OS services: basic knowledge of managing, configuring, and troubleshooting Linux operating system issues, storage (block and object), and networking (VPCs, proxies, and CDNs). Monitoring and instrumentation: implement metrics in Prometheus, Grafana, Elastic, log management and related systems, and Slack/PagerDuty/Sentry integrations. Strong know-how of modern distributed version control systems (e.g.
Git, GitHub, GitLab etc). Strong troubleshooting and problem-solving skills, and the ability to work well under pressure. Excellent communication and collaboration skills, and the ability to lead and mentor junior team members. Career Level - IC3. Responsibilities: Design, implement, and maintain automated build, deployment, and testing systems. Take application code and third-party products and build fully automated pipelines for Java applications to build, test, and deploy complex systems for delivery in the cloud. Containerize applications, i.e. create Docker containers and push them to an artifact repository for deployment on containerization solutions with OKE (Oracle Container Engine for Kubernetes) using Helm charts. Lead efforts to optimize the build and deployment processes for high-volume, high-availability systems. Monitor production systems to ensure high availability and performance, and proactively identify and resolve issues. Support and troubleshoot cloud deployment and environment issues. Create and maintain CI/CD pipelines using tools such as Jenkins and GitLab CI/CD. Continuously improve the scalability and security of our systems, and lead efforts to implement best practices. Participate in the design and implementation of new features and applications, and provide guidance on best practices for deployment and operations. Work with the security team to ensure compliance with industry and company standards, and implement security measures to protect against threats. Keep up to date with emerging trends and technologies in DevOps, and make recommendations for improvement. Lead and mentor junior DevOps engineers and collaborate with cross-functional teams to ensure successful delivery of projects. Analyze, design, develop, troubleshoot, and debug software programs for commercial or end-user applications. Write code, complete programming, and perform testing and debugging of applications.
As a member of the software engineering division, you will analyze and integrate external customer specifications. Specify, design and implement modest changes to existing software architecture. Build new products and development tools. Build and execute unit tests and unit test plans. Review integration and regression test plans created by QA. Communicate with QA and porting engineering to discuss major changes to functionality. Work is non-routine and very complex, involving the application of advanced technical/business skills in area of specialization. Leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to functional area. 6+ years of software engineering or related experience.

Posted 2 months ago

Apply

5 - 10 years

30 - 35 Lacs

Hyderabad

Remote

Role : Devops Engineer Company : Feuji Software Solutions Pvt Ltd. Mode of Hire : Permanent Position Experience : 6- 12 Years Work Location : Hyderabad/ Remote About Feuji Feuji, established in 2014 and headquartered in Dallas, Texas, has rapidly emerged as a leading global technology services provider. With strategic locations including a Near Shore facility in San Jose, Costa Rica, and Offshore Delivery Centers in Hyderabad, and Bangalore, we are well-positioned to cater to a diverse clientele. Our team of 600 talented engineers drives our success, delivering innovative solutions to our clients and contributing to our recognition as a 'Best Place to Work For.' We collaborate with a wide range of clients, from startups to industry giants in sectors like Healthcare, Education, IT, and engineering, enabling transformative changes in their operations. Through partnerships with top technology providers such as AWS, Checkpoint, Gurukul, CoreStack, Splunk, and Micro Focus, we empower our clients' growth and innovation. With a clientele including Microsoft, HP, GSK, and DXC Technologies, we specialize in managed cloud services, cybersecurity, Product and Quality Engineering Services, and Data and Insights solutions, tailored to drive tangible business outcomes. Our commitment to creating 'Happy Teams' underscores our values and dedication to positive impact. Feuji welcomes exceptional talent to join our team, offering a platform for growth, development, and a culture of innovation and excellence. 
Key Responsibilities Design and implement continuous integration and continuous deployment frameworks from code to deploy Manage and optimize data pipelines for performance, scalability, and reliability Develop, implement, and maintain scalable data pipelines and processes Create and manage automated provisioning and configuration systems for data infrastructure using infrastructure-as-code principles Design, implement, configure and manage system monitoring solutions that alert teams to problems before customers are impacted Support developers in code deployment and troubleshooting Work closely with customers and other team members to understand complex requirements and translate them into automated solutions Provide support to ensure mission critical applications and components are being monitored and meet security, reporting and retention requirements as well as disaster recovery requirements of clients Support team members Skills Knowledge & Expertise Programming/Development Skills : Strong experience in Python is essential and experience with React/Vue.js would be preferred. Monitoring Tools : Familiarity with tools such as PagerDuty, Azure Monitor, and Datadog is beneficial, though monitoring is not the primary focus of the role. Good understanding of any of these tools will be advantageous. Cloud & DevOps Expertise : Must have a strong background in CI/CD, specifically with GitHub Actions. Deep expertise in Azure is essential. Experience with AWS or GCP is a plus. Should demonstrate the ability to quickly adapt to and learn new technologies. Soft Skills & Mindset : A strong passion for continuous learning and self-improvement. Excellent client-facing skills, with the confidence to handle discussions intelligently and effectively. Must be proactive, take full ownership of tasks, and be capable of delivering results even in challenging situations. 
Required Qualifications : 7+ years of DevOps experience 5+ years of Azure experience 2+ years of Development experience 2+ years of Terraform experience Cloud certifications Excellent communication skills Strong multi-tasker Self-starter Team player Preferred Qualifications : Kubernetes experience Azure, AWS and GCP Professional level certifications Kubernetes certifications (CKA, CKAD, CKS)

Posted 2 months ago

Apply

5 - 7 years

20 - 27 Lacs

Pune

Hybrid

Role: AppOps Engineer
Location: Pune, Hinjewadi; Hybrid (3 days a week)
Experience: 5 - 7 years
Responsibilities:
• Designing and implementing infrastructure and systems (such as metrics, monitoring, node management, alerting, deployment, logging)
• Setting up new environments and deploying solutions
• Application migration from EC2 to containers
• Building a proactive monitoring and alerting service
• Automation using Ansible, Python, and Perl scripting
• Investigating performance and stability problems, internally and on client sites
• Tuning the Actimize Platform (AIS and RCM), operating system, application servers, and databases for optimal performance and stability
• Identifying performance bottlenecks and assisting in root cause analysis
• Performance-related design reviews
• Creating and setting up deployment scripts for different environments (i.e. Test properties vs Prod properties)
• Configuring and optimizing instances and web servers for optimal performance (e.g. adjusting default connection limits, adjusting request queuing thresholds)
• AWS troubleshooting support
• Support, architect, and implement alongside Technical & Operations teams to meet our customers' individual needs for their infrastructure and application deployments
• Work on critical, highly complex customer problems that span multiple AWS services (dealing daily with high-severity incidents)
• Help build and improve customer operations through scripts that automate and deploy AWS resources seamlessly with as little manual intervention as possible
• Collaborate and help build utilities and tools for internal use that enable you and your fellow AWS Engineers to operate safely at high speed and wide scale
• Drive customer communication during critical events
• Flexible to work over the weekends and in a shift environment ( as per
• Good experience in a DevOps environment / Operations team / Infrastructure Operations team
• Excellent troubleshooting skills
• Expertise in performance tuning, investigation, root cause analysis, and mitigating bottlenecks
• Excellent hands-on experience in managing Application Support (3 tier/2 tier apps)
• AWS service knowledge for core services (EC2, S3, IAM, ASG, ELB, CFN, VPC, DX, VPN)
• Good exposure to managing containers and Kubernetes, and to deployment and configuration on containers
• Good hands-on experience in deployment, release management, and migration activities
• Exposure to scripting languages (Ansible, Perl, Python, Ruby, Shell script, PowerShell etc.)
• Database skills (SQL, Oracle or Postgres / Cassandra)
• Good exposure to ELK, Splunk, Kafka
• Application Server (skills in any of the middleware technologies, e.g. Tomcat, WebLogic, WebSphere)
• Good exposure to application performance monitoring tools like AppDynamics, Dynatrace
• Strong problem-solving, analytical, and communication skills
• Good communication, both written and verbal
• Troubleshooting performance issues and tuning
• Working with the Architecture team on hardware sizing recommendations
• Java performance testing, diagnosis, and tuning of Java applications
Additional Skills Desired:
• Cloud / application-level security experience
• Has worked in an Agile / Sprint development model
• Experience working with tools like OpsGenie, AlertOps, PagerDuty/OpenDuty
• Troubleshooting Java-related issues
• Performance testing and investigation experience
• Database performance testing, diagnosis, and tuning
Please send an email with your details and resume to chaithra.j@xoriant.com to proceed further.

Posted 2 months ago

Apply
Page 2 of 2

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies