Job Description
Educational
Bachelor of Engineering, BTech, Bachelor of Science, Master of Engineering, Master of Technology
Service Line
Infosys Cobalt Unit
Responsibilities
A day in the life of an Infoscion
As a Senior Site Reliability Engineer, you will play a critical role in supporting application developers by providing expert guidance on application and infrastructure best practices from a reliability perspective.
- Improve the reliability, quality, and time-to-market of our suite of products/applications
- Define suitable metrics for each system using SLOs/SLIs and set up observability mechanisms to track them
- Define error budgets as per the SLOs
- Define the strategy for, and set up, High Availability and Load Balancer based architecture
- Drive a metrics-driven culture and software delivery process, using data to measure overall system quality and reliability
- Balance feature development speed and reliability with well-defined service level objectives
- Provide primary operational support and engineering for products/applications
- Partner with solution architects and development teams to improve service reliability
- Participate in system design
- Participate in optimizing code, automating operational tasks, and reducing toil
- Provide solutions for performance management, monitoring, and observability
- Work with business users to understand issues, develop root cause analyses, and work with the development team on enhancements/fixes
- Work with distributed traces to visualize the entire workflow and analyze the cause of problems/incidents
- Improve the security and performance of applications
- Define, evangelize, and maintain SRE best practices
- Design and implement DevSecOps best practices
- Improve automation, including the system's self-healing capability
- Manage and participate in on-call incidents, if required (Priority Incident)
If you think you fit right in to help our clients navigate their next in their digital transformation journey, this is the place for you!
Additional Responsibilities:
- AIOps and related tools
- Experience in container orchestration and practices, including Kubernetes and Docker Swarm
- Experience in infrastructure automation tools like Terraform, CloudFormation, Ansible, and Puppet (any one)
- Knowledge of SQL and NoSQL databases (Oracle, Couchbase)
- Experience working with ITSM tools like Remedy, ServiceNow, Confluence, and Jira
- Experience with cloud cost optimization / FinOps
Technical and Professional:
- At least 5 years of SRE experience in large programs, with a focus on release engineering, observability tasks, and reliability
- Reliability practices
- Chaos engineering
- Strong experience with one or more observability tools like New Relic, AppDynamics, Prometheus, Dynatrace, DataDog, and Splunk
- Experience in event correlation using observability or other tools like BigPanda
- Experience in observability dashboard creation, custom metrics, Synthetic Monitoring, and Real User Monitoring (RUM)
- Good experience in scripting or development languages, including expertise in Python, Ruby, JSON, Java, Node.js, or PHP (any one)
- Experience with scripting in PowerShell(M) and Bash/Shell/Perl (any one)
- Strong knowledge of application design and architecture, including microservices architecture
- Experience in CI/CD tooling and best practices
- Experience with cloud platforms such as AWS, Azure, and Google Cloud
Preferred Skills:
Foundational-Configuration Management-Configuration Management-Ansible
Technology-Infra_ToolAdministration-Others-Splunk Admin
Technology-Infra_ToolAdministration-PerformanceManagement-AppDynamics
Technology-Infra_ToolAdministration-PerformanceManagement-Dynatrace
Generic Skills:
Technology-Infra_ToolAdministration-ITSM-ServiceNow
Technology-OpenSystem-Python - OpenSystem-Python