Key Responsibilities:
- A day in the life of an Infoscion
- As a Senior Site Reliability Engineer, you will play a critical role in supporting application developers by providing expert guidance on application and infrastructure best practices from a reliability perspective
- Improve the reliability, quality, and time to market of our suite of products and applications
- Define suitable metrics for systems using SLOs and SLIs, and set up observability mechanisms to track them
- Define error budgets based on the SLOs
- Define the strategy for, and set up, high-availability and load-balancer-based architectures
- Drive a metrics-driven culture and software delivery process, using data to measure overall system quality and reliability
- Balance feature development speed and reliability with well-defined service level objectives
- Provide primary operational support and engineering for products and applications
- Partner with solution architects and development teams to improve service reliability
- Participate in system design
- Participate in optimizing code, automating operational tasks, and reducing toil
- Provide solutions for performance management, monitoring, and observability
- Work with business users to understand issues, perform root cause analysis, and work with the development team on enhancements and fixes
- Work with distributed traces to visualize the entire workflow and analyze the causes of problems and incidents
- Improve security and performance of applications
- Define, evangelize, and maintain SRE best practices
- Design and implement DevSecOps best practices
- Improve automation, including the self-healing capabilities of systems
- Manage and participate in on-call incident response, including priority incidents, if required
- If you think you fit right in to help our clients navigate their next in their digital transformation journey, this is the place for you!
Technical Requirements:
- Must have at least 5 years of SRE experience in large programs, with a focus on release engineering, observability, and reliability
- Reliability practices
- Chaos engineering
- Strong experience with one or more observability tools such as New Relic, AppDynamics, Prometheus, Dynatrace, Datadog, or Splunk
- Experience in event correlation using observability tools or other tools such as BigPanda
- Experience in observability dashboard creation, custom metrics, synthetic monitoring, and Real User Monitoring (RUM)
- Good experience in scripting or development languages, including expertise in any one of Python, Ruby, Java, Node.js, or PHP, plus working knowledge of JSON
- Experience with scripting in any one of PowerShell, M, Bash, or Perl
- Strong knowledge of application design and architecture, including microservices architecture
- Experience with CI/CD tooling and best practices
- Experience with cloud platforms such as AWS, Azure, and Google Cloud
Additional Responsibilities:
- AIOps and related tools
- Experience with containers and container orchestration practices, including Kubernetes and Docker Swarm
- Experience with any one of the infrastructure automation tools such as Terraform, CloudFormation, Ansible, or Puppet
- Knowledge of SQL and NoSQL databases such as Oracle and Couchbase
- Experience working with ITSM tools such as Remedy and ServiceNow, and with collaboration tools such as Confluence and Jira
- Experience with cloud cost optimization (FinOps)
Preferred Skills:
- Foundational -> Configuration Management -> Configuration Management -> Ansible
- Technology -> Infra_ToolAdministration-Others -> Splunk Admin
- Technology -> Infra_ToolAdministration-PerformanceManagement -> AppDynamics
- Technology -> Infra_ToolAdministration-PerformanceManagement -> Dynatrace