Staff Software Engineer
You will help the organization accelerate its SaaS journey by implementing robust CI/CD automation, strategic solutions and reusable, loosely coupled architecture patterns. Your contribution is needed to increase delivery throughput, shorten lead time, improve stability and provide highly reliable solutions.
This is a fantastic opportunity to join a highly skilled and talented team where you'll be able to add real value to our organization as a central function. The role will deepen your SRE/DevOps knowledge and build your technical and business skill set.
You will be working in a high-tech DevOps CI/CD pipeline consisting of continuous integration, continuous deployment, ephemeral environments (dynamic environments), automated QA pipelines, fan-out pipelines, blue/green deployments and many other cutting-edge features to deploy a SaaS product to multiple partner environments, from on-premises to different cloud providers.
Your Responsibilities
- Design, test and roll out robust CI/CD solutions as a centralized solution across business domains
- Design, test and roll out infrastructure as code solutions which can act as the central golden version and work for different combinations of input
- Present solutions at review boards, conduct internal walkthrough sessions and conduct handover/training sessions on the target solution and implementation steps
- Design robust CI/CD pipelines to work with cloud-based/on-prem Kubernetes clusters
- Set up a continuous delivery and deployment pipeline integrated with the release workflow to support release orchestration
- Troubleshoot multi-layer and containerized applications deployed in cloud infrastructure
- Infuse AI into the software development lifecycle to increase efficiency
- Maintain, enhance and fine-tune dynamic environments
- Apply automation and software to manual and mechanical tasks and to any parts of the system that would benefit from it
- Troubleshoot complicated, cross-platform issues spanning OS, networking and database layers in a cloud-based SaaS environment; handle live production incidents, debug application and infrastructure issues, and follow and implement SRE best practices
- Conduct system discovery and analysis, and develop improvements for system software performance, availability and reliability
- Design, write, ship, and motivate the implementation of solutions to increase observability, product reliability and organizational efficiency
- Propagate Site Reliability Engineering culture across the organization by sharing industry best practices, standards, approaches, documentation, and code with other engineering teams
- Collaborate closely with software engineers and testers to ensure the system meets non-functional requirements such as performance, security, and availability
- Document system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it
- Maintain and monitor the deployment and orchestration of servers, Docker containers, Kubernetes, and general backend infrastructure
- Keep up to date with security developments and proactively identify, diagnose, and solve complex security issues
- Participate in the on-call roster to provide weekend support when required
Required Technical Skills
- Minimum 9 years of working experience with CI/CD platforms and Kubernetes, leveraging DevOps, SRE and Agile methodologies
- Minimum 5 years of experience in designing, setting up and maintaining Kubernetes clusters and containerized pipelines
- Minimum 5 years of experience in designing, testing and implementing CI/CD pipelines to automate build, deployment and code promotion
- Minimum 5 years of experience in writing automation scripts and CI/CD pipelines and automating routine tasks using Groovy/Python to eliminate human dependencies
- Prior experience in troubleshooting CI/CD pipeline issues for containerized and multi-layer applications deployed in GCP or AWS
- Sound ability to dive deep into a problem statement, execute structured troubleshooting to identify the root cause and apply strategic solutions
- Experience with CI/CD in cloud environments and container technology: Docker, Kubernetes, Docker Swarm, Helm, and DevOps tooling (Git + CI/CD pipelines)
- Experience infusing AI into the software development lifecycle
- Experience as a Linux systems administrator (e.g. Ubuntu, Red Hat) and with command-line system administration tools such as Bash, Vim and SSH
- Experience in monitoring and analyzing infrastructure performance using standard performance monitoring tools: Grafana/Prometheus, Datadog, Nagios, New Relic
- Extended expertise in infrastructure core components: storage, system and/or networking
- Strong understanding of TCP/IP networking, including familiarity with concepts such as the OSI model
- Strong understanding of Internet protocols and applications such as SMTP, DNS, HTTP, SSH, SNMP etc.
- Solid understanding of ELK, Redis, RabbitMQ, Kafka and etcd
- Hands-on experience in writing infrastructure as code (IaC), configuration management as code (CMaC) and policy as code (PoaC) is a plus
Required Business Skills
- Able to design, test and present CI/CD pipelines and automation solutions at Architecture and Security boards
- Adaptable to change and able to work independently with a one-team attitude
- Ability to communicate clearly with different stakeholders
- Strong presentation skills, including preparing PowerPoint presentations and architecture diagrams
- Capable of delivering multiple initiatives concurrently while maintaining a high level of attention to detail
- Manage and prioritize work effectively with minimal supervision
- Provide timely and relevant stakeholder updates, project status and vital data points
- Ability to learn new technologies as needed to provide the best solutions.
- Strong problem analysis skills to dive deep, understand the root cause and provide strategic or interim solutions
- Sound analytical skills to come up with supporting data points
- Solid mathematical skills to enforce programmatic results validation
Certification Requirements (Nice to have)
- Kubernetes CKA or CKAD certification is nice to have
- AWS or GCP DevOps-related certifications are nice to have
- GCP or AWS cloud architecture certification (associate/professional) is nice to have