8.0 - 13.0 years

10 - 15 Lacs

Bengaluru

Work from Office

What you will do: Design, implement, and maintain observability solutions (logging, monitoring, tracing) for cloud-native applications and infrastructure. Develop and optimize diagnostics tooling to quickly identify and resolve system or application-level issues. Monitor cloud infrastructure to ensure uptime, performance, and scalability, responding promptly to incidents and outages. Collaborate with development, operations, and support teams to drive improvements in system observability and troubleshooting workflows. Lead root cause analysis for major incidents, driving long-term fixes to prevent recurrence. Resolve customer-facing operational issues in a timely and effective manner. Automate operational processes and incident response tasks to reduce manual interventions and improve efficiency. Continuously assess and improve cloud observability tools, integrating new features and technologies where vital. Create and maintain comprehensive documentation on cloud observability frameworks, tools, and processes Who you will work with As a member of the Site Reliability Engineering (SRE) team, you will collaborate with a diverse group of professionals across various functions and regions. You will work closely with: Software Engineering Teams: Partner with developers to ensure that new features and services are reliable, scalable, and observable from the outset. You'll participate in design reviews and contribute to the overall architecture to enhance system performance and reliability. Coordinate with SRE team to automate deployment processes, manage infrastructure as code, and ensure seamless deployment pipelines. Product Management: Engage with product managers to understand customer requirements and ensure that reliability and performance are integral parts of product roadmaps. DevOps and Infrastructure Teams: Customer Support: Collaborate with customer support teams to diagnose and resolve incidents, providing insights and tools that enable faster troubleshooting and improved user experiences. Security and Compliance Teams: Work alongside security experts to maintain compliance with industry standards, ensuring that all systems and processes adhere to security best practices. Global Network Operations Teams: Interact with global operations staff spread across India, Europe, Canada, and the USA to support 24/7 service reliability and incident response. Data Analytics and Reporting: Team up with data analysts to create meaningful dashboards and reports that provide insights Who you are: Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience. 8+ years of experience in cloud engineering, (SRE), or DevOps. Expertise with cloud platforms (AWS, Azure, GCP) and related monitoring/observability tools (e.g., Prometheus, Grafana, Datadog, ELK Stack). Strong experience with diagnostics and troubleshooting tools for cloud services. Proficient in scripting languages (Python, Bash, etc.) and infrastructure-as-code (Terraform, CloudFormation). Experience in operational incident management, including root cause analysis and post-mortem reviews. Solid understanding of containerization (Docker, Kubernetes) and microservices architecture. Knowledge of network performance monitoring and debugging techniques. Desire to solve complex problems Proactive in communicating and handling stakeholders remotely and in various time-zones Demonstrated ability to collaborate with Engineering teams.

Posted 1 month ago

Apply

Senior Site Reliability Engineer I, Efficiency and Performance Cisco

0.0 - 1.0 years

1 - 2 Lacs

Bengaluru

Work from Office

About The Role The Site Reliability Engineering team focused on Efficiency and Performance is responsible for driving AWS cost intelligence, managing the ThousandEyes infrastructure, and ensuring optimal resource utilization and performance. In this role, the Senior Site Reliability Engineer will play a crucial part in optimizing the tools, services, and infrastructure that support the ThousandEyes platform. What You'll Do The Site Reliability Engineering team dedicated to Efficiency and Performance is responsible for optimizing AWS cost intelligence, managing the ThousandEyes infrastructure, and ensuring resource intelligence and performance. By strategically managing cloud resources and infrastructure, this team enhances the overall performance and reliability of our services. The Senior Site Reliability Engineer in this role will lead efforts to optimize cloud expenditures, streamline infrastructure management, and ensure that all resources are utilized efficiently, driving continuous improvement in service reliability and performance. Think self service and participate in and contribute to improve our "Follow the sun" model incident response and on-call rotation. Qualifications Ability to design and implement scalable and well tested solutions, with focus on streamlining operations. Strong hands on experience in cloud preferably AWS, Strong Infrastructure as Code skills, ideally with Terraform and Kubernetes. Previous experience in AWS cost management Understanding of Prometheus and its ecosystem, including Alertmanager. Ability to write high quality code in Python, Go, or equivalent languages Good understanding of Unix/Linux systems, the kernel, system libraries, file systems, and client-server protocols.

Posted 1 month ago

Apply

MLOps/LLMOps Manager Crisil

7.0 - 10.0 years

11 - 16 Lacs

Mumbai, Hyderabad, Pune

Work from Office

Key Responsibilities: Design, build, and maintain CI/CD pipelines for ML model training, validation, and deployment Automate and optimize ML workflows, including data ingestion, feature engineering, model training, and monitoring Deploy, monitor, and manage LLMs and other ML models in production (on-premises and/or cloud) Implement model versioning, reproducibility, and governance best practices Collaborate with data scientists, ML engineers, and software engineers to streamline end-to-end ML lifecycle Ensure security, compliance, and scalability of ML/LLM infrastructure Troubleshoot and resolve issues related to ML model deployment and serving Evaluate and integrate new MLOps/LLMOps tools and technologies Mentor junior engineers and contribute to best practices documentation Required Skills & Qualifications: 8+ years of experience in DevOps, with at least 3 years in MLOps/LLMOps Strong experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes, Docker) Proficient in CI/CD tools (Jenkins, GitHub Actions, GitLab CI, etc.) Hands-on experience deploying and managing different types of AI models (e.g., OpenAI, HuggingFace, custom models) to be used for developing solutions. Experience with model serving tools such as TGI, vLLM, BentoML, etc. Solid scripting and programming skills (Python, Bash, etc.) Familiarity with monitoring/logging tools (Prometheus, Grafana, ELK stack) Strong understanding of security and compliance in ML environments Preferred Skills: Knowledge of model explainability, drift detection, and model monitoring Familiarity with data engineering tools (Spark, Kafka, etc. Knowledge of data privacy, security, and compliance in AI systems. Strong communication skills to effectively collaborate with various stakeholders Critical thinking and problem-solving skills are essential Proven ability to lead and manage projects with cross-functional teams

Posted 1 month ago

Apply

MLOps/LLMOps Engineer Crisil

7.0 - 10.0 years

8 - 13 Lacs

Mumbai, Hyderabad, Pune

Work from Office

Key Responsibilities: Design, build, and maintain CI/CD pipelines for ML model training, validation, and deployment Automate and optimize ML workflows, including data ingestion, feature engineering, model training, and monitoring Deploy, monitor, and manage LLMs and other ML models in production (on-premises and/or cloud) Implement model versioning, reproducibility, and governance best practices Collaborate with data scientists, ML engineers, and software engineers to streamline end-to-end ML lifecycle Ensure security, compliance, and scalability of ML/LLM infrastructure Troubleshoot and resolve issues related to ML model deployment and serving Evaluate and integrate new MLOps/LLMOps tools and technologies Mentor junior engineers and contribute to best practices documentation Required Skills & Qualifications: 8+ years of experience in DevOps, with at least 3 years in MLOps/LLMOps Strong experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes, Docker) Proficient in CI/CD tools (Jenkins, GitHub Actions, GitLab CI, etc.) Hands-on experience deploying and managing different types of AI models (e.g., OpenAI, HuggingFace, custom models) to be used for developing solutions. Experience with model serving tools such as TGI, vLLM, BentoML, etc. Solid scripting and programming skills (Python, Bash, etc.) Familiarity with monitoring/logging tools (Prometheus, Grafana, ELK stack) Strong understanding of security and compliance in ML environments Preferred Skills: Knowledge of model explainability, drift detection, and model monitoring Familiarity with data engineering tools (Spark, Kafka, etc. Knowledge of data privacy, security, and compliance in AI systems. Strong communication skills to effectively collaborate with various stakeholders Critical thinking and problem-solving skills are essential Proven ability to lead and manage projects with cross-functional teams

Posted 1 month ago

Apply

Erlang Developer - Elixir/Distributed Systems Rosemallow Technologies

2.0 - 7.0 years

8 - 14 Lacs

Pune, Coimbatore

Work from Office

Job Summary : We are seeking a skilled Erlang Developer to join our backend engineering team. The ideal candidate will have a strong background in Erlang, with working experience in Elixir and RabbitMQ. You will play a key role in designing, building, and maintaining scalable, fault-tolerant systems used in high-availability environments. Key Responsibilities : - Design, develop, test, and maintain scalable Erlang-based backend applications. - Collaborate with cross-functional teams to understand requirements and deliver efficient solutions. - Integrate messaging systems such as RabbitMQ to ensure smooth communication between services. - Write reusable, testable, and efficient code in Erlang and Elixir. - Monitor system performance and troubleshoot issues in production. - Ensure high availability and responsiveness of services. - Participate in code reviews and contribute to best practices in functional programming. Required Skills : - Proficiency in Erlang with hands-on development experience. - Working knowledge of Elixir and the Phoenix framework. - Strong experience with RabbitMQ and messaging systems. - Good understanding of distributed systems and concurrency. - Experience with version control systems like Git. - Familiarity with CI/CD pipelines and containerization (Docker is a plus). Preferred Qualifications : - Experience working in telecom, fintech, or real-time systems. - Knowledge of OTP (Open Telecom Platform) and BEAM VM internals. - Familiarity with monitoring tools like Prometheus, Grafana, etc.

Posted 1 month ago

Apply

Site Reliability Engineer Talentxo

1.0 - 3.0 years

10 - 15 Lacs

Bengaluru

Work from Office

SRE 1 (Clouds Op) Locations: B'lore & Pune Exp - 1 to 3 yrs Candiates only from B2C product companies Exp - GCP, Prometheus, Grafana, ELK, Newrelic, Pingdom, or Pagerduty , Kubernets Experience with CI/CD tools 5 days week Rotational Shift

Posted 1 month ago

Apply

Devops AWS DATA Engineeer|| Technical Analyst || 12Lakhs CTC Vcloud Technologies Investment

8.0 - 9.0 years

11 - 12 Lacs

Hyderabad

Work from Office

We are seeking a highly skilled Devops Engineer to join our dynamic development team. In this role, you will be responsible for designing, developing, and maintaining both frontend and backend components of our applications using Devops and associated technologies. You will collaborate with cross-functional teams to deliver robust, scalable, and high-performing software solutions that meet our business needs. The ideal candidate will have a strong background in devops, experience with modern frontend frameworks, and a passion for full-stack development. Requirements : Bachelor's degree in Computer Science Engineering, or a related field. 8 to 9+ years of experience in full-stack development, with a strong focus on DevOps. DevOps with AWS Data Engineer - Roles & Responsibilities: Use AWS services like EC2, VPC, S3, IAM, RDS, and Route 53. Automate infrastructure using Infrastructure as Code (IaC) tools like Terraform or AWS CloudFormation . Build and maintain CI/CD pipelines using tools AWS CodePipeline, Jenkins,GitLab CI/CD. Cross-Functional Collaboration Automate build, test, and deployment processes for Java applications. Use Ansible , Chef , or AWS Systems Manager for managing configurations across environments. Containerize Java apps using Docker . Deploy and manage containers using Amazon ECS , EKS (Kubernetes) , or Fargate . Monitoring & Logging using Amazon CloudWatch,Prometheus + Grafana,E Stack (Elasticsearch, Logstash, Kibana),AWS X-Ray for distributed tracing manage access with IAM roles/policies . Use AWS Secrets Manager / Parameter Store for managing credentials. Enforce security best practices , encryption, and audits. Automate backups for databases and services using AWS Backup , RDS Snapshots , and S3 lifecycle rules . Implement Disaster Recovery (DR) strategies. Work closely with development teams to integrate DevOps practices. Document pipelines, architecture, and troubleshooting runbooks. Monitor and optimize AWS resource usage. Use AWS Cost Explorer , Budgets , and Savings Plans . Must-Have Skills: Experience working on Linux-based infrastructure. Excellent understanding of Ruby, Python, Perl, and Java . Configuration and managing databases such as MySQL, Mongo. Excellent troubleshooting. Selecting and deploying appropriate CI/CD tools Working knowledge of various tools, open-source technologies, and cloud services. Awareness of critical concepts in DevOps and Agile principles. Managing stakeholders and external interfaces. Setting up tools and required infrastructure. Defining and setting development, testing, release, update, and support processes for DevOps operation. Have the technical skills to review, verify, and validate the software code developed in the project. Interview Mode : F2F for who are residing in Hyderabad / Zoom for other states Location : 43/A, MLA Colony,Road no 12, Banjara Hills, 500034 Time : 2 - 4pm

Posted 1 month ago

Apply

Senior Automation Engineer, Cloud Services NICE

4.0 - 7.0 years

11 - 16 Lacs

Pune

Hybrid

So, what’s the role all about? As a Sr. Cloud Services Automation Engineer, you will be responsible for designing, developing, and maintaining robust end-to-end automation solutions that support our customer onboarding processes from an on-prem software solution to Azure SAAS platform and streamline cloud operations. You will work closely with Professional Services, Cloud Operations, and Engineering teams to implement tools and frameworks that ensure seamless deployment, monitoring, and self-healing of applications running in Azure. How will you make an impact? Design and develop automated workflows that orchestrate complex processes across multiple systems, databases, endpoints, and storage solutions in on-prem and public cloud. Design, develop, and maintain internal tools/utilities using C#, PowerShell, Python, Bash to automate and optimize cloud onboarding workflows. Create integrations with REST APIs and other services to ingest and process external/internal data. Query and analyze data from various sources such as, SQL databases, Elastic Search indices and Log files (structured and unstructured) Develop utilities to visualize, summarize, or otherwise make data actionable for Professional Services and QA engineers. Work closely with test, ingestion, and configuration teams to understand bottlenecks and build self-healing mechanisms for high availability and performance. Build automated data pipelines with data consistency and reconciliation checks using tools like PowerBI/Grafana for collecting metrics from multiple endpoints and generating centralized and actionable dashboards. Automate resource provisioning across Azure services including AKS, Web Apps, and storage solutions Experience in building Infrastructure-as-code (IaC) solutions using tools like Terraform, Bicep, or ARM templates Develop end-to-end workflow automation in customer onboarding journey that spans from Day 1 to Day 2 with minimal manual intervention Have you got what it takes? Bachelor’s degree in computer science, Engineering, or related field (or equivalent experience). Proficiency in scripting and programming languages (e.g., C#, .NET, PowerShell, Python, Bash). Experience working with and integrating REST APIs Experience with IaC and configuration management tools (e.g., Terraform, Ansible) Familiarity with monitoring and logging solutions (e.g., Azure Monitor, Log Analytics, Prometheus, Grafana). Familiarity with modern version control systems (e.g., GitHub). Excellent problem-solving skills and attention to detail. Ability to work with development and operations teams, to achieve desired results, on common projects Strategic thinker and capable of learning new technologies quickly Good communication with peers, subordinates and managers You will have an advantage if you also have: Experience with AKS infrastructure administration. Experience orchestrating automation with Azure Automation tools like Logic Apps. Experience working in a secure, compliance driven environment (e.g. CJIS/PCI/SOX/ISO) Certifications in vendor or industry specific technologies. What’s in it for you? Join an ever-growing, market disrupting, global company where the teams – comprised of the best of the best – work in a fast-paced, collaborative, and creative environment! As the market leader, every day at NICE is a chance to learn and grow, and there are endless internal career opportunities across multiple roles, disciplines, domains, and locations. If you are passionate, innovative, and excited to constantly raise the bar, you may just be our next NICEr! Enjoy NICE-FLEX! At NICE, we work according to the NICE-FLEX hybrid model, which enables maximum flexibility: 2 days working from the office and 3 days of remote work, each week. Naturally, office days focus on face-to-face meetings, where teamwork and collaborative thinking generate innovation, new ideas, and a vibrant, interactive atmosphere. Requisition ID: 7454 Reporting into: Director of Cloud Services Role Type: Individual Contributor

Posted 1 month ago

Apply

Sr. Quality Automation Engineer BMC Software

5.0 - 10.0 years

7 - 12 Lacs

Pune

Hybrid

BMC is looking for a Senior QA Engineer to join a QE team working on complex and distributed software, developing test plans, executing tests, developing automation & assuring product quality. Here is how, through this exciting role, YOU will contribute to BMC's and your own success: 1. Define and execute comprehensive test strategies for service management platforms and observability pipelines. 2. Develop, maintain, and optimize automated tests covering incident, problem, change management workflows, and observability data (metrics, logs, traces, events). 3. Collaborate with product, engineering, and SRE teams to embed quality throughout service delivery and monitoring processes. 4. Validate the accuracy, completeness, and reliability of telemetry data and alerts used in observability. 5. Drive continuous integration of quality checks into CI/CD pipelines for rapid feedback and deployment confidence. 6. Investigate production incidents using observability tools and testing outputs to support root cause analysis. 7. Mentor and guide junior engineers on quality best practices for service management and observability domains. 8. Generate detailed quality metrics and reports to inform leadership and drive continuous improvement. To ensure youre set up for success, you will bring the following skillset & experience: 1. 5+ years of experience in quality engineering or software testing with a focus on service management and observability. 2. Strong programming and scripting skills (Java, Python, JavaScript, or similar). 3. Hands-on experience with service management tools such as BMC Helix, ServiceNow, Jira Service Management. 4. Proficient in observability platforms and frameworks (Prometheus, Grafana, ELK Stack, OpenTelemetry, Jaeger). 5. Solid understanding of CI/CD processes and tools (Jenkins, GitHub Actions, Azure DevOps). 6. Experience with cloud environments (AWS, Azure, GCP) and container technologies (Docker, Kubernetes). Whilst these are nice to have, our team can help you develop in the following skills: 1. Experience in Site Reliability Engineering (SRE) practices. 2. Knowledge of security and performance testing methodologies. 3. QA certifications such as ISTQB or equivalent.

Posted 1 month ago

Apply

Full stack Devolper Prodapt Solutions

3.0 - 7.0 years

5 - 9 Lacs

Chennai

Work from Office

Overview We are looking for a Full-stack Developer and Automation Engineer with knowledge in Cloud, DevOps Tools, Automation, excellent analytical, problem solving and communication skills. You'll need to have Bachelor’s degree or two or more years of work experience Experience working with Front-end and Back-end Technologies for building, enhancing and managing applications Experience working with Backend technologies like Python, DJango, Java, ReactJS, NodeJS, Springboot Experience working with Client-side scripting technologies like JavaScript, JQuery, etc. Experience in advanced SQL/procedures on MySQL/MongoDB/MariaDB/Oracle Experience using AWS Cloud Infrastructure services such as EC2, ALB, RDS, etc. Experience working with serverless technologies like AWS Lambda, Google/Azure Functions Knowledge of SDLC with Devops tools and Agile Development Even Better if you have Experience in Monitoring/Alerting tools and platforms such as Prometheus, Grafana, Catchpoint, New Relic, etc Experience agile practices and tools used in the development (Jira, Confluence, Jenkins, etc.) Experience in code review, quality, performance tuning with problem solving and debugging skills. Experience with Unit testing framework like JUnit, Mokito. Good communication, interpersonal skills and communication skills to clearly articulate and influence stakeholders. Very good problem solving skills. Responsibilities We are looking for a Full-stack Developer and Automation Engineer with knowledge in Cloud, DevOps Tools, Automation, excellent analytical, problem solving and communication skills. You'll need to have Bachelor’s degree or two or more years of work experience Experience working with Front-end and Back-end Technologies for building, enhancing and managing applications Experience working with Backend technologies like Python, DJango, Java, ReactJS, NodeJS, Springboot Experience working with Client-side scripting technologies like JavaScript, JQuery, etc. Experience in advanced SQL/procedures on MySQL/MongoDB/MariaDB/Oracle Experience using AWS Cloud Infrastructure services such as EC2, ALB, RDS, etc. Experience working with serverless technologies like AWS Lambda, Google/Azure Functions Knowledge of SDLC with Devops tools and Agile Development Even Better if you have Experience in Monitoring/Alerting tools and platforms such as Prometheus, Grafana, Catchpoint, New Relic, etc Experience agile practices and tools used in the development (Jira, Confluence, Jenkins, etc.) Experience in code review, quality, performance tuning with problem solving and debugging skills. Experience with Unit testing framework like JUnit, Mokito. Good communication, interpersonal skills and communication skills to clearly articulate and influence stakeholders. Very good problem solving skills. We are looking for a Full-stack Developer and Automation Engineer with knowledge in Cloud, DevOps Tools, Automation, excellent analytical, problem solving and communication skills. You'll need to have Bachelor’s degree or two or more years of work experience Experience working with Front-end and Back-end Technologies for building, enhancing and managing applications Experience working with Backend technologies like Python, DJango, Java, ReactJS, NodeJS, Springboot Experience working with Client-side scripting technologies like JavaScript, JQuery, etc. Experience in advanced SQL/procedures on MySQL/MongoDB/MariaDB/Oracle Experience using AWS Cloud Infrastructure services such as EC2, ALB, RDS, etc. Experience working with serverless technologies like AWS Lambda, Google/Azure Functions Knowledge of SDLC with Devops tools and Agile Development Even Better if you have Experience in Monitoring/Alerting tools and platforms such as Prometheus, Grafana, Catchpoint, New Relic, etc Experience agile practices and tools used in the development (Jira, Confluence, Jenkins, etc.) Experience in code review, quality, performance tuning with problem solving and debugging skills. Experience with Unit testing framework like JUnit, Mokito. Good communication, interpersonal skills and communication skills to clearly articulate and influence stakeholders. Very good problem solving skills.

Posted 1 month ago

Apply

Sr. Java Engineer Mastek

10.0 - 12.0 years

30 - 37 Lacs

Bengaluru

Work from Office

We need immediate joiners or those who are serving notice period and can join in another 10-15 days. No other candidate i.e. who are on bench or official 3, 2 months NP. Strong working experience in design and development of RESTful APIs using Java, Spring Boot and Spring Cloud. Technical hands-on experience to support development, automated testing, infrastructure and operations Fluency with relational databases or alternatively NoSQL databases Excellent pull request review skills and attention to detail Experience with streaming platforms (real-time data at massive scale like Confluent Kafka). Working experience in AWS services like EC2, ECS, RDS, S3 etc. Understanding of DevOps as well as experience with CI/CD pipelines Industry experience in Retail domain is a plus. Exposure to Agile Methodology and project tools: Jira, Confluence, SharePoint. Working knowledge in Docker Container/Kubernetes Excellent team player, ability to work independently and as part of a team Experience in mentoring junior developers and providing technical leadership Familiarity with Monitoring & Reporting tools (Prometheus, Grafana, PagerDuty etc). Ability to learn, understand, and work quickly with new emerging technologies, methodologies, and solutions in the Cloud/IT technology space Knowledge of front-end framework using React or Angular and any other programming languages like JavaScript/TypeScript or Python is a plus

Posted 1 month ago

Apply

Site & Platform Reliability Engineer Tetrahed

8.0 - 13.0 years

15 - 30 Lacs

Noida, Greater Noida

Work from Office

Site & Platform Reliability Engineer Location: Noida/Greater Noida Organization: TetrahedInc. Experience: 8+ Years Work Mode: [Onsite] Employment Type: Fulltime About TetrahedInc. TetrahedInc. is a privately held IT services and consulting firm headquartered in Hyderabad with a strong global staffing presence ambition box. They specialize in end-to-end digital transformation offering cloud computing, AI/ML, cybersecurity, data analytics, and recruitment/staffing solutions to diverse industries worldwide tetrahed.com. About the Role As a Site & Platform Reliability Engineer at TetrahedInc., you'll be responsible for designing, automating, and operating cloud-native platforms using SRE/PRE best practices. You'll be a technical leader, engaging with clients, mentoring teams, and collaborating with major cloud and open-source ecosystems (e.g., Kubernetes, CNCF). Key Responsibilities Technical & Architectural Leadership: Lead PoCs, architecture design, SRE kick start, observability, and platform modernization efforts. Engineer scalable, resilient cloud-native systems. Partner with cloud providers like Google, AWS, Microsoft, Red Hat, and VMware. Service Delivery & Automation: Implement SRE principles, automation, infrastructure-as-code (Terraform, Ansible), and CI/CD pipelines (ArgoCD, Jenkins, Tekton). Define SLOs/SLIs, perform incident management, and ensure reliability. Coach internal and client delivery teams in reliability practices. Innovation & Thought Leadership: Contribute to open-source communities or internal knowledge-sharing. Author whitepapers, blogs, or speak at industry events. Maintain hands-on technical excellence and mentor peers. Client Engagement & Trust: Conduct workshops, briefings, and strategic discussions with stakeholders. Act as a trusted advisor during modernization journeys. Mandatory Skills & Experience Proficiency in Kubernetes (Open Shift, Tanzu, or vanilla). Strong SRE knowledge, infrastructure-as-code, and automation scripting (Python, Bash, YAML). Experience with CI/CD pipeline tools (ArgoCD, Jenkins, Tekton). Deep observability experience (Prometheus, ELK/EFK, Grafana, App Dynamics, Dyna-trace). Familiarity with cloud-native networking (DNS, load balancers, reverse proxies). Expertise in micro services and container-based architectures. Excellent communication and stakeholder management. Preferred Qualifications Bachelors/Masters in Computer Science or Engineering. CKA certification or equivalent Kubernetes expertise. 8+years in SI, consulting, or enterprise organizations. Familiarity with Agile/Scrum/Domain-driven design, CNCF ecosystem. Passionate about innovation, labs environment, and open-source. Why Join Tetrahed? Engage with global clients and cloud hyperscalers. Drive opensource and SRE best practices. Contribute to a learning-rich, collaborative environment. Make an impact within a growing, innovative mid-size IT organization. Interested Candidates Lets Connect! Please share your updated CV or reach out directly: Email : manojkumar@tetrahed.com Mobile : +91-6309124068 LinkedIn (Manoj Kumar) : https://www.linkedin.com/in/manoj-kumar-54455024b/ Company Page : https://www.linkedin.com/company/tetrahedinc/

Posted 1 month ago

Apply

Devops AWS DATA Engineeer|| Technical Analyst || 12Lakhs CTC Vcloud Technologies Investment

7.0 - 9.0 years

11 - 12 Lacs

Hyderabad

Work from Office

We are seeking a highly skilled Devops Engineer to join our dynamic development team. In this role, you will be responsible for designing, developing, and maintaining both frontend and backend components of our applications using Devops and associated technologies. You will collaborate with cross-functional teams to deliver robust, scalable, and high-performing software solutions that meet our business needs. The ideal candidate will have a strong background in devops, experience with modern frontend frameworks, and a passion for full-stack development. Requirements : Bachelor's degree in Computer Science Engineering, or a related field. 7 to 9+ years of experience in full-stack development, with a strong focus on DevOps. DevOps with AWS Data Engineer - Roles & Responsibilities: Use AWS services like EC2, VPC, S3, IAM, RDS, and Route 53. Automate infrastructure using Infrastructure as Code (IaC) tools like Terraform or AWS CloudFormation . Build and maintain CI/CD pipelines using tools AWS CodePipeline, Jenkins,GitLab CI/CD. Cross-Functional Collaboration Automate build, test, and deployment processes for Java applications. Use Ansible , Chef , or AWS Systems Manager for managing configurations across environments. Containerize Java apps using Docker . Deploy and manage containers using Amazon ECS , EKS (Kubernetes) , or Fargate . Monitoring & Logging using Amazon CloudWatch,Prometheus + Grafana,E Stack (Elasticsearch, Logstash, Kibana),AWS X-Ray for distributed tracing manage access with IAM roles/policies . Use AWS Secrets Manager / Parameter Store for managing credentials. Enforce security best practices , encryption, and audits. Automate backups for databases and services using AWS Backup , RDS Snapshots , and S3 lifecycle rules . Implement Disaster Recovery (DR) strategies. Work closely with development teams to integrate DevOps practices. Document pipelines, architecture, and troubleshooting runbooks. Monitor and optimize AWS resource usage. Use AWS Cost Explorer , Budgets , and Savings Plans . Must-Have Skills: Experience working on Linux-based infrastructure. Excellent understanding of Ruby, Python, Perl, and Java . Configuration and managing databases such as MySQL, Mongo. Excellent troubleshooting. Selecting and deploying appropriate CI/CD tools Working knowledge of various tools, open-source technologies, and cloud services. Awareness of critical concepts in DevOps and Agile principles. Managing stakeholders and external interfaces. Setting up tools and required infrastructure. Defining and setting development, testing, release, update, and support processes for DevOps operation. Have the technical skills to review, verify, and validate the software code developed in the project. Interview Mode : F2F for who are residing in Hyderabad / Zoom for other states Location : 43/A, MLA Colony,Road no 12, Banjara Hills, 500034 Time : 2 - 4pm

Posted 1 month ago

Apply

Full Stack Software Engineer Amgen Inc

1.0 - 3.0 years

3 - 5 Lacs

Hyderabad

Work from Office

What you will do In this vital role you will be responsible for developing, and maintaining software applications, components, and solutions that meet business needs and ensuring the availability and performance of critical systems and applications. This role requires a experience in and a deep understanding of both front and back-end development. The Full Stack Software Engineer will work closely with product managers, designers, and other engineers to create high-quality, scalable software solutions and automating operations, monitoring system health, and responding to incidents to minimize downtime. The Full Stack Software Engineer will also contribute to design discussions and provide guidance on technical feasibility and best standards. Roles & Responsibilities: Develop complex software projects from conception to deployment, including delivery scope, risk, and timeline. Conduct code reviews to ensure code quality and adherence to best practices. Contribute to both front-end and back-end development using cloud technology. Provide ongoing support and maintenance for design system and applications, ensuring reliability, reuse and scalability while meeting accessibility and best standards. Develop innovative solutions using generative AI technologies. Create and maintain documentation on software architecture, design, deployment, disaster recovery, and operations. Identify and resolve technical challenges, software bugs and performance issues effectively. Stay updated with the latest trends and advancements. Analyze and understand the functional and technical requirements of applications, solutions, and systems and translate them into software architecture and design specifications. Develop and execute unit tests, integration tests, and other testing strategies to ensure the quality of the software. Work closely with cross-functional teams, including product management, stakeholders, design, and QA, to deliver high-quality software on time. Maintain detailed documentation of software designs, code, and development processes. Work on integrating with other systems and platforms to ensure seamless data flow and functionality. What we expect of you We are all different, yet we all use our unique contributions to serve patients. Basic Qualifications: Masters degree and 1 to 3 years of experience in Computer Science, IT or related field experience OR Bachelors degree and 3 to 5 years of experience in Computer Science, IT or related field experience OR Diploma and 7 to 9 years of experience in Computer Science, IT or related field experience Must-Have Skills: Hands-on experience with various cloud services, understanding the pros and cons of various cloud services in well-architected cloud design principles. Experience with developing and maintaining design systems across teams. Hands-on experience with Full Stack software development. Proficient in programming languages such as JavaScript, Python, SQL/NoSQL. Familiarity with frameworks such as React JS visualization libraries. Strong problem-solving and analytical skills; ability to learn quickly; excellent communication and interpersonal skills. Experience with API integration, serverless, microservices architecture. Experience in SQL/NoSQL databases, vector databases for large language models. Experience with website development, understanding of website localization processes, which involve adapting content to fit cultural and linguistic contexts. Preferred Qualifications: Good-to-Have Skills: Strong understanding of cloud platforms (e.g., AWS, GCP, Azure) and containerization technologies (e.g., Docker, Kubernetes). Experience with monitoring and logging tools (e.g., Prometheus, Grafana, Splunk). Experience with data processing tools like Hadoop, Spark, or similar. Experience with popular large language models. Experience with Langchain or llamaIndex framework for language models; experience with prompt engineering, model fine-tuning. Professional Certifications: Relevant certifications such as CISSP, AWS Developer certification, CompTIA Network+, or MCSE (preferred). Any SAFe Agile certification (preferred) Soft Skills: Excellent analytical and troubleshooting skills. Strong verbal and written communication skills. Ability to work effectively with global, virtual teams. High degree of initiative and self-motivation. Ability to manage multiple priorities successfully. Team-oriented, with a focus on achieving team goals. Strong presentation and public speaking skills.

Posted 1 month ago

Apply

SRE - Site Reliability Engineer TechBlocks

6.0 - 11.0 years

20 - 25 Lacs

Hyderabad, Ahmedabad

Hybrid

Hi Aspirant, Greetings from TechBlocks - IT Software of Global Digital Product Development - Hyderabad !!! About us : TechBlocks is a global digital product engineering company with 16+ years of experience helping Fortune 500 enterprises and high-growth brands accelerate innovation, modernize technology, and drive digital transformation. From cloud solutions and data engineering to experience design and platform modernization, we help businesses solve complex challenges and unlock new growth opportunities. Job Title: Senior DevOps Site Reliability Engineer (SRE) Location : Hyderabad & Ahmedabad Employment Type: Full-Time Work Model - 3 Days from office Job Overview Dynamic, motivated individuals deliver exceptional solutions for the production resiliency of the systems. The role incorporates aspects of software engineering and operations, DevOps skills to come up with efficient ways of managing and operating applications. The role will require a high level of responsibility and accountability to deliver technical solutions. Summary: As a Senior SRE, you will ensure platform reliability, incident management, and performance optimization. You'll define SLIs/SLOs, contribute to robust observability practices, and drive proactive reliability engineering across services. Experience Required: 610 years of SRE or infrastructure engineering experience in cloud-native environments. Mandatory: Cloud : GCP (GKE, Load Balancing, VPN, IAM) Observability: Prometheus, Grafana, ELK, Datadog Containers & Orchestration : Kubernetes, Docker Incident Management: On-call, RCA, SLIs/SLOs IaC : Terraform, Helm Incident Tools: PagerDuty, OpsGenie Nice to Have : GCP Monitoring, Skywalking Service Mesh, API Gateway GCP Spanner, Scope: Drive operational excellence and platform resilience Reduce MTTR, increase service availability Own incident and RCA processes Roles and Responsibilities: Define and measure Service Level Indicators (SLIs), Service Level Objectives ( SLOs), and manage error budgets across services. Lead incident management for critical production issues drive Root Cause Analysis (RCA) and postmortems. Create and maintain runbooks and standard operating procedures for high availability services. Design and implement observability frameworks using ELK, Prometheus, and Grafana ; drive telemetry adoption. Coordinate cross-functional war-room sessions during major incidents and maintain response logs. Develop and improve automated System Recovery, Alert Suppression, and Escalation logic. Use GCP tools like GKE, Cloud Monitoring, and Cloud Armor to improve performance and security posture. Collaborate with DevOps and Infrastructure teams to build highly available and scalable systems. Analyze performance metrics and conduct regular reliability reviews with engineering leads. Participate in capacity planning, failover testing, and resilience architecture reviews. If you are interested , then please share me your updated resume to kranthikt@tblocks.com Warm Regards, Kranthi Kumar kranthikt@tblocks.com Contact: 8522804902 Senior Talent Acquisition Specialist Toronto | Ahmedabad | Hyderabad | Pune www.tblocks.com

Posted 1 month ago

Apply

Release Manager Acesoft Labs

12.0 - 19.0 years

17 - 30 Lacs

Hyderabad, Ahmedabad

Hybrid

Job Title: Release Manager Tools & Infrastructure Location: Ahmedabad & Hyderabad Experience Level: 12 + years Department: Engineering / Devops Reporting To: Head of Devops / Engineering Director Were looking for a hands-on Release Manager with strong Devops and Infrastructure expertise to lead software release pipelines, tooling, and automation across distributed systems. This role ensures secure, stable, and timely delivery of applications while coordinating across engineering, QA, and SRE teams. Key Responsibilities Release & Environment Management Plan and manage release schedules and cutovers Oversee environment readiness, rollback strategies, and post-deployment validations Ensure version control, CI/CD artifact management, and build integrity Toolchain Ownership Administer tools like Jenkins, GitHub Actions, Bitbucket, SonarQube, Argo CD, JFrog, and Terraform Manage Kubernetes and Helm for container orchestration Maintain secrets via Vault and related tools Infrastructure & Automation Work with Cloud & DevOps teams for secure, automated deployments Use GCP (GKE, VPC, IAM, Load Balancer, GCS) with IaC standards (Terraform, Helm) Monitoring & Stability Implement observability tools: Prometheus, Grafana, ELK, Datadog Monitor release health, manage incident responses, and improve via RCAs Compliance & Coordination Use Jira, Confluence, ServiceNow for planning and documentation Apply OWASP/WAF/GCP Cloud Armor standards Align releases with Dev, QA, CloudOps, and Security teams IF interested share resume to: sowmya.v@acesoftlabs.com

Posted 1 month ago

Apply

Machine Learning, Technical Lead - NLP / LLM Avalara Technologies

6.0 - 10.0 years

8 - 12 Lacs

Pune

Remote

What You'll Do We are looking for experienced Machine Learning Engineers with a background in software development and a deep enthusiasm for solving complex problems. You will lead a dynamic team dedicated to designing and implementing a large language model framework to power diverse applications across Avalara. Your responsibilities will span the entire development lifecycle, including conceptualization, prototyping and delivery of the LLM platform features. You will build core agent infrastructureA2A orchestration and MCP-driven tool discoveryso teams can launch secure, scalable agent workflows. You will be reporting to Senior Manager, Machine Learning What Your Responsibilities Will Be We are looking for engineers who can think quick and have a background in implementation. Your responsibilities will include: Build on top of the foundational framework for supporting Large Language Model Applications at Avalara Experience with LLMs - like GPT, Claude, LLama and other Bedrock models Leverage best practices in software development, including Continuous Integration/Continuous Deployment (CI/CD) along with appropriate functional and unit testing in place. Promote innovation by researching and applying the latest technologies and methodologies in machine learning and software development. Write, review, and maintain high-quality code that meets industry standards, contributing to the project's. Lead code review sessions, ensuring good code quality and documentation. Mentor junior engineers, encouraging a culture of collaboration Proficiency in developing and debugging software with a preference for Python, though familiarity with additional programming languages is valued and encouraged. What You'll Need to be Successful 6+ years of experience building Machine Learning models and deploying them in production environments as part of creating solutions to complex customer problems. Proficiency working in cloud computing environments (AWS, Azure, GCP), Machine Learning frameworks, and software development best practices. Experience working with technological innovations in AI & ML(esp. GenAI) and apply them. Experience with design patterns and data structures. Good analytical, design and debugging skills. Technologies you will work with: Python, LLMs, Agents, A2A, MCP, MLFlow, Docker, Kubernetes, Terraform, AWS, GitLab, Postgres, Prometheus, and Grafana.

Posted 1 month ago

Apply

Senior Machine Learning Engineer - NLP Avalara Technologies

5.0 - 8.0 years

6 - 9 Lacs

Pune

Remote

What You'll Do We are looking for experienced Machine Learning Engineers with a background in software development and a deep enthusiasm for solving complex problems. You will lead a dynamic team dedicated to designing and implementing a large language model framework to power diverse applications across Avalara. Your responsibilities will span the entire development lifecycle, including conceptualization, prototyping and delivery of the LLM platform features. You will have a blend of technical skills in the fields of AI & Machine Learning especially with LLMs and a deep-seated understanding of software development practices where you'll work with a team to ensure our systems are scalable, performant and accurate. You will be reporting to Senior Manager, AI/ML. What Your Responsibilities Will Be We are looking for engineers who can think quick and have a background in implementation. Your responsibilities will include: Build on top of the foundational framework for supporting Large Language Model Applications at Avalara Experience with LLMs - like GPT, Claude, LLama and other Bedrock models Leverage best practices in software development, including Continuous Integration/Continuous Deployment (CI/CD) along with appropriate functional and unit testing in place. Inspire creativity by researching and applying the latest technologies and methodologies in machine learning and software development. Write, review, and maintain high-quality code that meets industry standards. Lead code review sessions, ensuring good code quality and documentation. Mentor junior engineers, encouraging a culture of collaboration. Proficiency in developing and debugging software with a preference for Python, though familiarity with additional programming languages is valued and encouraged. What You'll Need to be Successful Bachelor's/Master's degree in computer science with 5+ years of industry experience in software development, along with experience building Machine Learning models and deploying them in production environments. Proficiency working in cloud computing environments (AWS, Azure, GCP), Machine Learning frameworks, and software development best practices. Work with technological innovations in AI & ML(esp. GenAI) Experience with design patterns and data structures. Good analytical, design and debugging skills. Technologies you will work with: Python, LLMs, MLFlow, Docker, Kubernetes, Terraform, AWS, GitLab, Postgres, Prometheus, Grafana

Posted 1 month ago

Apply

Fullstack Developer - AI/ML Provana

5.0 - 8.0 years

0 Lacs

Noida

Work from Office

Senior Full Stack Engineer We are seeking a Senior Full Stack Engineer to design, build and scale a portfolio of cloud-native products including real-time speech-assessment tools, GenAI content services, and analytics dashboards used by customers worldwide. You will own end-to-end delivery across React/Next.js front-ends, Node/Python micro-services, and a MongoDB-centric data layer, all orchestrated in containers on Kubernetes, while championing multi-tenant SaaS best practices and modern MLOps. Role: Product & Architecture • Design multi-tenant SaaS services with isolated data planes, usage metering, and scalable tenancy patterns. • Lead MERN-driven feature work: SSR/ISR dashboards in Next.js, REST/GraphQL APIs in Node.js or FastAPI, and event-driven pipelines for AI services. • Build and integrate AI/ML & GenAI modules (speech scoring, LLM-based content generation, predictive analytics) into customer-facing workflows. DevOps & Scale • Containerise services with Docker, automate deployment via Helm/Kubernetes, and implement blue-green or canary roll-outs in CI/CD. • Establish observability for latency, throughput, model inference time, and cost-per-tenant across micro-services and ML workloads. Leadership & Collaboration • Conduct architecture reviews, mentor engineers, and promote a culture that pairs AI-generated code with rigorous human code review. • Partner with Product and Data teams to align technical designs with measurable business KPIs for AI-driven products. Required Skills & Experience • Front-End React 18, Next.js 14, TypeScript, modern CSS/Tailwind • Back-End Node 20 (Express/Nest) and Python 3.11 (FastAPI) • Databases MongoDB Atlas, aggregation pipelines, TTL/compound indexes • AI / GenAI Practical ML model integration, REST/streaming inference, prompt engineering, model fine-tuning workflows • Containerisation & Cloud Docker, Kubernetes, Helm, Terraform; production experience on AWS/GCP/Azure • SaaS at Scale Multi-tenant data isolation, per-tenant metering & rate-limits, SLA design • CI/CD & Quality GitHub Actions/GitLab CI, unit + integration testing (Jest, Pytest), E2E testing (Playwright/Cypress) Preferred Candidate Profile • Production experience with speech analytics or audio ML pipelines. • Familiarity with LLMOps (vector DBs, retrieval-augmented generation). • Terraform-driven multi-cloud deployments or FinOps optimization. • OSS contributions in MERN, Kubernetes, or AI libraries. Tech Stack & Tooling - React 18 • Next.js 14 • Node 20 • FastAPI • MongoDB Atlas • Redis • Docker • Kubernetes • Helm • Terraform • GitHub Actions • Prometheus + Grafana • OpenTelemetry • Python/Rust micro-services for ML inference

Posted 1 month ago

Apply

Devops Engineer Dorian Mode Technologies Dorian Mo De Technologies Private Limited

1.0 - 3.0 years

3 - 7 Lacs

Thane

Work from Office

Role & responsibilities : Deploy, configure, and manage infrastructure across cloud platforms like AWS, Azure, and GCP. Automate provisioning and configuration using tools such as Terraform. Design and maintain CI/CD pipelines using Jenkins, GitLab CI, or CircleCI to streamline deployments. Build, manage, and deploy containerized applications using Docker and Kubernetes. Set up and manage monitoring systems like Prometheus and Grafana to ensure performance and reliability. Write scripts in Bash or Python to automate routine tasks and improve system efficiency. Collaborate with development and operations teams to support deployments and troubleshoot issues. Investigate and resolve technical incidents, performing root cause analysis and implementing fixes. Apply security best practices across infrastructure and deployment workflows. Maintain documentation for systems, configurations, and processes to support team collaboration. Continuously explore and adopt new tools and practices to improve DevOps workflows.

Posted 1 month ago

Apply

GCP Dev Ops Engr NTT DATA, Inc.

2.0 - 5.0 years

1 - 6 Lacs

Noida, Hyderabad

Work from Office

We are currently seeking a GCP Dev Ops Engr to join our team in Ban/Hyd/Chn/Gur/Noida, Karntaka (IN-KA), India (IN). Responsibilities Design, implement, and manage GCP infrastructure using Infrastructure as Code (IaC) tools. Develop and maintain CI/CD pipelines to improve development workflows. Monitor system performance and ensure high availability of cloud resources. Collaborate with development teams to streamline application deployments. Maintain security best practices and compliance across the cloud environment. Automate repetitive tasks to enhance operational efficiency. Troubleshoot and resolve infrastructure-related issues in a timely manner. Document procedures, policies, and configurations for the infrastructure. Skills Google Cloud Platform (GCP) Terraform Ansible CI/CD Kubernetes Docker Python Bash/Shell Scripting Monitoring tools (e.g., Prometheus, Grafana) Cloud Security Jenkins Git

Posted 1 month ago

Apply

Golang developer R Systems International

4.0 - 7.0 years

5 - 9 Lacs

Noida

Work from Office

Proficiency in Go programming language (Golang). Solid understanding of RESTful API design and microservices architecture. Experience with SQL and NoSQL databases (e.g., PostgreSQL, MongoDB, Redis). Familiarity with container technologies (Docker, Kubernetes). Understanding of distributed systems and event-driven architecture. Version control with Git. Familiarity with CI/CD pipelines and cloud platforms (AWS, GCP, Azure). Experience with message brokers (Kafka, RabbitMQ). Knowledge of GraphQL. Exposure to performance tuning and profiling. Contributions to open-source projects or personal GitHub portfolio. Familiarity with monitoring tools (Prometheus, Grafana, ELK). Roles and Responsibilities Design, develop, and maintain backend services and APIs using Go (Golang). Write efficient, scalable, and reusable code. Collaborate with front-end developers, DevOps engineers, and product teams to deliver high-quality features. Optimize applications for performance and scalability. Develop unit and integration tests to ensure software quality. Implement security and data protection best practices. Troubleshoot and debug production issues. Participate in code reviews, architecture discussions, and continuous improvement processes.

Posted 1 month ago

Apply

Site Reliability Engineer - CloudOps The Hiring Xpert

1.0 - 3.0 years

10 - 15 Lacs

Pune, Bengaluru

Work from Office

Must have a minimum 1 yr exp in SRE (CloudOps), Google Cloud platforms (GCP), monitoring, APM, and alerting tools like Prometheus, Grafana, ELK, Newrelic, Pingdom, or Pagerduty, Hands-on experience with Kubernetes for orchestration and container mgt Required Candidate profile Mandatory expreience working in B2C Product Companies. Must have Experience with CI/CD tools e.g. (Jenkins, GitLab CI/CD, CircleCI TravisCI..)

Posted 1 month ago

Apply

Lead SRE Ltimindtree

8.0 - 12.0 years

30 - 35 Lacs

Pune, Chennai

Work from Office

Mandatory Skills SRE, DevOps, Scripting (Python/Bash/Perl), Automation Tools (Ansible/Terraform/Puppet), AWS Cloud, Docker, Kubernetes, Observability Tools (Prometheus/Grafana/ELK Stack/Splunk), CICD pipelines using GitLab Jenkins or similar tools Please share your resume to thulasidharan.b@ltimindtree.com Note: Only 0-30 days notice

Posted 1 month ago

Apply

Cloud Application Architect Multi Recruit

12.0 - 15.0 years

30 - 35 Lacs

Bengaluru

Work from Office

We are seeking a highly experienced and technically profound Cloud Application Architect to drive our cloud-first digital transformation initiatives. This pivotal role involves leading the design, development, and modernization of our enterprise application portfolio to deliver modern, scalable, secure, and business-aligned cloud-native solutions. The ideal candidate will possess a deep, hands-on technical background in application architecture, with a focus on transforming legacy systems into agile, customer-centric, and cloud-optimized experiences within either the Microsoft or Java enterprise stack. This role is critical for shaping our application landscape, ensuring robust end-to-end design, and guiding development teams through complex architectural challenges in a dynamic, cloud-first environment. Key Responsibilities As a Senior Cloud Application Architect, you will: Define Cloud-Native Application Architectures: Lead the definition, design, and implementation of comprehensive cloud-native application architectures and strategic modernization roadmaps for critical enterprise systems, primarily leveraging AWS EKS, Azure AKS, and serverless functions (e.g., AWS Lambda, Azure Functions). Own End-to-End Application Design: Hold ultimate accountability for the end-to-end application design, ensuring solutions meet stringent requirements for scalability (handling high transaction volumes), performance (low latency), robust security (integrating DevSecOps principles like SAST/DAST, Zero Trust), and high reliability (achieving stringent uptime targets). Guide Microservices/API Architecture & Containerization: Provide senior technical guidance and mentorship to multiple distributed project teams on advanced microservices and API-first design patterns, including choreography vs. orchestration, eventual consistency, and idempotent API design. Lead the adoption and implementation of Docker containerization and Kubernetes orchestration (AKS/EKS) for efficient application deployment and management. Develop Deployment & Operational Strategy: Define and enforce declarative deployment strategies (e.g., GitOps with ArgoCD/FluxCD). Design application-level disaster recovery and business continuity plans, including multi-region deployments with active-active/active-passive patterns and automated failover mechanisms. Collaborate Cross-Functionally: Collaborate extensively as a strategic partner with cross-functional teams including software developers (Java/.NET), product owners, business analysts, DevOps engineers, security specialists, and infrastructure teams. Translate complex business requirements into clear, actionable technical specifications. Lead Technical Design Sessions & Governance: Lead high-stakes technical design sessions, facilitate architecture review boards (ARB), and prepare comprehensive architectural documentation (e.g., Architecture Decision Records (ADRs), sequence diagrams, data flow diagrams) to ensure alignment, maintain architectural integrity, and govern new feature implementations. Support Build vs. Buy & Tool Selection: Actively support critical build vs. buy analyses for new functionalities. Evaluate, select, and champion various cloud services (PaaS, SaaS) and third-party tools (e.g., API Management gateways, caching solutions, message brokers) based on technical fit, business needs, and cost efficiency. Conduct and present Proof-of-Concepts (PoCs) for emerging technologies and strategic platform integrations. Drive DevSecOps & Observability Integration: Champion the integration of advanced DevSecOps practices, from "shift-left" security to automated CI/CD pipelines. Implement comprehensive application observability solutions (e.g., Prometheus, Grafana, Application Insights) to monitor SLOs/SLIs, diagnose performance issues, and proactively ensure system health. Optimize Application-Level Costs: Design and optimize application architectures to maximize cloud cost efficiency, leveraging serverless computing, right-sizing container workloads, and implementing intelligent autoscaling policies. Mentor & Foster Innovation: Mentor junior and mid-level developers and architects on cloud-native development best practices, application refactoring techniques, and effective utilization of cloud services. Explore and prototype the integration of emerging technologies (e.g., AI/ML, Generative AI) for intelligent features and digital workflow automation. Qualifications: Education: Bachelors or Masters degree in Computer Science, Engineering, Information Technology, or a related field. Experience: 1 12+ years of progressive experience in application architecture, with a significant and demonstrable focus on cloud-native application design, digital-first transformations, and modernizing enterprise software. Application Development Background: Strong application background with hands-on experience in either the Microsoft (.NET Core, ASP.NET) or Java (Spring Boot, J2EE) enterprise/product software architecture. Cloud Platform Expertise: Proven experience delivering cloud-first solutions using public cloud platforms (AWS, Azure are preferred; GCP experience is a plus), with a deep understanding of their PaaS and IaaS offerings relevant to application development. Modern Application Design Principles: Deep knowledge and hands-on experience with microservices, API-driven development, event-driven architecture, serverless computing, and domain-driven design. Containerization & Orchestration: Expertise in Docker and Kubernetes (EKS, AKS), including deployment strategies and operational best practices for containerized applications. Agile, DevOps, & CI/CD: Strong understanding and practical experience with agile delivery models, comprehensive DevOps practices, and continuous integration/deployment (CI/CD) pipelines. Communication & Stakeholder Management: Excellent communication, presentation, and stakeholder management skills, with a proven ability to bridge technical and business perspectives, and advise senior leadership. Leadership & Governance: Extensive experience in leading cross-functional development and architecture teams, managing architectural governance, and mentoring engineers in large-scale programs. Preferred Skills: Cloud Certifications: Relevant cloud certifications (e.g., AWS Certified Solutions Architect – Professional, Azure Solutions Architect Expert, Certified Kubernetes Application Developer - CKAD). Enterprise Architecture Frameworks: Knowledge of enterprise architecture frameworks (e.g., TOGAF) in the context of digital transformation. Observability Tools: Experience with comprehensive observability solutions for applications (e.g., Prometheus, Grafana, Datadog, Application Insights, distributed tracing tools like Jaeger). Security by Design: Direct experience implementing security best practices at the application architecture level (e.g., OWASP, threat modeling, secure coding standards). AI/ML Integration: Experience with integrating analytics, personalization, and AI/ML capabilities into application architectures. Low-Code/No-Code Platforms: Exposure to low-code/no-code development tools and digital workflow automation platforms.

Posted 1 month ago

Apply

Login to

Please Verify Your Phone or Email

Confirm Action

Search

Profile

Upskill and Grow with AI

900 Prometheus Jobs - Page 25

Job Alert

Start Your Job Search Today

Please Verify Your Phone or Email

Job Application AI Bot

Download the Mobile App

Setup Job Alerts

Featured Companies

Before You Leave... Find Your Perfect Job!

Login to

Please Verify Your Phone or Email

Confirm Action

Contact Us

Search

Profile

Upskill and Grow with AI

Personal Settings

900 Prometheus Jobs - Page 25

Job Alert

Upload Resume

AI Job Matching Summary

Pros

Cons

Summary

Start Your Job Search Today

Please Verify Your Phone or Email

Job Application AI Bot

Download the Mobile App

Setup Job Alerts

Featured Companies