2.0 - 5.0 years
4 - 9 Lacs
Chennai
Work from Office
Job Title: Dashboard Developer
Location: Chennai, India
Requirements: Ability to work between 2pm and 11pm IST supporting a client base in the U.S.A. Looking for immediate joiners.
About the Role: This role supplements the IT organization by developing dashboards from the various sources that generate system-related data, primarily monitoring systems, and requires expertise in basic monitoring operations. The role also involves scripting the small automations needed in this domain.
Responsibilities: Develop dashboards using tools such as Power BI, Grafana, and Perses. Understand monitoring systems such as LogicMonitor, Prometheus, or other open-source monitoring systems. Correlate data streams and build dashboards that drive business efficiency.
Candidate Requirements: Good working knowledge of Power BI and Grafana (other open-source dashboard solutions are an added advantage). Good knowledge of Prometheus, an open-source monitoring tool. Good working knowledge of managing Docker containers and Linux. DevOps experience is an advantage but not mandatory. Strong understanding of infrastructure. Good communication skills. Ability and experience working with stakeholders across geographies. Willingness to learn and scale up as needed.
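The "minimal automations" this role describes often start with pulling data out of Prometheus's HTTP API and reshaping it for a dashboard tool. A minimal sketch, assuming the standard /api/v1/query response shape; the sample payload and instance names below are illustrative, not from any real system:

```python
def flatten_prometheus_result(payload):
    """Turn a Prometheus 'vector' query result into flat rows for a dashboard."""
    rows = []
    for item in payload["data"]["result"]:
        labels = item["metric"]
        ts, value = item["value"]  # [unix_timestamp, "value-as-string"]
        rows.append({
            "metric": labels.get("__name__", ""),
            "instance": labels.get("instance", ""),
            "value": float(value),
        })
    return rows

# Sample shaped like a real /api/v1/query response (values are invented):
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "web-1:9100"}, "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "instance": "web-2:9100"}, "value": [1700000000, "0"]},
        ],
    },
}

rows = flatten_prometheus_result(sample)
```

Flat rows like these can be loaded directly into Power BI or rendered as a table panel.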
Posted 4 days ago
4.0 - 9.0 years
6 - 11 Lacs
Bengaluru
Work from Office
About us: As a Fortune 50 company with more than 400,000 team members worldwide, Target is an iconic brand and one of America's leading retailers. Joining Target means promoting a culture of mutual care and respect and striving to make the most meaningful and positive impact. Becoming a Target team member means joining a community that values different voices and lifts each other up. Here, we believe your unique perspective is important, and you'll build relationships by being authentic and respectful.
Overview about TII: At Target, we have a timeless purpose and a proven strategy, and that hasn't happened by accident. Some of the best minds from different backgrounds come together at Target to redefine retail in an inclusive learning environment that values people and delivers world-class outcomes. That winning formula is especially apparent in Bengaluru, where Target in India operates as a fully integrated part of Target's global team and has more than 4,000 team members supporting the company's global strategy and operations.
Network Security Monitoring (NSM) Position
About Network Security Monitoring: Target's Network Security Monitoring (NSM) team builds and maintains a fleet of over 2,000 network sensors across the globe, providing network visibility and advanced monitoring capabilities to our Cyber Defense organization. We build scalable and maintainable infrastructure with full end-to-end ownership of both the hardware and software lifecycle. Our work enables timely detection of and response to adversaries by delivering reliable network visibility through a resilient sensor grid and advanced monitoring capability.
Team Overview
NSM team members regularly:
- Collaborate with Networking partners on network design and network sensor placement
- Build, deploy, and upgrade network sensors (servers) globally
- Design and implement network traffic analysis solutions using engines like Zeek and Suricata
- Leverage Salt for configuration management, deployment automation, and infrastructure-as-code implementation
- Partner with Cyber Defense to build network-based detections and consult in response scenarios
- Develop performance monitoring solutions to track data quality and sensor health, ensuring grid health and data fidelity
Position Overview
Expect to:
- Configure, troubleshoot, and optimize network sensors across diverse environments
- Debug complex networking issues and perform packet-level analysis to ensure proper traffic visibility
- Build and maintain Salt-based automation for configuration management and deployment
- Analyze monitoring data to identify system improvements and validate detection coverage
- Develop and automate testing to ensure results and outcomes are as expected
- Participate in on-call rotations to support the global sensor grid and respond to critical issues
- Collaborate cross-functionally with teams throughout Cyber Defense and IT
- Document operational procedures for sensor management best practices
- Research new network security monitoring technologies and evaluate their potential implementation
- Contribute to capacity planning and architectural design of monitoring infrastructure
- Manage and maintain Linux/Unix-based systems that host Zeek sensors, ensuring high availability, performance, and security
- Perform OS-level troubleshooting, patching, and hardening of sensor infrastructure
- Automate server provisioning and configuration using tools like Salt, shell scripting, and Python
- Monitor system logs and metrics to proactively identify and resolve issues affecting sensor performance
About you:
- Bachelor's degree in Networking, Computer Science, or a related field (or equivalent experience)
- 4+ years of experience in network administration, network security, or related roles, with deep knowledge of network protocols and packet analysis
- Experience with network security monitoring tools, including Zeek and Suricata
- Strong foundation in automation and infrastructure as code; Salt experience preferred
- You understand CI/CD principles and can implement pipelines for testing and deploying code and configuration changes
- Proficient in Linux/Unix systems administration, including shell scripting, system tuning, and troubleshooting
- Hands-on experience managing server infrastructure in production environments, including patching, upgrades, and performance tuning
- Practical experience with packet capture technologies and traffic analysis tools
- Proven ability to troubleshoot complex distributed systems and methodically diagnose network issues
- You appreciate the importance of dev/prod parity and can design for consistent environments across dev and prod
- Experience writing custom detection rules and understanding their performance implications
- Familiarity with technologies such as Zabbix, Prometheus, Nagios, Grafana, Elastic, and Kibana
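As a concrete illustration of the data-quality checks a role like this involves, here is a minimal sketch that summarizes Zeek conn.log records to find the busiest source hosts. It assumes Zeek's JSON log output mode and its standard conn.log field names; the sample records are invented:

```python
import json
from collections import Counter

def top_talkers(log_lines, n=3):
    """Return the n source hosts moving the most bytes (orig + resp)."""
    totals = Counter()
    for line in log_lines:
        rec = json.loads(line)
        host = rec.get("id.orig_h")
        moved = (rec.get("orig_bytes") or 0) + (rec.get("resp_bytes") or 0)
        if host:
            totals[host] += moved
    return totals.most_common(n)

# Invented conn.log records in Zeek's JSON output shape:
sample = [
    '{"id.orig_h": "10.0.0.5", "orig_bytes": 1200, "resp_bytes": 4800}',
    '{"id.orig_h": "10.0.0.9", "orig_bytes": 300,  "resp_bytes": 150}',
    '{"id.orig_h": "10.0.0.5", "orig_bytes": 2000, "resp_bytes": 0}',
]

print(top_talkers(sample))  # -> [('10.0.0.5', 8000), ('10.0.0.9', 450)]
```

A sudden drop in top-talker volume on a sensor is often the first sign of a visibility gap rather than a quiet network.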
Posted 4 days ago
4.0 - 8.0 years
8 - 18 Lacs
Bengaluru
Hybrid
Job Title: DevOps L2 Support Engineer
Experience: 4-6 Years
Work Type: Full-time | Rotational Shift (24x7)
Shift Window: Between 09:30 AM and 10:30 PM
Job Description: We are looking for a DevOps L2 Support Engineer with strong debugging capabilities and a solid foundation in Linux and scripting. The ideal candidate should be able to work in a fast-paced production support environment and proactively resolve issues related to infrastructure, deployment, and monitoring systems.
Key Responsibilities: Perform root cause analysis and debug Java-based applications in a production environment. Monitor and support systems hosted in Linux environments. Write and manage shell scripts for automation and operational tasks. Manage and troubleshoot Docker containers and commands. Analyze logs and build insightful dashboards using Splunk. Write effective Splunk queries to identify and resolve issues based on error signatures. Collaborate with development and infrastructure teams to ensure seamless deployments and operations. Participate in a 24x7 rotational shift, including two daily shifts between 9:30 AM and 10:30 PM.
Required Skills: Strong debugging skills in Java-based environments. Hands-on experience with Linux OS and command-line tools. Experience with shell scripting. Working knowledge of Docker and related commands. Proficiency in Splunk query writing and dashboard creation. Excellent communication skills (both written and verbal).
Good to Have: Exposure to Prometheus for system monitoring. Understanding of CI/CD pipelines and basic DevOps concepts.
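The "error signature" work described above usually means normalizing away volatile tokens (ids, timings) so recurring errors group together, whether in a Splunk query or a quick script. A hedged sketch in Python; the masking rules and sample log lines are illustrative, not any particular production scheme:

```python
import re
from collections import Counter

def signature(line):
    """Collapse volatile tokens so identical errors share one signature."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)  # mask hex ids first
    sig = re.sub(r"\d+", "<N>", sig)                # then remaining numbers
    return sig.strip()

def top_signatures(lines, n=5):
    """Rank the most frequent error signatures in a batch of log lines."""
    return Counter(signature(l) for l in lines).most_common(n)

logs = [
    "ERROR order 1234 failed after 30ms",
    "ERROR order 9876 failed after 12ms",
    "WARN cache miss key=0xdeadbeef",
]
top = top_signatures(logs)
```

Here the two order failures collapse into one signature, surfacing a single recurring issue instead of two distinct lines.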
Posted 4 days ago
1.0 - 4.0 years
1 - 5 Lacs
Mumbai
Work from Office
DevOps Engineer (2–8 yrs) – Mumbai. Experience in CI/CD (Jenkins/GitLab), Docker, Kubernetes, Terraform/Ansible, AWS/Azure, Git, and monitoring tools (Prometheus/Grafana). Strong scripting & DevOps practices.
Posted 4 days ago
7.0 - 9.0 years
19 - 25 Lacs
Pune
Work from Office
Location: Pune
Experience: 7-9 Years
Notice Period: Immediate to 15 Days
Overview: We are looking for an experienced IT Operations (Monitoring & Observability) Consultant to design, implement, and optimize end-to-end observability solutions. The ideal candidate will have a strong background in monitoring frameworks, ITSM integrations, and AIOps tools to drive system reliability, performance, and proactive incident management.
Key Responsibilities: Design and deploy comprehensive monitoring and observability architectures for infrastructure, applications, and networks. Implement tools like Prometheus, Grafana, OpsRamp, Dynatrace, and New Relic for system performance monitoring. Integrate monitoring systems with ITSM platforms (e.g., ServiceNow, BMC Remedy). Develop dashboards, alerts, and reports to enable real-time performance insights. Architect solutions for hybrid and multi-cloud environments. Automate alerting, remediation, and reporting to streamline operations. Apply AIOps and ML for anomaly detection and predictive insights. Collaborate with DevOps, infrastructure, and application teams to embed monitoring into CI/CD. Document architectures, procedures, and operational playbooks.
Required Skills: Hands-on experience with observability tools: Prometheus, Grafana, ELK Stack, Fluentd, Dynatrace, New Relic, OpsRamp. Strong scripting knowledge in Python and Ansible. Familiarity with tracing tools (e.g., Jaeger, Zipkin) and REST API integrations. Working knowledge of AIOps concepts and predictive monitoring. Solid understanding of ITIL processes and service management frameworks. Familiarity with security monitoring and compliance considerations. Excellent analytical, troubleshooting, and documentation skills.
Posted 4 days ago
6.0 - 10.0 years
10 - 20 Lacs
Bengaluru
Work from Office
Who We Are At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities. The Role Are you ready to join the team of software engineering experts at Kyndryl? We are seeking a talented Software Engineering Technical Specialist to contribute to our software engineering space and provide critical skills required for the development of cutting-edge products. As a Software Engineering Technical Specialist, you will develop solutions in specific domains such as Security, Systems, Databases, Networking Solutions, and more. You will be a leader – contributing knowledge, guidance, technical expertise, and team leadership skills. Your leadership will be demonstrated in your work, to your customers, and within your teams. At Kyndryl, we value effective communication and collaboration skills. When you recognise opportunities for business change, you will have the ability to clearly and persuasively communicate complex technical and business concepts to both customers and team members. You'll be the go-to person for problem-solving of customers' business and technical issues. You have a knack for effectively identifying and framing problems, leading the collection of elements of information, and integrating this information to produce timely and thoughtful decisions. Your aim throughout is to improve the effectiveness, efficiency and delivery of services through the use of technology and technical methods and methodologies. You will drive the design, development, integration, delivery, and evolution of highly scalable distributed software, integrating with other layers and offerings to provide deeper functionality and solutions that address customer needs.
You will work closely with software engineers, architects, product managers, and partner teams to get high-quality products and features through the agile software development lifecycle. Your continuous grooming of features/user stories to estimate, identify technical risks/dependencies and clearly communicate them to project stakeholders will ensure the features are delivered with the right quality and within timeline. You will maintain and drive the clearing of technical debt, vulnerabilities, and currency of the 3rd party components within the product. As a Software Engineering Technical Specialist, you will also coach and mentor engineers to design and implement highly available, secure, distributed software in a scalable architecture. This is an opportunity to make a real impact and contribute to the success of Kyndryl's innovative software products. Join us and become a key player in our team of software engineering experts! Your Future at Kyndryl Every position at Kyndryl offers a way forward to grow your career. We have opportunities that you won’t find anywhere else, including hands-on experience, learning opportunities, and the chance to certify in all four major platforms. Whether you want to broaden your knowledge base or narrow your scope and specialize in a specific sector, you can find your opportunity here. Who You Are You’re good at what you do and possess the required experience to prove it. However, equally as important – you have a growth mindset; keen to drive your own personal and professional development. You are customer-focused – someone who prioritizes customer success in their work. And finally, you’re open and borderless – naturally inclusive in how you work with others. Required Technical and Professional Expertise 5–8 years of experience in infrastructure monitoring, logging, or performance engineering. 
Strong experience designing and implementing observability solutions using: Azure Monitor / Application Insights / Log Analytics; Prometheus / Grafana / Loki / ELK / Splunk; OpenTelemetry, Fluentd/Fluent Bit, Jaeger. Familiarity with microservices, Kubernetes (AKS), and cloud-native patterns. Experience working with CI/CD tools (GitHub Actions, Azure DevOps) and automation (Terraform, ARM, Bicep). Knowledge of ITSM and incident management workflows. Preferred Technical and Professional Experience Strong communication and documentation skills. Ability to balance multiple priorities across delivery teams. Collaborative mindset with a focus on solution quality and operational reliability. Being You Diversity is a whole lot more than what we look like or where we come from, it's how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we're not doing it single-handedly: Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you – and everyone next to you – the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That's the Kyndryl Way. What You Can Expect With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter – wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more.
Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you; we want you to succeed so that together, we will all succeed. Get Referred! If you know someone who works at Kyndryl, when asked 'How Did You Hear About Us' during the application process, select 'Employee Referral' and enter your contact's Kyndryl email address.
Posted 4 days ago
10.0 - 12.0 years
16 - 20 Lacs
Bengaluru
Work from Office
About the Job The Data & AI team is a highly focused effort to lead digital-first execution and transformation at Red Hat, leveraging data strategically for our customers, partners, and associates.
Radical Collaboration: No work is done in isolation; each team strives to collaborate with teams within the group, across groups, and with the communities. You will strive to make these collaborations as seamless as possible using tools, processes, best practices, and your own brand of creative problem-solving.
Continuous Learning: This is a fast-paced team, and you are expected to be continuously curious, have a can-do attitude, and be proficient in understanding multiple aspects of the business, continuously improving your skill sets (technical and business) as the industry progresses.
The Data and AI team is looking for an Engineering Manager to lead the Platform practice for the next generation of SaaS-based data and AI products. You will interact with product managers, Red Hat Sales, Marketing, and Finance teams, and data platform and product engineers to deliver a sophisticated data-as-a-service platform. You'll coach and develop software engineers as they build the platform, infrastructure-as-code components, platform observability, agentic AI capabilities, and other software to autonomously manage the environment, and guide problem management resolution (PMR) analysis when things go wrong. You'll work in a fast-paced, globally distributed team while quickly learning new skills and creating ways to consistently meet service-level agreements (SLAs) for our data products. This role requires a leader with a proven record of navigating the complexities of working across multiple organizations, helping define and gain consensus on strategy and direction, and aligning the team(s) toward those end goals.
What you will do Support the engineering team to foster and deliver in an inner-source manner. Develop and retain a team of engineers building and operating Red Hat's data-as-a-service platform. Coach engineers on good engineering principles: writing good code, automation, observability, toil reduction, and root cause analysis. Manage high-visibility project delivery, including estimation, schedule, risks, and dependencies. Design processes and communication norms that facilitate coordination across a fast-moving, fast-growing, diverse global team. Lead your team through frequent changes in organization, process, and technology commensurate with a high-growth cloud service in a competitive market. Participate in a periodic 24x7 management escalation on-call rotation. What you will bring 10-12 years of hands-on experience developing and maintaining software. 5+ years of experience managing high-performing engineering teams. Previous software engineering experience delivering data products, applications, or services on cloud-native or hybrid platforms. Experience with Agile methodologies and working in a DevOps culture with continuous integration and continuous delivery. Ability to lead distributed, remote teams working across multiple time zones. Ability to discuss complex technical issues with engineers, product managers, and less-technical stakeholders, including customers and senior leaders. Ability to understand and collaborate with compliance teams to ensure the platform and products are compliant with regulations. Experience hiring and developing engineers. Experience communicating with stakeholders and leadership. The following are considered a plus: Experience with platforms like Kubernetes/OpenShift and Kubernetes Operators, Prometheus, Grafana, etc. Experience with Go and Python for developing scalable backend software. Experience building full-stack applications. Knowledge of SaaS technologies like Snowflake, Fivetran, Astro, etc.
About Red Hat Red Hat is the world's leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40+ countries, our associates work flexibly across work environments, from in-office, to office-flex, to fully remote, depending on the requirements of their role. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact. Inclusion at Red Hat Red Hat's culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from different backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation. Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions that compose our global village. Equal Opportunity Policy (EEO) Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law. Red Hat supports individuals with disabilities and provides reasonable accommodations to job applicants. If you need assistance completing our online job application, email application-assistance@redhat.com.
General inquiries, such as those regarding the status of a job application, will not receive a reply.
Posted 4 days ago
3.0 - 5.0 years
10 - 12 Lacs
Chennai, Delhi / NCR, Bengaluru
Work from Office
Responsibilities: Work with development teams to ideate software solutions. Build and set up new development tools and infrastructure. Work on ways to automate and improve development and release processes. Ensure that systems are safe and secure against cybersecurity threats. Deploy updates and fixes. Perform root cause analysis for production errors. Develop scripts to automate infrastructure provisioning. Work with software developers and software engineers to ensure that development follows established processes and works as intended.
Requirements: At least 2+ years of professional experience as a software developer / DevOps engineer or equivalent. Professional experience with Golang. Experience with test-driven development and the use of testing frameworks. Strong communication skills.
Technologies we use:
GitOps: GitHub, GitLab, BitBucket
Language: Golang
CI/CD: Jenkins, Circle CI, Travis CI, TeamCity, Azure DevOps
Containerization: Docker, Swarm, Kubernetes
Provisioning: Terraform
CloudOps: Azure, AWS, GCP
Observability: Prometheus, Grafana, GrayLog, ELK
Location: Delhi NCR, Bangalore, Chennai, Pune, Kolkata, Ahmedabad, Mumbai, Hyderabad
Posted 4 days ago
3.0 - 8.0 years
30 - 35 Lacs
Mumbai, Delhi / NCR, Bengaluru
Work from Office
Skills required: Azure Data Factory, Kubernetes, Azure DevOps
Must-Have: Working experience with Azure DevOps (4+ years). Working experience with Kubernetes (scripting, deployment). Data Factory. Terraform scripting. Ansible. PowerShell. Python. CloudFormation. Good knowledge of the ITIL process (good to have).
Must have: Strong knowledge of Kubernetes and Istio service mesh. Linux CLI and basic knowledge of the OS. Scripting (Bash and YAML). Containerization and Docker essentials. Jenkins pipeline creation and execution. SCM management such as GitHub and SVN. Cloud platform knowledge (Azure). Monitoring tools like Grafana, Prometheus, and the ELK stack.
Certifications (Good to have): 1. Solutions Architect Associate 2. Certified Kubernetes Administrator (CKA)
Location: Remote, anywhere in Delhi / NCR, Bangalore/Bengaluru, Hyderabad/Secunderabad, Chennai, Pune, Kolkata, Ahmedabad, Mumbai
Posted 4 days ago
1.0 - 3.0 years
8 - 13 Lacs
Pune
Work from Office
Overview We are seeking a DevOps Engineer to join the Critical Start Technologies Private Ltd. team, operating under the Critical Start umbrella, for our India operations. The ideal candidate brings 1–3 years of experience, a strong background in AWS and Terraform, and a passion for infrastructure as code. Candidates should be skilled at writing well-structured Terraform modules, proficient in AWS service provisioning, and familiar with best practices for managing IaaS and PaaS environments. Additional experience with Linux administration, GitHub Actions, container orchestration, and monitoring solutions such as CloudWatch or Prometheus is a plus. Your experience includes writing production code and proficiency in understanding and structuring large projects using Terraform modules. You possess a deep understanding of provisioners and are well-versed in remote state management. We value individuals who are proactive, detail-oriented, and passionate about infrastructure as code. Critical Start is committed to building an inclusive, equitable, and respectful workplace, and we welcome candidates from all backgrounds to apply. Responsibilities As a DevOps Engineer, you will play a key role in maintaining, evolving, and enhancing our existing Terraform-based infrastructure. You'll work across a diverse infrastructure stack to support the delivery of new projects and services to our customers. A core part of your responsibilities will be using Terraform to build modular, maintainable, and scalable infrastructure solutions. You will also take initiative in identifying opportunities to improve performance—focusing on responsiveness, availability, and scalability. Establishing effective monitoring and alerting systems will be essential, as will troubleshooting issues within distributed systems, including throughput, resource utilization, and configuration. 
Our infrastructure stack includes the following components:
Terraform: used for comprehensive infrastructure management.
AWS Fargate: primary platform for hosting most of our applications and services, along with select EC2 instances for specific use cases.
Monitoring and alerts: AWS CloudWatch, SNS, New Relic, and Sentry.io support effective monitoring and timely alerting.
Storage and databases: S3, Postgres (RDS), Memcached, RabbitMQ, and AWS Elasticsearch Service handle our storage and data processing needs.
Networking and security: VPC, Route 53, IAM, ALB/NLB, Security Groups, and Secrets Manager support a secure and resilient networking environment.
CI/CD pipeline: built using EC2 Image Builder, CodeBuild, and GitHub to streamline software delivery and deployment.
Qualifications
Required Qualifications: 1-3 years of professional experience in a DevOps, Site Reliability Engineering, or Systems Engineering role. Ability to work through ambiguity and uncertainty. A solid understanding of CI/CD pipelines, including their purpose and implementation, and hands-on experience setting them up in real-world environments. Experience working with Terraform for provisioning using modular approaches. Strong troubleshooting and problem-solving skills and a collaborative mindset. A Bachelor's degree from a recognized institution, or equivalent practical experience that demonstrates your technical capabilities.
Preferred Qualifications: Shell scripting experience is a strong plus. Strong knowledge of Linux/Unix systems. Familiarity with source control tools such as Git. Experience with observability tools such as CloudWatch, New Relic, or Sentry.io. Proficiency with Docker and practical experience running containers in AWS environments such as EC2 and Fargate.
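As a small model of the alerting side of a CloudWatch + SNS stack like the one above: CloudWatch-style alarms typically fire only when M of the last N datapoints breach the threshold, which matters when tuning noisy alerts. A sketch of that evaluation rule; the metric values and parameters below are illustrative:

```python
def alarm_state(datapoints, threshold, m, n):
    """Return 'ALARM' if at least m of the last n datapoints exceed threshold."""
    window = datapoints[-n:]                       # evaluation window
    breaches = sum(1 for v in window if v > threshold)
    return "ALARM" if breaches >= m else "OK"

cpu = [41, 55, 93, 97, 95]  # e.g. CPU utilization samples, oldest first
state = alarm_state(cpu, threshold=90, m=3, n=4)   # 3 of last 4 breach
```

Raising M (or N) trades alert latency for resistance to transient spikes, which is often the key tuning knob for on-call noise.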
Posted 5 days ago
5.0 - 10.0 years
13 - 15 Lacs
Pune, Chennai, Bengaluru
Work from Office
Grafana specialist to lead the creation of robust dashboards for comprehensive end-to-end monitoring. Strong background in production support monitoring, with a keen understanding of the metrics that matter to both technology teams and management. Required Candidate Profile: 5+ years of experience building Grafana dashboards for monitoring; using Prometheus and exporters for real-time data; integrating multi-source data and alerts; creating Unix/Python scripts for log automation; managing Jira/ServiceNow dashboards.
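Dashboards like these are typically fed by exporters serving Prometheus's text exposition format on /metrics. A minimal sketch of rendering one gauge in that format; the metric and label names are invented for illustration:

```python
def render_metric(name, help_text, samples):
    """Render a gauge in Prometheus text exposition format.

    samples: list of (labels-dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines)

text = render_metric(
    "batch_job_duration_seconds",
    "Duration of the nightly batch job.",
    [({"job": "settlement", "env": "prod"}, 412.5)],
)
```

Serving strings like this over HTTP is all an exporter needs for Prometheus to scrape the value and Grafana to chart it.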
Posted 5 days ago
2.0 - 7.0 years
13 - 17 Lacs
Chennai
Work from Office
Job Area: Engineering Group, Engineering Group > Software Engineering General Summary: As a leading technology innovator, Qualcomm pushes the boundaries of what's possible to enable next-generation experiences and drives digital transformation to help create a smarter, connected future for all. As a Qualcomm Software Engineer, you will design, develop, create, modify, and validate embedded and cloud edge software, applications, and/or specialized utility programs that launch cutting-edge, world class products that meet and exceed customer needs. Qualcomm Software Engineers collaborate with systems, hardware, architecture, test engineers, and other teams to design system-level software solutions and obtain information on performance requirements and interfaces. Minimum Qualifications: Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 2+ years of Software Engineering or related work experience. OR Master's degree in Engineering, Information Systems, Computer Science, or related field and 1+ year of Software Engineering or related work experience. OR PhD in Engineering, Information Systems, Computer Science, or related field. 2+ years of academic or work experience with a programming language such as C, C++, Java, Python, etc.
Job Title: MLOps Engineer - ML Platform
Hiring Title: Flexible based on candidate experience; Staff Engineer preferred.
We are seeking a highly skilled and experienced MLOps Engineer to join our team and contribute to the development and maintenance of our ML platform both on premises and on AWS Cloud. As an MLOps Engineer, you will be responsible for architecting, deploying, and optimizing the ML & Data platform that supports training of machine learning models using NVIDIA DGX clusters and the Kubernetes platform, including technologies like Helm, ArgoCD, Argo Workflow, Prometheus, and Grafana.
Your expertise in AWS services such as EKS, EC2, VPC, IAM, S3, and EFS will be crucial in ensuring the smooth operation and scalability of our ML infrastructure. You will work closely with cross-functional teams, including data scientists, software engineers, and infrastructure specialists. Your expertise in MLOps and DevOps, and your knowledge of GPU clusters, will be vital in enabling efficient training and deployment of ML models. Responsibilities will include: Architect, develop, and maintain the ML platform to support training and inference of ML models. Design and implement scalable and reliable infrastructure solutions for NVIDIA clusters both on premises and on AWS Cloud. Collaborate with data scientists and software engineers to define requirements and ensure seamless integration of ML and data workflows into the platform. Optimize the platform's performance and scalability, considering factors such as GPU resource utilization, data ingestion, model training, and deployment. Monitor and troubleshoot system performance, identifying and resolving issues to ensure the availability and reliability of the ML platform. Implement and maintain CI/CD pipelines for automated model training, evaluation, and deployment using technologies like ArgoCD and Argo Workflow. Implement and maintain a monitoring stack using Prometheus and Grafana to ensure the health and performance of the platform. Manage AWS services including EKS, EC2, VPC, IAM, S3, and EFS to support the platform. Implement logging and monitoring solutions using AWS CloudWatch and other relevant tools. Stay updated with the latest advancements in MLOps, distributed computing, and GPU acceleration technologies, and proactively propose improvements to enhance the ML platform. What are we looking for: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Proven experience as an MLOps Engineer or similar role, with a focus on large-scale ML and/or Data infrastructure and GPU clusters. Strong expertise in configuring and optimizing NVIDIA DGX clusters for deep learning workloads. Proficient in using the Kubernetes platform, including technologies like Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana. Solid programming skills in languages such as Python and Go, and experience with relevant ML frameworks (e.g., TensorFlow, PyTorch). In-depth understanding of distributed computing, parallel computing, and GPU acceleration techniques. Familiarity with containerization technologies such as Docker and orchestration tools. Experience with CI/CD pipelines and automation tools for ML workflows (e.g., Jenkins, GitHub, ArgoCD). Experience with AWS services such as EKS, EC2, VPC, IAM, S3, and EFS. Experience with AWS logging and monitoring tools. Strong problem-solving skills and the ability to troubleshoot complex technical issues. Excellent communication and collaboration skills to work effectively within a cross-functional team. We would love to see: Experience with training and deploying models. Knowledge of ML model optimization techniques and memory management on GPUs. Familiarity with ML-specific data storage and retrieval systems. Understanding of security and compliance requirements in ML infrastructure.
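The monitoring responsibilities above revolve around Prometheus-style metrics for GPU clusters. As an illustrative sketch (the metric and label names below are hypothetical, not taken from the posting), here is a minimal parser that aggregates a per-GPU utilization gauge, in Prometheus text exposition format, into a per-node average:

```python
import re
from collections import defaultdict

# Hypothetical scrape output in Prometheus text exposition format;
# metric and label names are illustrative only.
SCRAPE = """\
# HELP gpu_utilization Percent GPU busy (illustrative metric name)
# TYPE gpu_utilization gauge
gpu_utilization{node="dgx-01",gpu="0"} 87
gpu_utilization{node="dgx-01",gpu="1"} 91
gpu_utilization{node="dgx-02",gpu="0"} 15
"""

LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+([0-9.]+)$')

def mean_utilization_by_node(text):
    """Average a gauge metric per 'node' label, skipping comment lines."""
    sums, counts = defaultdict(float), defaultdict(int)
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip HELP/TYPE comments and blank lines
        _, labels, value = m.groups()
        pairs = dict(kv.split("=") for kv in labels.replace('"', "").split(","))
        sums[pairs["node"]] += float(value)
        counts[pairs["node"]] += 1
    return {node: sums[node] / counts[node] for node in sums}

print(mean_utilization_by_node(SCRAPE))  # {'dgx-01': 89.0, 'dgx-02': 15.0}
```

In practice a Grafana dashboard would run the equivalent aggregation as a PromQL query (e.g. an `avg by (node)` over the gauge) rather than parsing scrape text by hand; the sketch only shows the data shape involved.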
Posted 5 days ago
10.0 - 15.0 years
14 - 19 Lacs
Hyderabad
Work from Office
Job Area: Engineering Group, Engineering Group > Software Engineering General Summary: Qualcomm is seeking a seasoned Staff Engineer, DevOps to join our central software engineering team. In this role, you will lead the design, development, and deployment of scalable cloud-native and hybrid infrastructure solutions, modernize legacy systems, and drive DevOps best practices across products. This is a hands-on architectural role ideal for someone who thrives in a fast-paced, innovation-driven environment and is passionate about building resilient, secure, and efficient platforms. Key Responsibilities: Architect and implement enterprise-grade AWS cloud solutions for Qualcomm’s software platforms. Design and implement CI/CD pipelines using Jenkins, GitHub Actions, and Terraform to enable rapid and reliable software delivery. Develop reusable Terraform modules and automation scripts to support scalable infrastructure provisioning. Drive observability initiatives using Prometheus, Grafana, Fluentd, OpenTelemetry, and Splunk to ensure system reliability and performance. Collaborate with software development teams to embed DevOps practices into the SDLC and ensure seamless deployment and operations. Provide mentorship and technical leadership to junior engineers and cross-functional teams. Manage hybrid environments, including on-prem infrastructure and Kubernetes workloads supporting both Linux and Windows. Lead incident response, root cause analysis, and continuous improvement of SLIs for mission-critical systems. Drive toil reduction and automation using scripting or programming languages such as PowerShell, Bash, Python, or Go. Independently drive and implement DevOps/cloud initiatives in collaboration with key stakeholders. Understand software development designs and compilation/deployment flows for .NET, Angular, and Java-based applications to align infrastructure and CI/CD strategies with application architecture.
Required Qualifications: 10+ years of experience in IT or software development, with at least 5 years in cloud architecture and DevOps roles. Strong foundational knowledge of infrastructure components such as networking, servers, operating systems, DNS, Active Directory, and LDAP. Deep expertise in AWS services including EKS, RDS, MSK, CloudFront, S3, and OpenSearch. Hands-on experience with Kubernetes, Docker, containerd, and microservices orchestration. Proficiency in Infrastructure as Code using Terraform and configuration management tools like Ansible and Chef. Experience with observability tools and telemetry pipelines (Grafana, Prometheus, Fluentd, OpenTelemetry, Splunk). Experience with agent-based monitoring tools such as SCOM and Datadog. Solid scripting skills in Python, Bash, and PowerShell. Familiarity with enterprise-grade web services (IIS, Apache, Nginx) and load balancing solutions. Excellent communication and leadership skills with experience mentoring and collaborating across teams. Preferred Qualifications: Experience with API gateway solutions for API security and management. Knowledge of RDBMS, preferably MSSQL/PostgreSQL, is good to have. Proficiency in SRE principles including SLIs, SLOs, SLAs, error budgets, chaos engineering, and toil reduction. Experience in core software development (e.g., Java, .NET). Exposure to Azure cloud and hybrid cloud strategies. Bachelor’s degree in Computer Science or a related field. Minimum Qualifications: Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 4+ years of Software Engineering or related work experience. OR Master's degree in Engineering, Information Systems, Computer Science, or related field and 3+ years of Software Engineering or related work experience. OR PhD in Engineering, Information Systems, Computer Science, or related field and 2+ years of Software Engineering or related work experience.
2+ years of work experience with a programming language such as C, C++, Java, Python, etc.
Posted 5 days ago
1.0 - 5.0 years
12 - 16 Lacs
Chennai
Work from Office
Job Area: Engineering Group, Engineering Group > Software Engineering General Summary: As a leading technology innovator, Qualcomm pushes the boundaries of what's possible to enable next-generation experiences and drives digital transformation to help create a smarter, connected future for all. As a Qualcomm Software Engineer, you will design, develop, create, modify, and validate embedded and cloud edge software, applications, and/or specialized utility programs that launch cutting-edge, world-class products that meet and exceed customer needs. Qualcomm Software Engineers collaborate with systems, hardware, architecture, test engineers, and other teams to design system-level software solutions and obtain information on performance requirements and interfaces. Minimum Qualifications: Bachelor's degree in Engineering, Information Systems, Computer Science, or related field. Job Title: MLOps Engineer - ML Platform Hiring Title: Flexible based on candidate experience; Staff Engineer level preferred. We are seeking a highly skilled and experienced MLOps Engineer to join our team and contribute to the development and maintenance of our ML platform, both on premises and in the AWS Cloud. As an MLOps Engineer, you will be responsible for architecting, deploying, and optimizing the ML & Data platform that supports training of machine learning models using NVIDIA DGX clusters and the Kubernetes platform, including technologies like Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana. Your expertise in AWS services such as EKS, EC2, VPC, IAM, S3, and EFS will be crucial in ensuring the smooth operation and scalability of our ML infrastructure. You will work closely with cross-functional teams, including data scientists, software engineers, and infrastructure specialists. Your expertise in MLOps, DevOps, and knowledge of GPU clusters will be vital in enabling efficient training and deployment of ML models.
Responsibilities will include: Architect, develop, and maintain the ML platform to support training and inference of ML models. Design and implement scalable and reliable infrastructure solutions for NVIDIA clusters, both on premises and in the AWS Cloud. Collaborate with data scientists and software engineers to define requirements and ensure seamless integration of ML and Data workflows into the platform. Optimize the platform’s performance and scalability, considering factors such as GPU resource utilization, data ingestion, model training, and deployment. Monitor and troubleshoot system performance, identifying and resolving issues to ensure the availability and reliability of the ML platform. Implement and maintain CI/CD pipelines for automated model training, evaluation, and deployment using technologies like ArgoCD and Argo Workflows. Implement and maintain a monitoring stack using Prometheus and Grafana to ensure the health and performance of the platform. Manage AWS services including EKS, EC2, VPC, IAM, S3, and EFS to support the platform. Implement logging and monitoring solutions using AWS CloudWatch and other relevant tools. Stay updated with the latest advancements in MLOps, distributed computing, and GPU acceleration technologies, and proactively propose improvements to enhance the ML platform. What are we looking for: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field. Proven experience as an MLOps Engineer or similar role, with a focus on large-scale ML and/or Data infrastructure and GPU clusters. Strong expertise in configuring and optimizing NVIDIA DGX clusters for deep learning workloads. Proficient in using the Kubernetes platform, including technologies like Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana. Solid programming skills in languages such as Python and Go, and experience with relevant ML frameworks (e.g., TensorFlow, PyTorch).
In-depth understanding of distributed computing, parallel computing, and GPU acceleration techniques. Familiarity with containerization technologies such as Docker and orchestration tools. Experience with CI/CD pipelines and automation tools for ML workflows (e.g., Jenkins, GitHub, ArgoCD). Experience with AWS services such as EKS, EC2, VPC, IAM, S3, and EFS. Experience with AWS logging and monitoring tools. Strong problem-solving skills and the ability to troubleshoot complex technical issues. Excellent communication and collaboration skills to work effectively within a cross-functional team. We would love to see: Experience with training and deploying models. Knowledge of ML model optimization techniques and memory management on GPUs. Familiarity with ML-specific data storage and retrieval systems. Understanding of security and compliance requirements in ML infrastructure.
Posted 5 days ago
4.0 - 9.0 years
12 - 17 Lacs
Chennai
Work from Office
Job Area: Engineering Group, Engineering Group > Software Engineering General Summary: As a leading technology innovator, Qualcomm pushes the boundaries of what's possible to enable next-generation experiences and drives digital transformation to help create a smarter, connected future for all. As a Qualcomm Software Engineer, you will design, develop, create, modify, and validate embedded and cloud edge software, applications, and/or specialized utility programs that launch cutting-edge, world-class products that meet and exceed customer needs. Qualcomm Software Engineers collaborate with systems, hardware, architecture, test engineers, and other teams to design system-level software solutions and obtain information on performance requirements and interfaces. Minimum Qualifications: Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 4+ years of Software Engineering or related work experience. OR Master's degree in Engineering, Information Systems, Computer Science, or related field and 3+ years of Software Engineering or related work experience. OR PhD in Engineering, Information Systems, Computer Science, or related field and 2+ years of Software Engineering or related work experience. 2+ years of work experience with a programming language such as C, C++, Java, Python, etc. Job Title: MLOps Engineer - ML Platform Hiring Title: Flexible based on candidate experience; Staff Engineer level preferred. We are seeking a highly skilled and experienced MLOps Engineer to join our team and contribute to the development and maintenance of our ML platform, both on premises and in the AWS Cloud. As an MLOps Engineer, you will be responsible for architecting, deploying, and optimizing the ML & Data platform that supports training of machine learning models using NVIDIA DGX clusters and the Kubernetes platform, including technologies like Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana.
Your expertise in AWS services such as EKS, EC2, VPC, IAM, S3, and EFS will be crucial in ensuring the smooth operation and scalability of our ML infrastructure. You will work closely with cross-functional teams, including data scientists, software engineers, and infrastructure specialists. Your expertise in MLOps, DevOps, and knowledge of GPU clusters will be vital in enabling efficient training and deployment of ML models. Responsibilities will include: Architect, develop, and maintain the ML platform to support training and inference of ML models. Design and implement scalable and reliable infrastructure solutions for NVIDIA clusters, both on premises and in the AWS Cloud. Collaborate with data scientists and software engineers to define requirements and ensure seamless integration of ML and Data workflows into the platform. Optimize the platform’s performance and scalability, considering factors such as GPU resource utilization, data ingestion, model training, and deployment. Monitor and troubleshoot system performance, identifying and resolving issues to ensure the availability and reliability of the ML platform. Implement and maintain CI/CD pipelines for automated model training, evaluation, and deployment using technologies like ArgoCD and Argo Workflows. Implement and maintain a monitoring stack using Prometheus and Grafana to ensure the health and performance of the platform. Manage AWS services including EKS, EC2, VPC, IAM, S3, and EFS to support the platform. Implement logging and monitoring solutions using AWS CloudWatch and other relevant tools. Stay updated with the latest advancements in MLOps, distributed computing, and GPU acceleration technologies, and proactively propose improvements to enhance the ML platform. What are we looking for: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
Proven experience as an MLOps Engineer or similar role, with a focus on large-scale ML and/or Data infrastructure and GPU clusters. Strong expertise in configuring and optimizing NVIDIA DGX clusters for deep learning workloads. Proficient in using the Kubernetes platform, including technologies like Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana. Solid programming skills in languages such as Python and Go, and experience with relevant ML frameworks (e.g., TensorFlow, PyTorch). In-depth understanding of distributed computing, parallel computing, and GPU acceleration techniques. Familiarity with containerization technologies such as Docker and orchestration tools. Experience with CI/CD pipelines and automation tools for ML workflows (e.g., Jenkins, GitHub, ArgoCD). Experience with AWS services such as EKS, EC2, VPC, IAM, S3, and EFS. Experience with AWS logging and monitoring tools. Strong problem-solving skills and the ability to troubleshoot complex technical issues. Excellent communication and collaboration skills to work effectively within a cross-functional team. We would love to see: Experience with training and deploying models. Knowledge of ML model optimization techniques and memory management on GPUs. Familiarity with ML-specific data storage and retrieval systems. Understanding of security and compliance requirements in ML infrastructure.
Posted 5 days ago
5.0 - 10.0 years
25 - 30 Lacs
Bengaluru
Work from Office
At Kotak Mahindra Bank, customer experience is at the forefront of everything we do on our Digital Platform. To help us build and run the platform for Digital Applications, we are now looking for an experienced Sr. DevOps Engineer. They will be responsible for deploying product updates, identifying production issues, and implementing integrations that meet our customers' needs. If you have a solid background in software engineering and are familiar with AWS EKS, Istio/Service Mesh/Tetrate, Terraform, Helm charts, Kong API Gateway, Azure DevOps, Spring Boot, Ansible, and Kafka/MongoDB, we’d love to speak with you. Objectives of this Role: Building and setting up new development tools and infrastructure. Understanding the needs of stakeholders and conveying this to developers. Working on ways to automate and improve development and release processes. Investigate and resolve technical issues. Develop scripts to automate visualization. Design procedures for system troubleshooting and maintenance. Skills and Qualifications: BSc in Computer Science, Engineering, or a relevant field. Experience as a DevOps Engineer or similar software engineering role (minimum 5 years). Proficient with git and git workflows. Good knowledge of Kubernetes (EKS), Terraform, CI/CD, and AWS. Problem-solving attitude. Collaborative team spirit. Testing and examining code written by others and analyzing results. Identifying technical problems and developing software updates and ‘fixes’. Working with software developers and software engineers to ensure that development follows established processes and works as intended. Monitoring the systems and setting up required tools. Daily and Monthly Responsibilities: Deploy updates and fixes. Provide Level 3 technical support. Build tools to reduce occurrences of errors and improve customer experience. Develop software to integrate with internal back-end systems. Perform root cause analysis for production errors.
Posted 5 days ago
6.0 - 10.0 years
12 - 18 Lacs
Hyderabad, Ahmedabad
Hybrid
Job Title: Senior DevOps Site Reliability Engineer (SRE) Location: Hyderabad & Ahmedabad Employment Type: Full-Time Work Model: 3 days from office. Job Overview: Dynamic, motivated individuals deliver exceptional solutions for the production resiliency of our systems. The role incorporates aspects of software engineering, operations, and DevOps skills to come up with efficient ways of managing and operating applications. The role will require a high level of responsibility and accountability to deliver technical solutions. Summary: As a Senior SRE, you will ensure platform reliability, incident management, and performance optimization. You'll define SLIs/SLOs, contribute to robust observability practices, and drive proactive reliability engineering across services. Experience Required: 6-10 years of SRE or infrastructure engineering experience in cloud-native environments. Mandatory: Cloud: GCP (GKE, Load Balancing, VPN, IAM). Observability: Prometheus, Grafana, ELK, Datadog. Containers & Orchestration: Kubernetes, Docker. Incident Management: On-call, RCA, SLIs/SLOs. IaC: Terraform, Helm. Incident Tools: PagerDuty, OpsGenie. Nice to Have: GCP Monitoring, SkyWalking, Service Mesh, API Gateway, GCP Spanner. Scope: Drive operational excellence and platform resilience. Reduce MTTR, increase service availability. Own incident and RCA processes. Roles and Responsibilities: Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and manage error budgets across services. Lead incident management for critical production issues; drive Root Cause Analysis (RCA) and postmortems. Create and maintain runbooks and standard operating procedures for high-availability services. Design and implement observability frameworks using ELK, Prometheus, and Grafana; drive telemetry adoption. Coordinate cross-functional war-room sessions during major incidents and maintain response logs. Develop and improve automated system recovery, alert suppression, and escalation logic.
Use GCP tools like GKE, Cloud Monitoring, and Cloud Armor to improve performance and security posture. Collaborate with DevOps and Infrastructure teams to build highly available and scalable systems. Analyze performance metrics and conduct regular reliability reviews with engineering leads. Participate in capacity planning, failover testing, and resilience architecture reviews. Interested candidates, reach out to: Anjitha.jr@acesoftlabs.com (IT Recruiter).
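The SLI/SLO and error-budget responsibilities above reduce to simple arithmetic for a request-based SLI. A minimal sketch (the function and its return shape are illustrative, not from the posting):

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Remaining error budget for a request-based SLI.

    slo_target: allowed success ratio, e.g. 0.999 for "three nines".
    Returns (remaining_failures_allowed, fraction_of_budget_consumed).
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0, float("inf")  # a 100% SLO leaves no budget at all
    consumed = failed_requests / allowed_failures
    return allowed_failures - failed_requests, consumed

# One million requests against a 99.9% SLO allow ~1,000 failures;
# 250 observed failures consume about a quarter of the budget.
remaining, consumed = error_budget(0.999, 1_000_000, 250)
print(round(remaining), round(consumed, 3))  # 750 0.25
```

The same numbers feed burn-rate alerting: if the budget is being consumed faster than the SLO window allows, Prometheus alert rules (evaluated over short and long windows) page the on-call before the budget is exhausted.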
Posted 5 days ago
15.0 - 20.0 years
5 - 9 Lacs
Hyderabad
Work from Office
Project Role: Integration Engineer Project Role Description: Provide consultative Business and System Integration services to help clients implement effective solutions. Understand and translate customer needs into business and technology solutions. Drive discussions and consult on transformation, the customer journey, and functional/application designs, and ensure technology and business solutions represent business requirements. Must have skills: Infrastructure as Code (IaC) Good to have skills: Microsoft Azure Sentinel. Minimum 5 year(s) of experience is required. Educational Qualification: 15 years full time education. Summary: As an Integration Engineer, you will provide consultative Business and System Integration services to assist clients in implementing effective solutions. Your typical day will involve engaging with clients to understand their needs, facilitating discussions on transformation, and ensuring that the technology and business solutions align with their requirements. You will work collaboratively with various teams to translate customer needs into actionable plans, driving the customer journey and application designs to achieve optimal outcomes.
Roles & Responsibilities: - Expected to be an SME; collaborate with and manage the team to perform. - Responsible for team decisions. - Engage with multiple teams and contribute to key decisions. - Provide solutions to problems for their immediate team and across multiple teams. - Facilitate workshops and meetings to gather requirements and feedback from stakeholders. - Develop and maintain documentation related to integration processes and solutions. - Develop, maintain, and test HashiCorp Terraform modules for infrastructure as code (IaC). - Design, implement, and manage Sentinel policies as code to enforce security and compliance standards. - Collaborate with development, operations, and security teams to integrate security practices into the CI/CD pipeline. - Automate infrastructure provisioning, configuration management, and application deployment processes. - Monitor and troubleshoot infrastructure and application issues, ensuring high availability and performance. - Conduct security assessments and audits to identify vulnerabilities and implement remediation measures. - Stay up-to-date with the latest industry trends, tools, and best practices in DevSecOps, Terraform, and Sentinel. Professional & Technical Skills: - Must Have Skills: Proficiency in Infrastructure as Code (IaC). - Good To Have Skills: Experience with Microsoft Azure Sentinel. - Strong understanding of cloud infrastructure and deployment strategies. - Experience with automation tools and frameworks for infrastructure management. - Familiarity with version control systems and CI/CD pipelines. - Proven experience (min. 5 years) as a DevSecOps/Cloud Engineer or similar role. - Strong expertise in HashiCorp Terraform and infrastructure as code (IaC) principles. - Proficiency in developing and managing Sentinel policies as code. - Experience with CI/CD tools such as GitHub, GitHub Actions, Jenkins, and the JFrog Platform. - Solid understanding of cloud platforms, specifically Google Cloud Platform (GCP) and Microsoft Azure. - Knowledge of containerization technologies (Docker, Kubernetes) and orchestration. - Familiarity with security frameworks and compliance standards (e.g., NIST, ISO 27001). - Certifications in Terraform, GCP, or Azure (e.g., HashiCorp Certified: Terraform Associate, Google Cloud Professional Cloud Architect, Microsoft Certified: Azure Solutions Architect Expert). - Experience with scripting languages (Python, Bash, PowerShell). - Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack). Additional Information: - The candidate should have a minimum of 5 years of experience in Infrastructure As Code (IaC). - This position is based at our Hyderabad office. - A 15 years full time education is required.
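Real Sentinel policies are written in HashiCorp's own policy language and evaluate Terraform plans before apply. The following Python sketch only illustrates the policy-as-code idea on a simplified, hypothetical plan structure (not the actual Terraform plan JSON schema): reject any resource missing required tags.

```python
# Illustrative policy-as-code check on a simplified, hypothetical plan
# structure; real Sentinel policies use HashiCorp's policy language and
# the actual Terraform plan schema.
REQUIRED_TAGS = {"owner", "cost-center"}

def violations(plan):
    """Return (address, missing_tags) for resources lacking required tags."""
    bad = []
    for res in plan.get("resources", []):
        missing = REQUIRED_TAGS - set(res.get("tags", {}))
        if missing:
            bad.append((res["address"], sorted(missing)))
    return bad

plan = {
    "resources": [
        {"address": "aws_s3_bucket.logs", "tags": {"owner": "ml-team"}},
        {"address": "aws_instance.api", "tags": {"owner": "web", "cost-center": "42"}},
    ]
}
print(violations(plan))  # [('aws_s3_bucket.logs', ['cost-center'])]
```

Wired into a CI/CD pipeline, a non-empty result would fail the plan stage, which is the enforcement pattern the responsibilities above describe.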
Posted 5 days ago
5.0 - 10.0 years
3 - 7 Lacs
Bengaluru
Work from Office
Project Role : Application Support Engineer
Project Role Description : Act as software detectives, provide a dynamic service identifying and solving issues within multiple components of critical business systems.
Must have skills : Kubernetes
Good to have skills : Google Kubernetes Engine, Google Cloud Compute Services
Minimum 7.5 year(s) of experience is required
Educational Qualification : 15 years full time education

Summary:
We are looking for an experienced Kubernetes Architect to join our growing cloud infrastructure team. This role is responsible for architecting, designing, and implementing scalable, secure, and highly available cloud-native applications on Kubernetes. You will leverage Kubernetes along with associated technologies such as Kubekafka, Kubegres, Helm, Ingress, Redis, Grafana, and Prometheus to build resilient systems that meet both business and technical needs. Google Kubernetes Engine (GKE) will be considered an additional skill. As a Kubernetes Architect, you will play a key role in defining best practices, optimizing the infrastructure, and providing architectural guidance to cross-functional teams.

Key Responsibilities:
- Architect Kubernetes Solutions: Design and implement scalable, secure, and high-performance Kubernetes clusters.
- Cloud-Native Application Design: Collaborate with development teams to design cloud-native applications, ensuring that microservices are properly architected and optimized for Kubernetes environments.
- Kafka Management: Architect and manage Apache Kafka clusters using Kubekafka, ensuring reliable, real-time data streaming and event-driven architectures.
- Database Architecture: Use Kubegres to manage high-availability PostgreSQL clusters in Kubernetes, ensuring data consistency, scaling, and automated failover.
- Helm Chart Development: Create, maintain, and optimize Helm charts for consistent deployment and management of applications across Kubernetes environments.
- Ingress & Networking: Architect and configure Ingress controllers (e.g., NGINX, Traefik) for secure and efficient external access to Kubernetes services, including SSL termination, load balancing, and routing.
- Caching and Performance Optimization: Leverage Redis to design efficient caching and session management solutions, optimizing application performance.
- Monitoring & Observability: Lead the implementation of Prometheus for metrics collection and Grafana for building real-time monitoring dashboards to visualize the health and performance of infrastructure and applications.
- CI/CD Integration: Design and implement continuous integration and continuous deployment (CI/CD) pipelines to streamline the deployment of Kubernetes-based applications.
- Security & Compliance: Ensure Kubernetes clusters follow security best practices, including RBAC, network policies, and the proper configuration of secrets management.
- Automation & Scripting: Develop automation frameworks using tools like Terraform, Helm, and Ansible to ensure repeatable and scalable deployments.
- Capacity Planning and Cost Optimization: Optimize resource usage within Kubernetes clusters to achieve both performance and cost-efficiency, utilizing cloud tools and services.
- Leadership & Mentorship: Provide technical leadership to development, operations, and DevOps teams, offering mentorship, architectural guidance, and sharing best practices.
- Documentation & Reporting: Produce comprehensive architecture diagrams, design documents, and operational playbooks to ensure knowledge transfer across teams and maintain system reliability.

Required Skills & Experience:
- 10+ years of experience in cloud infrastructure engineering, with at least 5+ years of hands-on experience with Kubernetes.
- Strong expertise in Kubernetes for managing containerized applications in the cloud.
- Experience deploying and managing container-based systems on both private and public clouds (Google Kubernetes Engine (GKE)).
- Proven experience with Kubekafka for managing Apache Kafka clusters in Kubernetes environments.
- Expertise in managing PostgreSQL clusters with Kubegres and implementing high-availability database solutions.
- In-depth knowledge of Helm for managing Kubernetes applications, including the development of custom Helm charts.
- Experience with Ingress controllers (e.g., NGINX, Traefik) for managing external traffic in Kubernetes.
- Hands-on experience with Redis for caching, session management, and as a message broker in Kubernetes environments.
- Advanced knowledge of Prometheus for monitoring and Grafana for visualization and alerting in cloud-native environments.
- Experience with CI/CD pipelines for automated deployment and integration using tools like Jenkins, GitLab CI, or CircleCI.
- Solid understanding of networking, including load balancing, DNS, SSL/TLS, and ingress/egress configurations in Kubernetes.
- Familiarity with Terraform and Ansible for infrastructure automation.
- Deep understanding of security best practices in Kubernetes, such as RBAC, network policies, and secrets management.
- Knowledge of DevSecOps practices to ensure secure application delivery.

Certifications:
- Google Cloud Platform (GCP) certification is mandatory.
- Kubernetes certification (CKA or CKAD) is highly preferred.
- HashiCorp Terraform certification is a significant plus.

Qualification: 15 years full time education
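The posting above asks for custom Helm chart development. Helm itself renders Go-templated manifests from a `values.yaml`; purely as a rough illustration of that values-to-manifest idea (not Helm's actual engine, and with hypothetical names and values), a minimal Python sketch:

```python
from string import Template

# Hypothetical deployment template: placeholders play the role of
# Helm's {{ .Values.* }} references. Helm itself uses Go templates.
DEPLOYMENT_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: ${replicas}
  template:
    spec:
      containers:
      - name: ${name}
        image: ${image}
""")

def render(values: dict) -> str:
    """Substitute chart-style values into the manifest template."""
    return DEPLOYMENT_TEMPLATE.substitute(values)

# One values dict yields one concrete manifest per environment.
manifest = render({"name": "web", "replicas": 3, "image": "nginx:1.25"})
print(manifest)
```

The design point a chart captures is the same: one parameterized template, many environments, each differing only in its values.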
Posted 5 days ago
3.0 - 8.0 years
3 - 7 Lacs
Bengaluru
Work from Office
Project Role : Application Support Engineer
Project Role Description : Act as software detectives, provide a dynamic service identifying and solving issues within multiple components of critical business systems.
Must have skills : Kubernetes
Good to have skills : Google Kubernetes Engine, Google Cloud Compute Services
Minimum 5 year(s) of experience is required
Educational Qualification : 15 years full time education

Job Summary:
We are looking for an experienced Kubernetes Specialist to join our cloud infrastructure team. You will work closely with architects and engineers to design, implement, and optimize cloud-native applications on Google Kubernetes Engine (GKE). This role focuses on providing expertise in Kubernetes, container orchestration, and cloud infrastructure management, ensuring the seamless operation of scalable, secure, and high-performance applications on GKE and other cloud environments.

Responsibilities:
- Kubernetes Implementation: Design, implement, and manage Kubernetes clusters for containerized applications, ensuring high availability and scalability.
- Cloud-Native Application Design: Work with teams to deploy, scale, and maintain cloud-native applications on Google Kubernetes Engine (GKE).
- Kubernetes Tools Expertise: Utilize Kubekafka, Kubegres, Helm, Ingress, Redis, Grafana, and Prometheus to build and maintain resilient systems.
- Infrastructure Automation: Develop and implement automation frameworks using Terraform and other tools to streamline Kubernetes deployments and cloud infrastructure management.
- CI/CD Implementation: Design and maintain CI/CD pipelines to automate deployment and testing for Kubernetes-based applications.
- Kubernetes Networking & Security: Ensure secure and efficient Kubernetes cluster networking, including Ingress controllers (e.g., NGINX, Traefik), RBAC, and secrets management.
- Monitoring & Observability: Lead the integration of monitoring solutions using Prometheus for metrics and Grafana for real-time dashboard visualization.
- Performance Optimization: Optimize resource utilization within GKE clusters, ensuring both performance and cost-efficiency.
- Collaboration: Collaborate with internal development, operations, and security teams to meet user requirements and implement Kubernetes solutions.
- Troubleshooting & Issue Resolution: Efficiently troubleshoot and resolve complex issues related to containerized applications, Kubernetes clusters, and cloud infrastructure.

Technical Skillset:
- GCP & Kubernetes Experience: Minimum of 3+ years of hands-on experience in Google Cloud Platform (GCP) and Kubernetes implementations, including GKE.
- Container Management: Proficiency with container orchestration engines such as Kubernetes and Docker.
- Kubernetes Tools Knowledge: Experience with Kubekafka, Kubegres, Helm, Ingress, Redis, Grafana, and Prometheus for managing Kubernetes-based applications.
- Infrastructure as Code (IaC): Strong experience with Terraform for automating infrastructure provisioning and management.
- CI/CD Pipelines: Hands-on experience building and managing CI/CD pipelines for Kubernetes applications using tools like Jenkins, GitLab, or CircleCI.
- Security & Networking: Knowledge of Kubernetes networking (DNS, SSL/TLS), security best practices (RBAC, network policies, and secrets management), and the use of Ingress controllers (e.g., NGINX).
- Cloud & DevOps Tools: Familiarity with cloud services and DevOps tools such as GitHub, Jenkins, and Ansible.
- Monitoring Expertise: In-depth experience with Prometheus and Grafana for operational monitoring, alerting, and creating actionable insights.

Certifications:
- Google Cloud Platform (GCP) Associate Cloud Engineer (ACE) certification is required.
- Certified Kubernetes Administrator (CKA) is highly preferred.

Qualification: 15 years full time education
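The specialist role above calls for optimizing resource utilization within GKE clusters. One common rightsizing heuristic (an illustrative sketch, not the team's actual method; sample values and the headroom factor are assumptions) is to set a container's CPU request near its 95th-percentile observed usage plus some headroom:

```python
def recommend_request(samples: list[int], headroom: float = 1.2) -> int:
    """Suggest a CPU request (millicores) from observed usage samples:
    take the ~95th-percentile sample and add a headroom margin."""
    ordered = sorted(samples)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]  # nearest-rank percentile
    return round(p95 * headroom)

# Hypothetical millicore usage samples scraped from a metrics backend.
usage = [100, 120, 150, 200, 500]
print(recommend_request(usage))  # p95 ignores the 500m spike -> 240
```

Requesting near p95 rather than peak keeps nodes bin-packed tightly while the headroom absorbs routine variance; limits (or no limits) are a separate policy decision.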
Posted 5 days ago
3.0 - 8.0 years
4 - 8 Lacs
Bengaluru
Work from Office
Project Role : Software Development Engineer
Project Role Description : Analyze, design, code and test multiple components of application code across one or more clients. Perform maintenance, enhancements and/or development work.
Must have skills : Python (Programming Language)
Good to have skills : NA
Minimum 5 year(s) of experience is required
Educational Qualification : 15 years full time education

Summary:
This is a hands-on, technical role where the candidate will design and implement a DevOps Maturity Model by integrating multiple DevOps tools and building backend APIs to visualize data on a front-end interface. The candidate will work closely with cross-functional teams to enable DevOps culture, ensure system reliability, and drive continuous improvement.

Roles & Responsibilities:
1. DevOps Maturity Model: Design and develop a model to assess and improve DevOps practices by integrating tools like Jenkins, GitLab, and Azure DevOps.
2. Backend Development: Build scalable and efficient backend APIs using Python and Azure Serverless.
3. Frontend Development: Develop intuitive and responsive front-end interfaces using Angular and Vue.js for data visualization.
4. Monitoring & Automation: Implement monitoring, logging, and alerting solutions. Develop automation scripts for reporting and analysis.
5. Collaboration: Work with cross-functional teams to resolve production-level disruptions and enable DevOps culture.
6. Documentation: Document architecture, design, and implementation details.

Professional & Technical Skills:
1. Backend Development: Python and experience with Azure Serverless.
2. Frontend Development: Angular and Vue.js.
3. Databases: Familiarity with Azure SQL, Cosmos DB, or PostgreSQL.
4. Containerization: Good understanding of Docker and Kubernetes for basic troubleshooting.
5. Networking: Basic understanding of TCP/IP, HTTP, DNS, VPN, and cloud networking.
6. Monitoring & Logging: Experience with monitoring tools like Prometheus, Grafana, or Datadog.

Additional Information:
1. The candidate should have a minimum of 3 years of experience in Python & Angular full stack.
2. This position is based at our Bengaluru office.
3. A 15 years full time education is required (bachelor's degree in Computer Science, Engineering, or a related field).

Qualification: 15 years full time education
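The role above includes implementing alerting solutions. A standard guard against paging on a single noisy spike is to alert only when a metric stays above its threshold for several consecutive samples (Prometheus expresses this with a `for` duration on an alerting rule). A hedged Python sketch of that idea, with illustrative function names and numbers:

```python
def consecutive_breaches(samples: list[float], threshold: float, n: int) -> bool:
    """Return True if `samples` contains n consecutive values above
    `threshold`; a single spike resets nothing but never fires alone."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= n:
            return True
    return False

# Hypothetical CPU-utilisation samples (percent), scraped once a minute.
cpu = [40, 95, 96, 97, 50]
print(consecutive_breaches(cpu, threshold=90, n=3))  # three in a row -> True
```

The equivalent Prometheus rule would pair the threshold expression with `for: 3m` so the condition must hold across the evaluation window before the alert fires.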
Posted 5 days ago
15.0 - 20.0 years
5 - 10 Lacs
Hyderabad
Work from Office
Project Role : DevOps Engineer
Project Role Description : Responsible for building and setting up new development tools and infrastructure utilizing knowledge in continuous integration, delivery, and deployment (CI/CD), cloud technologies, container orchestration, and security. Build and test end-to-end CI/CD pipelines, ensuring that systems are safe against security threats.
Must have skills : DevSecOps
Good to have skills : Google Cloud Platform Architecture, Microsoft Azure Infrastructure as Code (IaC)
Minimum 12 year(s) of experience is required
Educational Qualification : 15 years full time education

Summary:
As a DevOps Engineer, you will be responsible for building and setting up new development tools and infrastructure. A typical day involves utilizing your knowledge in continuous integration, delivery, and deployment, as well as cloud technologies and container orchestration. You will also focus on ensuring that systems are secure against potential threats while collaborating with various teams to enhance the development process and improve overall efficiency.

Roles & Responsibilities:
- Expected to be an SME.
- Collaborate with and manage the team to perform.
- Responsible for team decisions.
- Engage with multiple teams and contribute to key decisions.
- Provide solutions to problems for their immediate team and across multiple teams.
- Facilitate knowledge-sharing sessions to enhance team capabilities.
- Monitor and optimize CI/CD pipelines for performance and security.
- Oversee the development, maintenance, and testing of HashiCorp Terraform modules for infrastructure as code (IaC).
- Ensure the design, implementation, and management of Sentinel policies as code to enforce security and compliance standards.
- Collaborate with cross-functional teams to integrate security practices into the CI/CD pipeline.
- Drive the automation of infrastructure provisioning, configuration management, and application deployment processes.
- Monitor and troubleshoot infrastructure and application issues, ensuring high availability and performance.
- Conduct regular security assessments and audits to identify vulnerabilities and implement remediation measures.
- Stay up to date with the latest industry trends, tools, and best practices in DevSecOps, Terraform, and Sentinel.
- Foster a culture of continuous improvement, innovation, and collaboration within the team.
- Develop and implement strategies to enhance the team's efficiency, productivity, and overall performance.
- Report on team progress, challenges, and achievements to senior management.

Professional & Technical Skills:
- Must-have skills: Proficiency in DevSecOps.
- Good-to-have skills: Experience with Google Cloud Platform Architecture, Microsoft Azure Infrastructure as Code (IaC).
- Strong understanding of continuous integration and continuous deployment methodologies.
- Experience with container orchestration tools such as Kubernetes or Docker Swarm.
- Familiarity with security best practices in software development and deployment.
- Proven experience in a leadership role within a DevSecOps or similar environment.
- Strong expertise in HashiCorp Terraform and infrastructure as code (IaC) principles.
- Proficiency in developing and managing Sentinel policies as code.
- Experience with CI/CD tools such as GitHub, GitHub Actions, Jenkins, and the JFrog Platform.
- Solid understanding of cloud platforms, specifically Google Cloud Platform (GCP) and Microsoft Azure.
- Knowledge of containerization technologies (Docker, Kubernetes) and orchestration.
- Familiarity with security frameworks and compliance standards (e.g., NIST, ISO 27001).
- Certifications in Terraform, GCP, or Azure (e.g., HashiCorp Certified: Terraform Associate, Google Cloud Professional Cloud Architect, Microsoft Certified: Azure Solutions Architect Expert).
- Experience with scripting languages (Python, Bash, PowerShell).
- Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack).

Additional Information:
- The candidate should have a minimum of 7.5 years of experience in DevSecOps.
- This position is based at our Hyderabad office.
- A 15 years full time education is required.

Qualification: 15 years full time education
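The posting above centers on Sentinel policies as code. Sentinel is HashiCorp's own policy language, so the following is only a Python sketch of the underlying idea: each policy is a predicate evaluated against a planned resource, and a plan is rejected if any predicate fails. Resource fields and policy names here are illustrative assumptions:

```python
# Each policy inspects one planned resource (a dict standing in for a
# parsed Terraform plan entry) and returns True when compliant.
def no_public_ingress(resource: dict) -> bool:
    """Reject firewall rules open to the whole internet."""
    return "0.0.0.0/0" not in resource.get("source_ranges", [])

def encryption_enabled(resource: dict) -> bool:
    """Require at-rest encryption to be switched on."""
    return resource.get("encrypted", False)

POLICIES = [no_public_ingress, encryption_enabled]

def evaluate(resource: dict) -> list[str]:
    """Return the names of policies the resource violates (empty = pass)."""
    return [policy.__name__ for policy in POLICIES if not policy(resource)]

violations = evaluate({"source_ranges": ["10.0.0.0/8"], "encrypted": True})
print(violations)  # compliant resource -> []
```

In a real pipeline the equivalent Sentinel policies run as a gate between `terraform plan` and `terraform apply`, failing the run when any policy evaluates to false.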
Posted 5 days ago
4.0 - 9.0 years
15 - 30 Lacs
Chennai
Hybrid
ACV Auctions is looking for an experienced Site Reliability Engineer III with a systems and software engineering background to focus on site reliability. We believe in taking a software engineer's approach to operations by providing standards and software tools to all engineering projects. As a Site Reliability Engineer, you will split your time between developing software that improves overall reliability and providing operational support for production systems.

What you will do:
- Maintain reliability and performance for your particular infrastructure area while working with software engineers to improve service quality and health.
- Develop, design, and review new software tools in Python and Java to improve infrastructure reliability and provide services with better monitoring, automation, and product delivery.
- Practice efficient incident response through on-call rotations alongside software engineers, and document incidents through postmortems.
- Support service development with capacity plans, launch/deployment plans, scalable system design, and monitoring plans.

What you will need:
- BS degree in Computer Science or a related technical discipline, or equivalent practical experience.
- Experience building and managing infrastructure deployments on Google Cloud Platform; 3+ years managing cloud infrastructure.
- Experience programming in at least one of the following: Python or Java.
- Experience in Linux/Unix systems administration, configuration management, monitoring, and troubleshooting.
- Comfort with production systems, including load balancing, distributed systems, microservice architecture, service meshes, and continuous delivery.
- Experience building and delivering software tools for monitoring, management, and automation that support production systems.
- Comfortable working with teams across multiple time zones and working flexible hours as needed.

Preferred Qualifications:
- Experience maintaining and scaling Kubernetes clusters for production workloads is a plus.
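The SRE role above ties incident response to capacity and monitoring plans; in practice both are commonly framed around an error budget, the downtime an availability SLO permits over a window. As a hedged sketch (the SLO figure and incident minutes are illustrative assumptions, not ACV's targets):

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO
    over a rolling window, e.g. 99.9% over 30 days ~= 43.2 minutes."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo)

budget = error_budget_minutes(0.999)   # ~43.2 minutes per 30 days
spent = 30.0                           # hypothetical minutes lost to incidents
remaining = budget - spent
print(round(budget, 1), round(remaining, 1))
```

When the remaining budget nears zero, the usual SRE response is to slow feature launches and spend the time on reliability work instead.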
Posted 5 days ago