
289 OpenTelemetry Jobs - Page 10

JobPe aggregates listings for easy access, but you apply directly on the original job portal.

6.0 years

0 Lacs

Pune, Maharashtra, India

Remote

Source: LinkedIn

Job Description

ABOUT THIS ROLE
As an SRE, your primary responsibility is to ensure the reliability, scalability, and availability of the systems that power Kibo's products and services. You will work closely with cross-functional teams to build and maintain these systems, and you will be responsible for monitoring them to proactively identify and address production issues.

ABOUT KIBO
KIBO is a composable digital commerce platform for B2C, D2C, and B2B organizations who want to simplify the complexity in their businesses and deliver modern customer experiences. KIBO is the only modular, modern commerce platform that supports experiences spanning B2B and B2C Commerce, Order Management, and Subscriptions. Companies like Ace Hardware, Zwilling, Jelly Belly, Nivel, and Honey Birdette trust Kibo to bring simplicity and sophistication to commerce operations and deliver experiences that drive value. KIBO's cutting-edge solution is MACH Alliance Certified and has been recognized by Forrester, Gartner, IDC, Internet Retailer, and TrustRadius. KIBO has been named a leader in The Forrester Wave™: Order Management Systems, Q1 2025 and in the IDC MarketScape report "Worldwide Enterprise Headless Digital Commerce Applications 2024 Vendor Assessment".

By joining KIBO, you will be part of a team of Kibonauts all over the world in a remote-friendly environment. Whether your job is to build, sell, or support KIBO's commerce solutions, we tackle challenges together with an approach of trust, growth mindset, and customer obsession. If you're seeking a unique challenge with amazing growth potential, then come work with us!

WHAT YOU'LL DO
- Design, implement, and maintain cloud infrastructure and tooling to support software development, deployment, and operations.
- Develop and enhance monitoring and alerting systems to proactively detect and resolve issues, ensuring system reliability.
- Automate deployments, configurations, and testing to streamline administration and minimize operational risks.
- Troubleshoot and resolve performance, availability, and security issues across distributed systems.
- Lead post-mortems and root cause analyses to drive continuous improvement and prevent recurring incidents.
- Ensure high availability and system reliability while participating in a 24x7x365 on-call rotation to address critical incidents.
- Collaborate with engineering teams to build scalable, resilient, and secure infrastructure that meets customer needs.

WHAT YOU'LL NEED
- 6+ years of experience in an SRE, DevOps, or cloud engineering role.
- Strong fundamentals in Linux, networking, distributed systems, and cloud architecture.
- Experience with cloud platforms (AWS and/or GCP preferred; Azure is a plus).
- Proficiency with Kubernetes and related tools (Flux, Helm, Argo CD, Keel).
- Expertise in Infrastructure as Code (Terraform preferred) and configuration management (Ansible preferred).
- Experience with monitoring and observability tools such as Elasticsearch, Prometheus, Grafana, and OpenTelemetry (see the sketch below).
- Scripting skills in Python, Bash, or Go (or similar languages).
- Deep understanding of security best practices and ability to implement them across cloud infrastructure.
- Experience operating in a SOC 2, PCI-DSS, and/or ISO 27001 compliant environment is a plus.
- Strong problem-solving mindset with a proactive approach to reliability engineering.
- Excellent communication and collaboration skills in a remote team environment.
- Willingness to participate in a 24x7 on-call rotation to ensure uptime and rapid incident response.
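For illustration, a minimal Python tracing setup with the open-source opentelemetry-sdk package (a generic sketch, not Kibo's actual instrumentation; the service name and span attribute are invented):

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer that batches spans and prints them locally;
# a real deployment would swap in an OTLP exporter to a collector.
provider = TracerProvider(resource=Resource.create({"service.name": "demo-service"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("request.id", "abc-123")  # hypothetical attribute
```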
KIBO PERKS
- Flexible schedule and hybrid work setting
- Paid company holidays and a global volunteer holiday
- Generous health, wellness, benefits, and time away programs
- Commitment to individual growth and development and opportunity for internal mobility
- Passionate, high-achieving teammates excited to help you succeed and learn
- Company-sponsored events and other activities

At Kibo we celebrate and support all differences. Kibo is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, and veteran status.

Posted 4 weeks ago

Apply

5.0 years

0 Lacs

Gurugram, Haryana

On-site

Source: Indeed

Job Information
- Date Opened: 05/30/2025
- Job Type: Full time
- Industry: Financial Services
- Work Experience: 5+ years
- City: Gurgaon
- State/Province: Haryana
- Country: India
- Zip/Postal Code: 122002

About Us
indiagold has built a product & technology platform that enables regulated entities to launch or grow their asset-backed products across geographies, without investing in operations, technology, or people, and without taking any valuation, storage, or transit risks. Our use of deep tech is changing how asset-backed loans have traditionally been done. Some examples of our innovation are lending against digital gold, a 100% paperless/digital loan onboarding process, computer vision to test gold purity as opposed to manual testing, auto-scheduling of feet-on-street, customer self-onboarding, a gold locker model to expand TAM and launch zero-touch gold loans, a zero-network business app, and many more. We are a rapidly growing team passionate about solving massive challenges around financial well-being, with empowered opportunities across Sales, Business Development, Partnerships, Sales Operations, Credit, Pricing, Customer Service, Business Product, Design, Product, Engineering, People & Finance across several cities. We value the right aptitude & attitude more than past experience in a related role, so feel free to reach out if you believe we can be good for each other.

Job Description

About the Role
We are seeking a Staff Software Engineer to lead and mentor engineering teams while driving the architecture and development of robust, scalable backend systems and cloud infrastructure. This is a senior, hands-on role with a strong focus on technical leadership, system design, and cross-functional collaboration across development, DevOps, and platform teams.

Key Responsibilities
- Mentor engineering teams to uphold high coding standards and best practices in backend and full-stack development using Java, Spring Boot, Node.js, Python, and React.
- Guide architectural decisions to ensure performance, scalability, and reliability of systems.
- Architect and optimize relational data models and queries using MySQL.
- Define and evolve cloud infrastructure using Infrastructure as Code (Terraform) across AWS or GCP.
- Lead DevOps teams in building and managing CI/CD pipelines, Kubernetes clusters, and related cloud-native tooling.
- Drive best practices in observability using tools like Grafana, Prometheus, OpenTelemetry, and centralized logging frameworks (e.g., ELK, CloudWatch, Stackdriver).
- Provide architectural leadership for microservices-based systems deployed via Kubernetes, including tools like ArgoCD for GitOps-based deployment strategies.
- Design and implement event-driven systems that are reliable, scalable, and easy to maintain.
- Own security and compliance responsibilities in cloud-native environments, ensuring alignment with frameworks such as ISO 27001, CISA, and CICRA.
- Ensure robust design and troubleshooting of container and Kubernetes networking, including service discovery, ingress, and inter-service communication.
- Collaborate with product and platform teams to define long-term technical strategies and implementation plans.
- Perform code reviews, lead technical design discussions, and contribute to engineering-wide initiatives.

Required Qualifications
- 7+ years of software engineering experience with a focus on backend development and system architecture.
- Deep expertise in Java and Spring Boot, with strong working knowledge of Node.js, Python, and React.js.
- Proficiency in MySQL and experience designing complex relational databases.
- Hands-on experience with Terraform and managing infrastructure across AWS or GCP.
- Strong understanding of containerization, Kubernetes, and CI/CD pipelines.
- Solid grasp of container and Kubernetes networking principles and troubleshooting techniques.
- Experience with GitOps tools such as ArgoCD and other Kubernetes ecosystem components.
- Deep knowledge of observability practices, including metrics, logging, and distributed tracing.
- Experience designing and implementing event-driven architectures using modern tooling (e.g., Kafka, Pub/Sub; see the sketch after this listing).
- Demonstrated experience in owning and implementing security and compliance measures, with practical exposure to standards like ISO 27001, CISA, and CICRA.
- Excellent communication skills and a proven ability to lead cross-functional technical efforts.

Preferred (Optional) Qualifications
- Contributions to open-source projects or technical blogs.
- Experience leading or supporting compliance audits such as ISO 27001, SOC 2, or similar.
- Exposure to service mesh technologies (e.g., Istio, Linkerd).
- Experience with policy enforcement in Kubernetes (e.g., OPA/Gatekeeper, Kyverno).

Benefits: Why Join Us?
- Lead impactful engineering initiatives and mentor talented developers.
- Work with a modern, cloud-native stack across AWS, GCP, Kubernetes, and Terraform.
- Contribute to architectural evolution and long-term technical strategy.
- Competitive compensation, benefits, and flexible work options.
- Inclusive and collaborative engineering culture.
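For illustration of the event-driven requirement above, a minimal producer sketch using the kafka-python client (generic, not indiagold's actual stack; the broker address, topic, and event payload are invented):

```python
import json
from kafka import KafkaProducer

# Publish a domain event; downstream consumers react asynchronously.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("loan-events", {"event": "LOAN_DISBURSED", "loan_id": "L-1001"})
producer.flush()  # block until the event is actually delivered
```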

Posted 4 weeks ago

Apply

0 years

0 Lacs

Hyderabad, Telangana, India

Remote

Source: LinkedIn

About the Company
Transnational AI Private Limited is a next-generation AI-first company committed to building scalable, intelligent systems for the digital marketplace, insurance, employment, and healthcare sectors. We drive innovation through AI engineering, data science, and seamless platform integration powered by event-driven architectures.

Role Summary
We are looking for a highly motivated AI Engineer with strong experience in Python, FastAPI, and event-driven microservice architecture. You will be instrumental in building intelligent, real-time systems that power scalable AI workflows across our platforms. This role combines deep technical engineering skills with a product-oriented mindset.

Key Responsibilities
- Architect and develop AI microservices using Python and FastAPI within an event-driven ecosystem (see the sketch after this listing).
- Implement and maintain asynchronous communication between services using message brokers like Kafka, RabbitMQ, or NATS.
- Convert AI/ML models into production-grade, containerized services integrated with streaming and event-processing pipelines.
- Design and document async REST APIs and event-based endpoints with comprehensive OpenAPI/Swagger documentation.
- Collaborate with AI researchers, product managers, and DevOps engineers to deploy scalable and secure services.
- Develop reusable libraries, automation scripts, and shared components for AI/ML pipelines.
- Maintain high standards for code quality, testability, and observability using unit tests, logging, and monitoring tools.
- Work within Agile teams to ship features iteratively with a focus on scalability, resilience, and fault tolerance.

Required Skills and Experience
- Proficiency in Python 3.x with a solid understanding of asynchronous programming (async/await).
- Hands-on experience with FastAPI; knowledge of Flask or Django is a plus.
- Experience building and integrating event-driven systems using Kafka, RabbitMQ, Redis Streams, or similar technologies.
- Strong knowledge of event-driven microservices, pub/sub models, and real-time data streaming architectures.
- Exposure to deploying AI/ML models using PyTorch, TensorFlow, or scikit-learn.
- Familiarity with containerization (Docker), orchestration (Kubernetes), and cloud platforms (AWS, GCP, Azure).
- Experience with unit testing frameworks such as PyTest, and observability tools like Prometheus, Grafana, or OpenTelemetry.
- Understanding of security principles including JWT, OAuth2, and API security best practices.

Nice to Have
- Experience with MLOps pipelines and tools like MLflow, DVC, or Kubeflow.
- Familiarity with Protobuf, gRPC, and async I/O with WebSockets.
- Prior work in real-time analytics, recommendation systems, or workflow orchestration (e.g., Prefect, Airflow).
- Contributions to open-source projects or an active GitHub portfolio.

Educational Background
Bachelor's or Master's degree in Computer Science, Software Engineering, Artificial Intelligence, or a related technical discipline.

Why Join Transnational AI
- Build production-grade AI infrastructure powering real-world applications.
- Collaborate with domain experts and top engineers across marketplace, insurance, and workforce platforms.
- Flexible, remote-friendly environment with a focus on innovation and ownership.
- Competitive compensation, bonuses, and continuous learning support.
- Work on high-impact projects that influence how people discover jobs, get insured, and access personalized digital services.
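As a rough sketch of the async FastAPI pattern this role centers on (generic; the endpoint, request model, and response values are invented, with a stand-in for the real model call):

```python
import asyncio
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ScoreRequest(BaseModel):
    text: str

@app.post("/score")
async def score(req: ScoreRequest):
    await asyncio.sleep(0.01)  # stand-in for an async model/broker call
    return {"text": req.text, "label": "positive", "confidence": 0.93}
```

Such a service would typically be run with an ASGI server, e.g. `uvicorn main:app --reload` during development.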

Posted 4 weeks ago

Apply

13.0 years

0 Lacs

Pune, Maharashtra, India

On-site

Source: LinkedIn

The SRE Observability Lead Engineer is a hands-on leader responsible for shaping and delivering the future of Observability across Services Technology. This role reports into the Head of SRE Services and sits within a small central enablement team. You will define the long-term vision, build and scale modern observability capabilities across business lines, and lead a small team of SREs delivering reusable observability services.

This is a blended leadership and engineering role: the ideal candidate pairs strategic vision with the technical depth to resolve real-world telemetry challenges across on-prem, cloud, and container-based environments (ECS, Kubernetes, etc.). You'll work closely with architecture and other engineering functions to resolve common challenges affecting SREs aligned to LoBs, and will ensure observability is embedded as a non-functional requirement (NFR) for all new services going live. You will collaborate with platform and infrastructure teams to ensure enterprise-scale, not siloed, solutions. You will also be responsible for managing a small, high-impact team of SREs based in your region.

This role requires a comprehensive understanding of observability challenges across Services (Payments, Securities Services, Trade, Digital & Data) and the ability to influence outcomes at the enterprise level. Strong commercial awareness, technical credibility, and excellent communication skills are essential to negotiate internally, influence peers, and drive change. Some external communication may be necessary.

Responsibilities:
- Define and own the strategic vision and multi-year roadmap for Observability across Services Technology, aligned with enterprise reliability and production goals.
- Translate strategy into an actionable delivery plan in partnership with the Services Architecture & Engineering function, delivering incremental, high-value milestones toward a unified, scalable observability architecture.
- Lead and mentor SREs across Services, fostering technical growth and an SRE mindset.
- Build and offer a suite of central observability services across LoBs, including standardized telemetry libraries, onboarding templates, dashboard packs, and alerting standards.
- Drive reusability and efficiency by creating common patterns and golden paths for observability adoption across critical client flows and platforms.
- Partner with infrastructure, CTO, and other SMBF tooling teams to ensure observability tooling is scalable, resilient, and avoids duplication ("cottage industries").
- Work hands-on to troubleshoot telemetry and instrumentation issues across on-prem, cloud (AWS, GCP, etc.), and ECS/Kubernetes-based environments.
- Collaborate closely with the architecture function to support implementation of observability NFRs in the SDLC, ensuring new apps go live with sufficient coverage and insight.
- Support SRE Communities of Practice (CoP) and foster strong relationships with SREs, developers, and platform leads across Services and beyond to accelerate adoption and promote SRE best practices like SLO adoption and capacity planning.
- Use Jira/Agile workflows to track and report on observability maturity across Services LoBs: coverage, adoption, and contribution to improved client experience.
- Remove inefficiencies and provide solutions that enable unified views of consolidated SLOs for critical end-to-end client journeys across Payments and other Services critical user journeys.
- Influence and align senior stakeholders across functions (applications, infrastructure, controls, and audit) to drive observability investment for critical client flows across Services.
- Represent Services in working groups to influence enterprise observability standards, ensuring feedback from Services is reflected.
- Lead people management responsibilities for your direct team, including management of headcount, goal setting, performance evaluation, compensation, and hiring.
- Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules, and regulations, adhering to Policy, applying sound ethical judgment regarding personal behaviour, conduct, and business practices, and escalating, managing, and reporting control issues with transparency, as well as effectively supervising the activity of others and creating accountability with those who fail to maintain these standards.

Qualifications:
- 13+ years of experience in Observability, SRE, Infrastructure Engineering, or Platform Architecture, including several years in senior leadership roles.
- Deep expertise in observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms.
- Strong hands-on experience across hybrid infrastructure, including on-prem, cloud (AWS, GCP, Azure), and container platforms (ECS, Kubernetes).
- Proven ability to design scalable telemetry and instrumentation strategies, resolve production observability gaps, and integrate them into large-scale systems.
- Experience leading teams and managing people across geographically distributed locations.
- Strong ability to influence platform, cloud, and engineering leaders to ensure observability tooling is built for reuse and scale.
- Deep understanding of SRE fundamentals, including SLIs, SLOs, error budgets, and telemetry-driven operations (see the sketch below).
- Strong collaboration skills and experience working across federated teams, building consensus and delivering change.
- Ability to stay up to date with industry trends and apply them to improve internal tooling and design decisions.
- Excellent written and verbal communication skills; able to influence and articulate complex concepts to technical and non-technical audiences.

Education:
Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related technical field.

Job Family Group: Technology
Job Family: Applications Support
Time Type: Full time

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi. View Citi's EEO Policy Statement and the Know Your Rights poster.
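To make the error-budget concept in the qualifications concrete, a small worked example (illustrative numbers only, not Citi's actual targets):

```python
# Error-budget arithmetic behind an availability SLO.
window_minutes = 30 * 24 * 60          # 30-day rolling window
slo = 0.999                            # 99.9% availability target
budget = window_minutes * (1 - slo)    # 43.2 minutes of allowed downtime
downtime_so_far = 12.0                 # minutes of downtime observed

print(f"budget: {budget:.1f} min, "
      f"remaining: {budget - downtime_so_far:.1f} min, "
      f"burned: {downtime_so_far / budget:.0%}")
```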

Posted 1 month ago

Apply

10.0 years

0 Lacs

Pune, Maharashtra, India

On-site

Source: LinkedIn

The SRE Observability Specialist is a hands-on expert delivering the future of Observability across Services Technology. This role is part of a central SRE enablement team within Services Production, working closely with SREs, developers, and platform teams to embed telemetry, implement SLOs, and build meaningful visualizations for key production flows, particularly in the critical Payments business. The ideal candidate will have deep technical knowledge, a collaborative mindset, and the ability to translate strategy into scalable engineering outcomes. You will also act as a bridge between Services Technology teams and central infrastructure/CTO teams, prioritising observability needs from line-of-business teams and driving improvements. A strong understanding of observability tooling, evolving AI/ML capabilities, and enterprise tooling ecosystems will be essential.

Key Responsibilities:
- Deliver against the observability roadmap for Services Technology by building scalable, reusable telemetry solutions.
- Create and maintain dashboards and visualizations for critical client journeys, including real-time flows across Payments.
- Guide line-of-business teams in implementing SLIs/SLOs, golden signals, and effective alerting to support operational excellence (see the sketch below).
- Support integration and adoption of observability tooling across on-prem, public cloud (AWS/GCP), and containerized environments (ECS, Kubernetes).
- Customize shared dashboards and observability components in partnership with CTI and other central Engineering functions, ensuring usability and flexibility.
- Provide technical support and implementation guidance to SREs and developers facing integration or tooling challenges.
- Effectively manage the observability book of work for Services Technology and drive initiatives to reduce MTTD and improve recovery outcomes.
- Serve as a key connection point between line-of-business SREs and central infrastructure functions by gathering tooling feedback, surfacing systemic issues, and influencing platform enhancements via the Services Observability Forum.
- Stay current with observability trends, including AI/ML-driven insights, anomaly detection, and emerging OSS practices, and assess their applicability.
- Maintain strong knowledge of observability platform features and vendor offerings to advise teams and maximize the value of tooling investments.

Qualifications:
- 10+ years of experience in SRE, Observability Engineering, or platform infrastructure roles focused on operational telemetry.
- Hands-on experience with observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms.
- Deep understanding of SLIs, SLOs, error budgets, and telemetry best practices in high-availability environments.
- Proven ability to troubleshoot integration issues and support observability across hybrid platforms (on-prem, cloud, containers).
- Experience building dashboards aligned to business outcomes and incident workflows, especially in critical flows like payments.
- Familiarity with modern observability tooling ecosystems, including AI/ML capabilities, trace correlation, baselining, and alert tuning.
- Strong interpersonal and collaboration skills; able to operate across federated engineering teams and central infrastructure groups.
- Experience in enablement or platform teams with a track record of scaling best practices across diverse business units.

Education:
Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
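As a loose illustration of golden-signal instrumentation (a generic sketch using the open-source prometheus_client library; the metric names and payment handler are invented):

```python
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

# Two of the golden signals for a payment flow: traffic/errors and latency.
REQUESTS = Counter("payments_requests_total", "Payment requests", ["status"])
LATENCY = Histogram("payments_latency_seconds", "Payment request latency")

def handle_payment() -> None:
    with LATENCY.time():
        time.sleep(random.uniform(0.01, 0.2))  # stand-in for real work
    REQUESTS.labels(status="ok").inc()

if __name__ == "__main__":
    start_http_server(9100)  # exposes /metrics for Prometheus to scrape
    while True:
        handle_payment()
```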
Job Family Group: Technology
Job Family: Applications Support
Time Type: Full time

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi. View Citi's EEO Policy Statement and the Know Your Rights poster.

Posted 1 month ago

Apply

2.0 years

0 Lacs

India

Remote

Source: LinkedIn

This isn't your typical DevOps role. This is your chance to engineer the backbone of a next-gen AI-powered SaaS platform, where modular agents drive dynamic UI experiences, all running on a serverless AWS infrastructure with a Salesforce and SaaS-native backend. We're not building features; we're building an intelligent agentic ecosystem. If you've led complex multi-cloud builds, automated CI/CD pipelines with Terraform, and debugged AI systems in production, this is your arena.

About Us
We're a forward-thinking organization on a mission to reshape how businesses leverage cloud technologies and AI. Our approach is centered around delivering high-impact solutions that unify platforms across AWS, enterprise SaaS, and Salesforce. We don't just deliver software; we craft robust product ecosystems that redefine user interactions, streamline processes, and accelerate growth for our clients.

The Role
We are seeking a hands-on Agentic AI Ops Engineer who thrives at the intersection of cloud infrastructure, AI agent systems, and DevOps automation. In this role, you will build and maintain the CI/CD infrastructure for Agentic AI solutions using Terraform on AWS, while also developing, deploying, and debugging intelligent agents and their associated tools. This position is critical to ensuring scalable, traceable, and cost-effective delivery of agentic systems in production environments.

The Responsibilities

CI/CD Infrastructure for Agentic AI
- Design, implement, and maintain CI/CD pipelines for Agentic AI applications using Terraform, AWS CodePipeline, CodeBuild, and related tools.
- Automate deployment of multi-agent systems and associated tooling, ensuring version control, rollback strategies, and consistent environment parity across dev/test/prod.

Agent Development & Debugging
- Collaborate with ML/NLP engineers to develop and deploy modular, tool-integrated AI agents in production.
- Lead the effort to create debuggable agent architectures, with structured logging, standardized agent behaviors, and feedback integration loops.
- Build agent lifecycle management tools that support quick iteration, rollback, and debugging of faulty behaviors.

Monitoring, Tracing & Reliability
- Implement end-to-end observability for agents and tools, including runtime performance metrics, tool invocation traces, and latency/accuracy tracking.
- Design dashboards and alerting mechanisms to capture agent failures, degraded performance, and tool bottlenecks in real time.
- Build lightweight tracing systems that help visualize agent workflows and simplify root cause analysis.

Cost Optimization & Usage Analysis
- Monitor and manage cost metrics associated with agentic operations, including API call usage, toolchain overhead, and model inference costs.
- Set up proactive alerts for usage anomalies, implement cost dashboards, and propose strategies for reducing operational expenses without compromising performance.

Collaboration & Continuous Improvement
- Work closely with product, backend, and AI teams to evolve the agentic infrastructure design and tool orchestration workflows.
- Drive the adoption of best practices for Agentic AI DevOps, including retraining automation, secure deployments, and compliance in cloud-hosted environments.
- Participate in design reviews, postmortems, and architectural roadmap planning to continuously improve reliability and scalability.

Requirements
- 2+ years of experience in DevOps, MLOps, or Cloud Infrastructure with exposure to AI/ML systems.
- Deep expertise in AWS serverless architecture, including hands-on experience with:
  - AWS Lambda: function design, performance tuning, cold-start optimization.
  - Amazon API Gateway: managing REST/HTTP APIs and integrating with Lambda securely.
  - Step Functions: orchestrating agentic workflows and managing execution states (see the sketch after this listing).
  - S3, DynamoDB, EventBridge, SQS: event-driven and storage patterns for scalable AI systems.
- Strong proficiency in Terraform to build and manage serverless AWS environments using reusable, modular templates.
- Experience deploying and managing CI/CD pipelines for serverless and agent-based applications using AWS CodePipeline, CodeBuild, CodeDeploy, or GitHub Actions.
- Hands-on experience with agent and tool development in Python, including debugging and performance tuning in production.
- Solid understanding of IAM roles and policies, VPC configuration, and least-privilege access control for securing AI systems.
- Deep understanding of monitoring, alerting, and distributed tracing systems (e.g., CloudWatch, Grafana, OpenTelemetry).
- Ability to manage environment parity across dev, staging, and production using automated infrastructure pipelines.
- Excellent debugging, documentation, and cross-team communication skills.

Benefits
- Health insurance, PTO, and leave time
- Ongoing paid professional training and certifications
- Fully remote work opportunity
- Strong onboarding and training program
- Work timings: 1 pm-10 pm IST

Next Steps
We're looking for someone who already embodies the spirit of a boundary-breaking AI technologist, someone who's ready to own ambitious projects and push the boundaries of what LLMs can do.
- Apply Now: Send us your resume and answer a few key questions about your experience and vision.
- Show Us Your Ingenuity: Be prepared to talk shop on your boldest AI solutions and how you overcame the toughest technical hurdles.
- Collaborate & Ideate: If selected, you'll workshop a real-world scenario with our team, so we can see firsthand how your mind works.
This is your chance to leave a mark on the future of AI, one LLM agent at a time. We're excited to hear from you!

Our Belief
We believe extraordinary things happen when technology and human creativity unite. By empowering teams with generative AI, we free them to focus on meaningful relationships, innovative solutions, and real impact. It's more than just code; it's about sparking a revolution in how people interact with information, solve problems, and propel businesses forward. If this resonates with you, and you're driven, daring, and ready to build the next wave of AI innovation, then let's do this. Apply now and help us shape the future.

About Expedite Commerce
At Expedite Commerce, we believe that people achieve their best when technology enables them to build relationships and explore new ideas. So we build systems that free you up to focus on your customers and drive innovations. We have a great commerce platform that changes the way you do business! See more about us at expeditecommerce.com. You can also read about us on https://www.g2.com/products/expedite-commerce/reviews and on Salesforce AppExchange/ExpediteCommerce.

EEO Statement
All qualified applicants to Expedite Commerce are considered for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability, veteran's status or any other protected characteristic.
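A minimal sketch of the Lambda-to-Step-Functions orchestration pattern named above, using boto3 (generic; the state-machine ARN and payload are invented):

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    """Kick off an agent workflow; the state-machine ARN is hypothetical."""
    response = sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:agent-flow",
        input=json.dumps({"task": event.get("task", "noop")}),
    )
    return {"executionArn": response["executionArn"]}
```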

Posted 1 month ago

Apply

10.0 years

0 Lacs

Mohali district, India

On-site

Source: LinkedIn

About the Role:
We are looking for a highly experienced and innovative Senior DevSecOps & Solution Architect to lead the design, implementation, and security of modern, scalable solutions across cloud platforms. The ideal candidate will bring a unique blend of DevSecOps practices, solution architecture, observability frameworks, and AI/ML expertise, with hands-on experience in data and workload migration from on-premises to cloud or cloud-to-cloud. You will play a pivotal role in transforming and securing our enterprise-grade infrastructure, automating deployments, designing intelligent systems, and implementing monitoring strategies for mission-critical applications.

DevSecOps Leadership:
• Own CI/CD strategy, automation pipelines, IaC (Terraform, Ansible), and container orchestration (Docker, Kubernetes, Helm).
• Champion DevSecOps best practices, embedding security into every stage of the SDLC.
• Manage secrets, credentials, and secure service-to-service communication using Vault, AWS Secrets Manager, or Azure Key Vault (see the sketch after this listing).
• Conduct infrastructure hardening, automated compliance checks (CIS, SOC 2, ISO 27001), and vulnerability management.

Solution Architecture:
• Architect scalable, fault-tolerant, cloud-native solutions (AWS, Azure, or GCP).
• Design end-to-end data flows, microservices, and serverless components.
• Lead migration strategies for on-premises-to-cloud or cloud-to-cloud transitions, ensuring minimal downtime and security continuity.
• Create technical architecture documents, solution blueprints, BOMs, and migration playbooks.

Observability & Monitoring:
• Implement modern observability stacks: OpenTelemetry, ELK, Prometheus/Grafana, DataDog, or New Relic.
• Define golden signals (latency, errors, saturation, traffic) and enable APM, RUM, and log aggregation.
• Design SLOs/SLIs and establish proactive alerting for high-availability environments.

AI/ML Engineering & Integration:
• Integrate AI/ML into existing systems for intelligent automation, data insights, and anomaly detection.
• Collaborate with data scientists to operationalize models using MLflow, SageMaker, Azure ML, or custom pipelines.
• Work with LLMs and foundational models (OpenAI, Hugging Face, Bedrock) for POCs or production-ready features.

Migration & Transformation:
• Lead complex data migration projects across heterogeneous environments, from legacy systems to cloud or inter-cloud (e.g., AWS to Azure).
• Ensure data integrity, encryption, schema mapping, and downtime minimization throughout migration efforts.
• Use tools such as AWS DMS, Azure Data Factory, GCP Transfer Services, or custom scripts for lift-and-shift and re-architecture.

Required Skills & Qualifications:
• 10+ years in DevOps, cloud architecture, or platform engineering roles.
• Expert in AWS and/or Azure, including IAM, VPC, EC2, Lambda/Functions, S3/Blob, API Gateway, and container services (EKS/AKS).
• Proficient in infrastructure as code: Terraform, CloudFormation, Ansible.
• Hands-on with Kubernetes (k8s), Helm, and GitOps workflows.
• Strong programming/scripting skills in Python, Shell, or PowerShell.
• Practical knowledge of AI/ML tools, libraries (TensorFlow, PyTorch, scikit-learn), and model lifecycle management.
• Demonstrated success in large-scale migrations and hybrid architecture.
• Solid understanding of application security, identity federation, and compliance.
• Familiar with agile practices, project estimation, and stakeholder communication.
Nice to Have:
• Certifications: AWS Solutions Architect, Azure Architect, Certified Kubernetes Administrator, or similar.
• Experience with Kafka, RabbitMQ, and event-driven architecture.
• Exposure to n8n, OpenFaaS, or AI agents.
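As a generic sketch of the secrets-management practice named in the role (boto3 against AWS Secrets Manager; the secret name is invented):

```python
import json
import boto3

def get_db_credentials(secret_id: str = "prod/app/db") -> dict:
    """Fetch a JSON secret at runtime instead of baking it into config.

    The secret name is hypothetical; region and credentials come from
    the standard AWS environment.
    """
    client = boto3.client("secretsmanager")
    resp = client.get_secret_value(SecretId=secret_id)
    return json.loads(resp["SecretString"])
```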

Posted 1 month ago

Apply

6.0 years

0 Lacs

Gurgaon, Haryana, India

On-site

Source: LinkedIn

You Lead the Way. We've Got Your Back.

With the right backing, people and businesses have the power to progress in incredible ways. When you join Team Amex, you become part of a global and diverse community of colleagues with an unwavering commitment to back our customers, communities, and each other. Here, you'll learn and grow as we help you create a career journey that's unique and meaningful to you with benefits, programs, and flexibility that support you personally and professionally. At American Express, you'll be recognized for your contributions, leadership, and impact; every colleague has the opportunity to share in the company's success. Together, we'll win as a team, striving to uphold our company values and powerful backing promise to provide the world's best customer experience every day. And we'll do it with the utmost integrity, and in an environment where everyone is seen, heard, and feels like they belong. Join Team Amex and let's lead the way together.

About Enterprise Architecture
Enterprise Architecture is an organization within the Chief Technology Office at American Express and is a key enabler of the company's technology strategy. The four pillars of Enterprise Architecture include:
1. Architecture as Code: this pillar owns and operates foundational technologies that are leveraged by engineering teams across the enterprise.
2. Architecture as Design: this pillar includes the solution and technical design for transformation programs and business-critical projects which need architectural guidance and support.
3. Governance: this pillar is responsible for defining technical standards and developing innovative tools that automate controls to ensure compliance.
4. Colleague Enablement: this pillar is focused on colleague development, recognition, training, and enterprise outreach.

What you will be working on:
We are looking for a Senior Engineer to join our Enterprise Architecture team. In this role you will be designing and implementing highly scalable real-time systems following best practices and using cutting-edge technology. This role is best suited for experienced engineers with a broad skillset who are open, curious, and willing to learn.

What you will bring:
- Bachelor's degree in computer science, computer engineering, or a related field, or equivalent experience.
- 6+ years of progressive experience demonstrating strong architecture, programming, and engineering skills.
- Firm grasp of data structures and algorithms, with fluency in programming languages like Java, Kotlin, and Go.
- Demonstrated ability to lead, partner, and collaborate cross-functionally across many engineering organizations.
- Experience in building real-time, large-scale, high-volume, distributed data pipelines on top of data buses (Kafka; see the sketch after this listing).
- Hands-on experience with large-scale distributed NoSQL databases like Elasticsearch.
- Knowledge of and/or experience with containerized environments, Kubernetes, and Docker.
- Experience in implementing and maintaining highly scalable microservices in REST and gRPC.
- Appetite for trying new things and building rapid POCs.

Preferred Qualifications:
- Knowledge of observability concepts like tracing, metrics, monitoring, and logging
- Knowledge of Prometheus
- Experience with large-scale installations of Elasticsearch
- Knowledge of OpenTelemetry / OpenTracing
- Knowledge of observability tools like Jaeger, Kibana, Grafana, etc.
- Open-source community involvement
- Knowledge of the contact center and assisted servicing domain

We back our colleagues and their loved ones with benefits and programs that support their holistic well-being. That means we prioritize their physical, financial, and mental health through each stage of life. Benefits include:
- Competitive base salaries
- Bonus incentives
- Support for financial well-being and retirement
- Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location)
- Flexible working model with hybrid, onsite, or virtual arrangements depending on role and business need
- Generous paid parental leave policies (depending on your location)
- Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)
- Free and confidential counseling support through our Healthy Minds program
- Career development and training opportunities

American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law. Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.
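To illustrate the Kafka-to-Elasticsearch pipeline pattern the qualifications describe, a bare-bones consumer sketch (generic; the broker, topic, and index names are invented, using the kafka-python and elasticsearch client libraries):

```python
import json
from elasticsearch import Elasticsearch
from kafka import KafkaConsumer

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster
consumer = KafkaConsumer(
    "events",                                # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Stream events off the bus and index them for search and analytics.
for message in consumer:
    es.index(index="events", document=message.value)
```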

Posted 1 month ago

Apply

0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site

Source: LinkedIn

Reporting to: Sr Manager, Availability Management
Office Location: Chennai, India
Flexible Working: Hybrid (Part Office/Part Home)

Cloud Site Reliability Engineer

Responsibilities
- On-board internal customers to our 24x7 Applications Support and Enterprise Status Page services.
- Help create an SRE culture globally by defining monitoring strategies and best practices for the organization.
- Monitor application performance and provide recommendations on increasing the observability of applications and platforms.
- Play an important role in the Continual Service Improvement process, identifying and driving improvement.
- Be instrumental in developing standards and guides to assist the business in maximizing their use of common tools.
- Participate in code peer reviews and enforce quality gates to ensure best practices are followed.
- Apply automation to tasks which would benefit from it; automating repetitive tasks and deploying monitors via code are core examples (see the sketch below).
- Document knowledge gained from engagements in the form of runbooks and other information critical to incident response.
- Explore and apply Artificial Intelligence to enhance operational processes and procedures.

Should-Haves: Skills & Experience
- Strong skills with modern monitoring tools and demonstrable knowledge of APM, RUM, and/or synthetic testing.
- Experience working with observability tools such as Datadog, New Relic, Splunk, CloudWatch, Azure Monitor.
- Experience with the OpenTelemetry (OTEL) standard.
- Working knowledge of at least one programming language, such as Python, JavaScript (Node.js, etc.), Golang, or others.
- Strong experience with IaC tools, such as Terraform and CloudFormation.
- Experience with cloud environments, especially AWS and/or Azure.
- Good customer interaction skills; able to understand customers' needs and expectations.
- Strength of conviction: able to encourage adoption across a wide audience, but comfortable with mandating where necessary.
- Experience with code quality tools, such as SonarQube.
- Knowledge of code linter tools for various programming languages.
- Experience with CI/CD tools, such as Bamboo, Jenkins, Azure DevOps, GitHub Actions.
- ITIL experience with a basic understanding of incident management, problem management, and change management.

Nice-to-Haves: Skills & Experience
- Any cloud certification
- ITIL certifications
- Experience with ITSM tools
- Experience using on-call management tooling

No travel required.
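A tiny example of "deploying monitors via code" in the synthetic-testing sense (a generic sketch; the probed URL is invented):

```python
import time
import requests

def synthetic_check(url: str, timeout: float = 5.0) -> dict:
    """Probe an endpoint and report availability plus latency."""
    start = time.monotonic()
    try:
        ok = requests.get(url, timeout=timeout).status_code == 200
    except requests.RequestException:
        ok = False
    return {"ok": ok, "latency_ms": round((time.monotonic() - start) * 1000, 1)}

print(synthetic_check("https://status.example.com/health"))  # hypothetical URL
```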

Posted 1 month ago

Apply

0 years

0 Lacs

Pune/Pimpri-Chinchwad Area

On-site

Source: LinkedIn

Job Description
We are seeking a highly skilled Senior Reliability Engineer with strong backend software engineering skills to join our team. As a Senior Reliability Engineer, you will be responsible for designing, implementing, and maintaining our cloud infrastructure, ensuring the smooth operation of our applications and services. In addition, you will contribute to the development of our backend software systems, working closely with our engineering team to design, develop, and deploy scalable and reliable software solutions. This role reports to the Senior Engineering Manager, Finance Engineering in Pune, India.

What you'll do:
- Collaborate with your peers to envision, design, and develop solutions in your respective area with a bias toward reusability, toil reduction, and resiliency.
- Surface opportunities across the broader organization for solving systemic issues.
- Use a collaborative approach to make technical decisions that align with Procore's architectural vision.
- Partner with internal customers, peers, and leadership in planning, prioritization, and roadmap development.
- Develop teammates by conducting code reviews and providing mentorship, pairing, and training opportunities.
- Serve as a subject matter expert on tools, processes, and procedures and help guide others to create and maintain a healthy codebase.
- Facilitate an "open source" mindset and culture both across teams internally and outside of Procore through active participation in and contributions to the greater community.
- Design, develop, and deploy scalable and reliable backend software systems using languages such as Java, Python, or Go.
- Work with engineering teams to design and implement microservices architecture.
- Develop and maintain APIs using RESTful APIs, GraphQL, or gRPC (see the sketch after this listing).
- Ensure high-quality code through code reviews, testing, and continuous integration.
- Serve as a subject matter expert in a domain, including processes and software design, to help guide others to create and maintain a healthy codebase.

What we're looking for:
- Container orchestration (Kubernetes), preferably EKS; ArgoCD
- Terraform or similar IaC
- o11y (OpenTelemetry ideal)
- Public cloud (AWS, GCP, Azure)
- Cloud automation tooling (e.g., CloudFormation, Terraform, Ansible)
- Kafka and Kafka connectors
- Linux systems
- Ensuring compliance with security and regulatory requirements, such as HIPAA, SOX, FedRAMP

Experience with the following is preferred:
- Continuous Integration tooling (e.g., CircleCI, Jenkins, Travis)
- Continuous Deployment tooling (e.g., ArgoCD, Spinnaker)
- Service mesh / discovery tooling (e.g., Consul, Envoy, Istio, Linkerd)
- Networking (WAF, Cloudflare)
- Event-driven architecture (Event Sourcing, CQRS)
- Flink or other stream-processing technologies
- RDBMS and NoSQL databases
- Experience working with and developing APIs through REST, gRPC, or GraphQL
- Professional experience in Java, GoLang, or Python

Additional Information

Perks & Benefits
At Procore, we invest in our employees and provide a full range of benefits and perks to help you grow and thrive. From generous paid time off and healthcare coverage to career enrichment and development programs, learn more details about what we offer and how we empower you to be your best.

About Us
Procore Technologies is building the software that builds the world. We provide cloud-based construction management software that helps clients more efficiently build skyscrapers, hospitals, retail centers, airports, housing complexes, and more.
At Procore, we have worked hard to create and maintain a culture where you can own your work and are encouraged and given resources to try new ideas. Check us out on Glassdoor to see what others are saying about working at Procore.

We are an equal-opportunity employer and welcome builders of all backgrounds. We thrive in a diverse, dynamic, and inclusive environment. We do not tolerate discrimination against candidates or employees on the basis of gender, sex, national origin, civil status, family status, sexual orientation, religion, age, disability, race, traveler community, status as a protected veteran, or any other classification protected by law.

If you'd like to stay in touch and be the first to hear about new roles at Procore, join our Talent Community. Alternative methods of applying for employment are available to individuals unable to submit an application through this site because of a disability. Contact our benefits team here to discuss reasonable accommodations.
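As a small illustration of the resilient REST consumption this role touches on, a retry-with-backoff client sketch using the requests library (generic; the API URL is invented):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient upstream failures with exponential backoff.
session = requests.Session()
retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))

resp = session.get("https://api.example.com/v1/projects", timeout=5)  # hypothetical API
resp.raise_for_status()
print(resp.json())
```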

Posted 1 month ago

Apply

5.0 years

0 Lacs

Pune, Maharashtra, India

On-site

Source: LinkedIn

Job Requisition ID # 25WD86258

Position Overview
Autodesk is looking for Cloud Infrastructure Engineers to join the Platform Infrastructure team of the Autodesk Data Platform (ADP). This team is at the heart of Autodesk's efforts to radically improve how we create value for customers and make decisions through data. As a Cloud Infrastructure Engineer, you will help create a robust and scalable Big Data Platform for teams across the company to leverage. You will tackle hard problems to improve the platform's reliability, resiliency, and scalability. Ideally, you will be a self-starter, detail-oriented, quality-driven, and excited about the prospect of having a big impact with data at Autodesk. Our tech stack includes Spark, Presto, Hive, Kubernetes, Airflow, Jenkins, Python, Spinnaker, Terraform, Snowflake, Datadog, and various AWS services.

Responsibilities
- Build and scale data infrastructure that powers batch and real-time data processing of billions of records daily
- Automate cloud infrastructure, services, and observability
- Help drive observability into the health of our data infrastructure and understanding of system behaviour
- Develop scripts to manage cloud infrastructure using Python or other frameworks for cloud-native development (see the sketch after this listing)
- Drive initiatives to enable best practices across infrastructure, deployments, automation, and accessibility
- Develop and implement security best practices at the data, application, infrastructure, and network layers
- Interface with data engineers, data scientists, product managers, and all data stakeholders to understand their needs and promote best practices

Minimum Qualifications
- 5-8 years of relevant industry experience in a large-scale infrastructure environment
- 3+ years of Automation/DevOps developer experience
- Strong experience in AWS cloud automation (EMR, EKS, EC2, ECS, S3, IAM policies, etc.)
- Strong overall programming skills, able to write modular, maintainable code, preferably in Python

Preferred Qualifications
- Participation in all phases of the product lifecycle, including design, development, and deployment
- Automation of testing framework management and migration
- End-to-end monitoring and dashboard tools (Grafana, OpenTelemetry, Datadog)
- Experience in Big Data infrastructure such as Spark, Hive, Presto, etc.

Learn More About Autodesk
Welcome to Autodesk! Amazing things are created every day with our software, from the greenest buildings and cleanest cars to the smartest factories and biggest hit movies. We help innovators turn their ideas into reality, transforming not only how things are made, but what can be made. We take great pride in our culture here at Autodesk; our Culture Code is at the core of everything we do. Our values and ways of working help our people thrive and realize their potential, which leads to even better outcomes for our customers. When you're an Autodesker, you can be your whole, authentic self and do meaningful work that helps build a better future for all. Ready to shape the world and your future? Join us!

Salary Transparency
Salary is one part of Autodesk's competitive compensation package. Offers are based on the candidate's experience and geographic location. In addition to base salaries, we also have a significant emphasis on discretionary annual cash bonuses, commissions for sales roles, stock or long-term incentive cash grants, and a comprehensive benefits package.

Diversity & Belonging
We take pride in cultivating a culture of belonging and an equitable workplace where everyone can thrive.
Learn more here: https://www.autodesk.com/company/diversity-and-belonging

Are you an existing contractor or consultant with Autodesk? Please search for open jobs and apply internally (not on this external site).
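A short example of the kind of Python cloud automation the responsibilities describe, using boto3 (a generic sketch; the region is arbitrary and the tag-hygiene check is an assumed use case, not Autodesk's actual tooling):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # hypothetical region

# Walk all instances and flag ones missing tags, a common hygiene check.
untagged = []
for page in ec2.get_paginator("describe_instances").paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            if not instance.get("Tags"):
                untagged.append(instance["InstanceId"])

print(f"{len(untagged)} untagged instances: {untagged}")
```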

Posted 1 month ago

Apply

4.0 - 6.0 years

3 - 5 Lacs

Mumbai, Kurla

Work from Office

Source: Naukri

Required:
- Expertise in AWS, including core services for networking, data, and workload management:
  - AWS Networking: VPC, VPC Peering, Transit Gateway, Route Tables, Security Groups, etc.
  - Data: RDS, DynamoDB, Elasticsearch
  - Workload: EC2, EKS, Lambda, etc.
- Experience with at least one CI/CD tool (GitLab/GitHub/Jenkins), including runner setup, templating, and configuration.
- Kubernetes (EKS/AKS/GKE) or Ansible experience, including basics like pods, deployments, networking, and service mesh; use of a package manager like Helm.
- Scripting experience (Python), automation in pipelines when required, and system services (see the sketch below).
- Infrastructure automation (Terraform/Pulumi/CloudFormation): writing modules, setting up pipelines, and versioning the code.
- Good experience with Git, SVN, or another code management tool.

Optional:
- Experience in any programming language is not required but is appreciated.
- DevSecOps tools (Qualys/SonarQube/BlackDuck) for security scanning of artefacts, infrastructure, and code.
- Observability tools (open source: Prometheus, Elasticsearch, OpenTelemetry; paid: Datadog, 24/7, etc.)
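As a sketch of pipeline-friendly Python automation with a DevSecOps flavor (generic; assumes boto3 and AWS credentials in the environment, and is not tied to any specific tool named above):

```python
import boto3

ec2 = boto3.client("ec2")

# Flag security-group rules open to the whole internet (0.0.0.0/0).
for sg in ec2.describe_security_groups()["SecurityGroups"]:
    for rule in sg.get("IpPermissions", []):
        for ip_range in rule.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                print(sg["GroupId"], sg["GroupName"], rule.get("FromPort"))
```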

Posted 1 month ago

Apply

18.0 years

0 Lacs

Noida, Uttar Pradesh, India

On-site

Source: LinkedIn

Our Company
Changing the world through digital experiences is what Adobe's all about. We give everyone, from emerging artists to global brands, everything they need to design and deliver exceptional digital experiences! We're passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen. We're on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!

Opportunity
Adobe is looking for a strategic and results-driven Director of Site Reliability Engineering (SRE). This role provides a unique opportunity to drive innovation, work alongside senior leaders, and influence business-critical initiatives at scale. The ideal candidate is an experienced engineering leader who will guide a high-performing, globally distributed SRE team. You will be responsible for defining the technical strategy, reliability vision, and operational excellence roadmap, ensuring the availability and performance of Adobe's multi-tenant, web-scale digital products.

Role Summary
As Director of Site Reliability Engineering, you will lead multiple SRE teams across Noida and Bangalore, managing multi-tiered leaders reporting to you. You will play a pivotal role in:
- Driving system reliability, scalability, and performance for Adobe's solutions.
- Owning the technical direction, automation, monitoring, and infrastructure provisioning.
- Collaborating with engineering, product, and operations teams to drive innovation and reliability at scale.

What You'll Do
- Leadership & Strategy: Develop and execute the SRE roadmap to ensure high availability (99.99%+ uptime), scalability, and reliability of Adobe's products.
- Operational Excellence: Define and implement best practices for observability, monitoring, and incident response, leveraging advanced AI/ML-powered analytics.
- Automation & Infrastructure: Drive automation initiatives for CI/CD, infrastructure provisioning, and self-healing capabilities to reduce toil and increase efficiency.
- Incident Response & Performance Optimization: Establish proactive incident management processes, conduct blameless postmortems, and continuously improve system resilience.
- Cloud & Big Data Technologies: Optimize Adobe's cloud-native architectures (AWS, Azure, GCP) and integrate big data technologies such as Hadoop, Spark, Kafka, and Cassandra.
- Cross-functional Collaboration: Work closely with product management, marketing, customer success, and global consulting teams to align business goals with engineering efforts.
- Customer Engagement: Partner with enterprise clients on pre-sales and post-sales engagements, providing technical guidance and reliability best practices.
- Team Development & Mentorship: Build and mentor a world-class SRE team, fostering a culture of innovation, ownership, and operational excellence.

What You Need To Succeed
- 18+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering, with at least 8 years in leadership roles.
- Proven track record of leading large-scale, high-impact engineering projects in a global enterprise.
- Experience managing multiple teams (4+ years as a second-level manager).
- Prior experience working with US-based leadership; previous work experience in the US is a plus.
- Strong expertise in distributed systems, microservices, cloud platforms (AWS/Azure/GCP), and container orchestration (Kubernetes, Docker, ECS).
- Hands-on experience with monitoring and observability tools (Datadog, Prometheus, ELK, OpenTelemetry).
- Deep understanding of SLOs, SLIs, SLAs, and error budgets to drive service reliability.
- Excellent stakeholder management skills, with the ability to collaborate across engineering, business, and customer-facing teams.
- A strategic thinker with intellectual curiosity about products, market trends, and business growth.
- Strong communication, analytical, and problem-solving skills with the ability to influence C-suite executives.
- B.Tech / M.Tech in Computer Science from a premier institute.

Adobe is proud to be an Equal Employment Opportunity employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other applicable characteristics protected by law. Learn more.

Adobe aims to make Adobe.com accessible to any and all users. If you have a disability or special need that requires accommodation to navigate our website or complete the application process, email accommodations@adobe.com or call (408) 536-3015.

Posted 1 month ago

Apply

6.0 years

0 Lacs

Trivandrum, Kerala, India

On-site


Role Description

Role Proficiency: Act creatively to develop applications and select appropriate technical options, optimizing application development, maintenance, and performance by employing design patterns and reusing proven solutions; account for others' developmental activities.

Outcomes
- Interpret the application/feature/component design and develop it in accordance with specifications.
- Code, debug, test, document, and communicate product/component/feature development stages.
- Validate results with user representatives; integrate and commission the overall solution.
- Select appropriate technical options for development, such as reusing, improving, or reconfiguring existing components, or creating your own solutions.
- Optimize efficiency, cost, and quality.
- Influence and improve customer satisfaction.
- Set FAST goals for self/team; provide feedback on team members' FAST goals.

Measures Of Outcomes
- Adherence to engineering processes and standards (coding standards)
- Adherence to project schedule/timelines
- Number of technical issues uncovered during project execution
- Number of defects in the code
- Number of defects post delivery
- Number of non-compliance issues
- On-time completion of mandatory compliance trainings

Outputs Expected

Code
- Code as per design
- Follow coding standards, templates, and checklists
- Review code for team and peers

Documentation
- Create/review templates, checklists, guidelines, and standards for design/process/development
- Create/review deliverable documents: design documentation, requirements, test cases/results

Configure
- Define and govern the configuration management plan
- Ensure compliance from the team

Test
- Review and create unit test cases, scenarios, and execution
- Review the test plan created by the testing team
- Provide clarifications to the testing team

Domain Relevance
- Advise software developers on the design and development of features and components, with a deep understanding of the business problem being addressed for the client.
- Learn more about the customer domain, identifying opportunities to provide valuable additions for customers.
- Complete relevant domain certifications.

Manage Project
- Manage delivery of modules and/or manage user stories.

Manage Defects
- Perform defect RCA and mitigation.
- Identify defect trends and take proactive measures to improve quality.

Estimate
- Create and provide input for effort estimation for projects.

Manage Knowledge
- Consume and contribute to project-related documents, SharePoint libraries, and client universities.
- Review the reusable documents created by the team.

Release
- Execute and monitor the release process.

Design
- Contribute to the creation of design (HLD, LLD, SAD)/architecture for applications/features/business components/data models.

Interface With Customer
- Clarify requirements and provide guidance to the development team.
- Present design options to customers.
- Conduct product demos.

Manage Team
- Set FAST goals and provide feedback.
- Understand the aspirations of team members and provide guidance, opportunities, etc.
- Ensure the team is engaged in the project.

Certifications
- Take relevant domain/technology certifications.

Skill Examples
- Explain and communicate the design/development to the customer.
- Perform and evaluate test results against product specifications.
- Break down complex problems into logical components.
- Develop user interfaces and business software components.
- Use data models.
- Estimate the time and effort required for developing/debugging features/components.
- Perform and evaluate tests in the customer or target environment.
- Make quick decisions on technical/project-related challenges.
- Manage a team; mentor and handle people-related issues in the team.
- Maintain high motivation levels and positive dynamics in the team.
- Interface with other teams, designers, and other parallel practices.
- Set goals for self and team; provide feedback to team members.
- Create and articulate impactful technical presentations.
- Follow a high level of business etiquette in emails and other business communication.
- Drive conference calls with customers, addressing customer questions.
- Proactively ask for and offer help.
- Work under pressure, determine dependencies and risks, facilitate planning, and handle multiple tasks.
- Build confidence with customers by meeting deliverables on time and with quality.
- Estimate the time, effort, and resources required for developing/debugging features/components.
- Make appropriate utilization of software/hardware.
- Strong analytical and problem-solving abilities.

Knowledge Examples
- Appropriate software programs/modules
- Functional and technical designing
- Programming languages – proficient in multiple skill clusters
- DBMS
- Operating systems and software platforms
- Software Development Life Cycle
- Agile – Scrum or Kanban methods
- Integrated development environments (IDE)
- Rapid application development (RAD)
- Modelling technology and languages
- Interface definition languages (IDL)
- Knowledge of the customer domain and deep understanding of the sub-domain where the problem is solved

Additional Comments

Senior Java Backend Microservices Software Engineer

Musts:
- Strong understanding of object-oriented and functional programming principles
- Experience with RESTful APIs
- Knowledge of microservices architecture and cloud platforms
- Familiarity with CI/CD pipelines, Docker, and Kubernetes
- Strong problem-solving skills and ability to work in an Agile environment
- Excellent communication and teamwork skills

Nices:
- 6+ years of experience, with at least 3+ in Kotlin
- Experience with backend development using Kotlin (Ktor, Spring Boot, or Micronaut)
- Proficiency in working with databases such as PostgreSQL, MySQL, or MongoDB
- Experience with GraphQL and WebSockets

Additional Musts:
- Experience with backend development in the Java ecosystem (either Java or Kotlin will do)

Additional Nices:
- Experience with TypeScript and Node.js
- Experience with Kafka
- Experience with frontend development (e.g., React)
- Experience with Gradle
- Experience with GitLab CI
- Experience with OpenTelemetry

Skills: RESTful APIs, Java, Microservices, AWS

Posted 1 month ago

Apply

5.0 years

0 Lacs

India

On-site


Experience: 5+ years in high-volume ESP integrations & deliverability optimization
Tech Stack: Laravel 10/11 • Node.js 18+ (Bun) • SendGrid (Mail & Marketing APIs) • Redis • MySQL 8 • Docker • GitHub Actions • OSS queues (Bee-Queue, BullMQ-OSS, Taskless, etc.)

About the Role

We are building a Bulk Email Marketing module that must land emails in the inbox—not the spam folder. You will design and operate queue‑driven batch sends and—most critically—engineer deliverability safeguards to keep spam rates below 0.1% across millions of sends. You’ll also migrate our current Node/BullMQ service to an open‑source queue and integrate everything seamlessly into our Laravel-based CRM. All front‑end work is handled by a separate team; your focus is pure back‑end infrastructure.

Key Responsibilities
- Architect & build the bulk‑send workflow: throttling, retries, parallel batch pipelines, and dedicated IP management.
- Implement robust deliverability controls:
  - Automated SPF, DKIM, DMARC, BIMI & ARC checks on every sender domain (see the sketch after this posting).
  - List‑hygiene pruning, bounce/complaint feedback loops, and reputation scoring.
  - Pre‑send spam‑filter diagnostics (SpamAssassin rules, seed‑list placement tests).
- Migrate our existing Node micro‑service from BullMQ’s paid batch feature to an OSS queue without regressions.
- Expose clean REST APIs for the front‑end team to consume (campaign creation, scheduling, analytics).
- Handle bounce reports, unsubscribe management, and analytics integration.
- Ensure proper authentication, template rendering, scheduling, and delivery tracking.
- Ensure module security, scalability, and performance.
- Write tests/docs, perform code reviews, and mentor teammates on email infrastructure best practices.

Required Skills & Experience
- Expert‑level SendGrid integration (Marketing & Transactional) with a proven record of raising inbox placement.
- Proven knowledge of other ESP platforms (SES, Postmark, etc.).
- Deep knowledge of deliverability levers: SPF, DKIM, DMARC, BIMI, IP warm‑up, feedback loops, spam‑trap avoidance, content quality scoring.
- Production experience with Node.js/Bun workers and Redis‑backed queues at 100k+ emails/hour (Bee‑Queue, BullMQ‑OSS, Taskless, or Redis streams).
- Strong Laravel background (queues/Horizon, events, policies) to integrate micro‑services with the core CRM.
- Proficient with Docker‑based deployments and CI/CD pipelines using GitHub Actions.
- Ability to write clear documentation and conduct rigorous code reviews.

Nice to Have
- Implemented seed‑list/inbox‑placement monitoring tools (GlockApps, Mail‑Tester, Google Postmaster).
- Experience migrating from paid BullMQ features to Bee‑Queue, Taskless, or custom Redis streams.
- Familiarity with other ESPs (AWS SES, Postmark) for future multi‑ESP abstraction.
- Observability with OpenTelemetry traces across micro‑services.
- Knowledge of Prometheus/Grafana dashboards.
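As a rough illustration of the automated SPF/DMARC checks described above, here is a minimal presence check using the dnspython library; the domain is a placeholder, and production tooling would also validate record syntax, DKIM selectors, and alignment, not just presence:

```python
# Minimal SPF/DMARC presence check -- a sketch, not production-grade.
# Requires: pip install dnspython
import dns.resolver

def txt_records(name: str) -> list[str]:
    """Return all TXT strings published at `name` (empty list if none)."""
    try:
        answers = dns.resolver.resolve(name, "TXT")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    return [b"".join(rdata.strings).decode() for rdata in answers]

def check_sender_domain(domain: str) -> dict:
    """Presence checks only; real tooling also verifies syntax and alignment."""
    spf = [t for t in txt_records(domain) if t.startswith("v=spf1")]
    dmarc = [t for t in txt_records(f"_dmarc.{domain}") if t.startswith("v=DMARC1")]
    return {"domain": domain, "has_spf": bool(spf), "has_dmarc": bool(dmarc)}

if __name__ == "__main__":
    print(check_sender_domain("example.com"))  # placeholder domain
```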

Posted 1 month ago

Apply

2.0 years

0 Lacs

India

Remote


At Rethem, we're revolutionizing the sales landscape by putting buyer outcomes at the forefront. We understand that customers buy outcomes, and our AI-driven platform empowers your sales reps to deliver those outcomes, helping them crush their quotas.

What Sets Us Apart
- Deep AI Integration: Our platform leverages advanced AI that acts as a personal coach for your reps, adapting to your business processes to automate complex tasks and provide real-time guidance.
- Outcome-Driven Approach: By focusing on delivering measurable outcomes, we enable your sales team to build trust and foster long-term customer relationships.
- Market Leadership: Positioned at the cutting edge of buyer-centric sales transformation, we're leading the shift towards more meaningful and effective sales interactions.
- Proven Expertise: Our leadership and team consist of industry veterans with a track record of driving substantial growth and innovation in sales.

Our Mission

To redefine the sales process by aligning it with buyer needs, leveraging AI to empower sales teams to deliver outcomes that drive mutual success.

Transform Your Sales Strategy with AI

Rethem turns your sales playbook into an intelligent, always-on guide that adapts in real time. By harnessing the power of AI, we provide your team with:
- Real-Time Coaching: Enhance performance with actionable insights during every buyer interaction.
- Enhanced Efficiency: Automate key processes so your reps can focus on building relationships and delivering value.
- Outcome Alignment: Ensure your offerings are perfectly aligned with customer objectives, leading to higher satisfaction and loyalty.
- Accelerate Growth: Drive higher win rates and larger deals through a buyer-focused approach.

Vision for the Future

We envision a future where AI and human expertise collaborate seamlessly to create unparalleled sales experiences. By continuously innovating, we aim to stay at the forefront of buyer-centric sales transformation.

Join the Sales Revolution

Emerging from stealth mode, Rethem invites a select group of visionary organizations to pilot our groundbreaking platform. If you're ready to elevate your sales team, deliver exceptional customer outcomes, and empower your reps to crush their quotas, visit our website to learn more and apply.

Be Part of Our Journey

We're assembling a team of innovators passionate about reshaping the sales industry. Explore career opportunities with Re:them and help shape the future of outcome-driven, AI-powered sales. Experience the Power of AI-Driven Sales Transformation with Re:them.

The Role

We are seeking a hands-on Agentic AI Ops Engineer who thrives at the intersection of cloud infrastructure, AI agent systems, and DevOps automation. In this role, you will build and maintain the CI/CD infrastructure for agentic AI solutions using Terraform on AWS, while also developing, deploying, and debugging intelligent agents and their associated tools. This position is critical to ensuring scalable, traceable, and cost-effective delivery of agentic systems in production environments.

The Responsibilities

CI/CD Infrastructure for Agentic AI
- Design, implement, and maintain CI/CD pipelines for agentic AI applications using Terraform, AWS CodePipeline, CodeBuild, and related tools.
- Automate deployment of multi-agent systems and associated tooling, ensuring version control, rollback strategies, and consistent environment parity across dev/test/prod.

Agent Development & Debugging
- Collaborate with ML/NLP engineers to develop and deploy modular, tool-integrated AI agents in production.
- Lead the effort to create debuggable agent architectures, with structured logging, standardized agent behaviors, and feedback integration loops.
- Build agent lifecycle management tools that support quick iteration, rollback, and debugging of faulty behaviors.

Monitoring, Tracing & Reliability
- Implement end-to-end observability for agents and tools, including runtime performance metrics, tool invocation traces, and latency/accuracy tracking.
- Design dashboards and alerting mechanisms to capture agent failures, degraded performance, and tool bottlenecks in real time.
- Build lightweight tracing systems that help visualize agent workflows and simplify root cause analysis.

Cost Optimization & Usage Analysis
- Monitor and manage cost metrics associated with agentic operations, including API call usage, toolchain overhead, and model inference costs.
- Set up proactive alerts for usage anomalies, implement cost dashboards, and propose strategies for reducing operational expenses without compromising performance (a monitoring sketch follows this posting).

Collaboration & Continuous Improvement
- Work closely with product, backend, and AI teams to evolve the agentic infrastructure design and tool orchestration workflows.
- Drive the adoption of best practices for agentic AI DevOps, including retraining automation, secure deployments, and compliance in cloud-hosted environments.
- Participate in design reviews, postmortems, and architectural roadmap planning to continuously improve reliability and scalability.

Requirements
- 2+ years of experience in DevOps, MLOps, or cloud infrastructure with exposure to AI/ML systems.
- Deep expertise in AWS serverless architecture, including hands-on experience with:
  - AWS Lambda – function design, performance tuning, cold-start optimization.
  - Amazon API Gateway – managing REST/HTTP APIs and integrating with Lambda securely.
  - Step Functions – orchestrating agentic workflows and managing execution states.
  - S3, DynamoDB, EventBridge, SQS – event-driven and storage patterns for scalable AI systems.
- Strong proficiency in Terraform to build and manage serverless AWS environments using reusable, modular templates.
- Experience deploying and managing CI/CD pipelines for serverless and agent-based applications using AWS CodePipeline, CodeBuild, CodeDeploy, or GitHub Actions.
- Hands-on experience with agent and tool development in Python, including debugging and performance tuning in production.
- Solid understanding of IAM roles and policies, VPC configuration, and least-privilege access control for securing AI systems.
- Deep understanding of monitoring, alerting, and distributed tracing systems (e.g., CloudWatch, Grafana, OpenTelemetry).
- Ability to manage environment parity across dev, staging, and production using automated infrastructure pipelines.
- Excellent debugging, documentation, and cross-team communication skills.

Benefits
- Health insurance, PTO, and leave time
- Ongoing paid professional training and certifications
- Fully remote work opportunity
- Strong onboarding & training programs

Are you ready to join the revolution? If you're ready to take on this exciting challenge and believe you meet our requirements, we encourage you to apply. Let's shape the future of AI-driven sales together!
See more about us at https://www.rethem.ai/

EEO Statement

All qualified applicants to Expedite Commerce are considered for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability, veteran's status or any other protected characteristic.
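To make the cost and usage monitoring responsibilities above concrete, here is a minimal boto3 sketch; the namespace, metric and alarm names, and threshold are invented for illustration and are not Rethem's actual schema:

```python
# Sketch: publish a per-agent tool-invocation metric and alarm on unusual volume.
# Namespace, names, and threshold are illustrative assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_tool_call(agent: str, tool: str, latency_ms: float) -> None:
    """Emit one latency sample; SampleCount of this metric doubles as call volume."""
    cloudwatch.put_metric_data(
        Namespace="AgenticAI/Tools",  # hypothetical namespace
        MetricData=[{
            "MetricName": "ToolInvocationLatency",
            "Dimensions": [{"Name": "Agent", "Value": agent},
                           {"Name": "Tool", "Value": tool}],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        }],
    )

def alarm_on_usage_spike() -> None:
    """Alarm when call volume over 3 x 5-minute periods exceeds an assumed cap."""
    cloudwatch.put_metric_alarm(
        AlarmName="agentic-tool-usage-spike",  # hypothetical alarm name
        Namespace="AgenticAI/Tools",
        MetricName="ToolInvocationLatency",
        Statistic="SampleCount",               # volume proxy for cost alerts
        Period=300,
        EvaluationPeriods=3,
        Threshold=10000,                       # assumed ceiling per period
        ComparisonOperator="GreaterThanThreshold",
    )
```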

Posted 1 month ago

Apply

4.0 years

0 Lacs

Bengaluru East, Karnataka, India

On-site


Overview

As a Software Engineer in the Artificial Intelligence group, you will contribute to developing and optimizing the backend infrastructure that supports AI-driven solutions. You will work closely with machine learning engineers and cross-functional teams to build scalable backend services, automate deployments, and improve system performance. Your role will focus on Python-based backend development, Kubernetes operations, and DevOps best practices to ensure reliable and efficient AI model deployments.

Responsibilities
- Develop and maintain backend services and APIs that support AI models and intelligent assistants.
- Improve scalability and performance of AI model serving and API interactions.
- Ensure system reliability by implementing logging, monitoring, and alerting solutions (a minimal instrumentation sketch follows this posting).
- Assist in deploying AI models using Kubernetes and Docker, ensuring smooth model integration into production.
- Contribute to CI/CD pipelines for AI applications, automating model testing and deployments.
- Work on data pipelines and optimize storage and retrieval for AI workloads.
- Work on infrastructure automation using Terraform, CloudFormation, or other Infrastructure as Code (IaC) tools.
- Support cloud-based deployments on AWS, GCP, or Azure, optimizing resource usage.
- Work closely with AI/ML engineers to understand infrastructure requirements for AI solutions.
- Participate in code reviews, architecture discussions, and knowledge-sharing sessions.
- Continuously learn and improve skills in backend development, cloud technologies, and DevOps.

Requirements
- 4 years of experience in backend development using Python (preferred) or Java.
- Experience with RESTful API development, microservices, and cloud-based architectures.
- Familiarity with Kubernetes, Docker, and containerized deployments.
- Hands-on experience with CI/CD tools (e.g., Jenkins, GitHub Actions, ArgoCD).
- Basic understanding of cloud platforms (AWS, GCP, or Azure) and their services.
- Strong problem-solving skills and a willingness to learn new technologies.

Preferred Experience
- Exposure to AI/ML pipelines, model serving, or data engineering workflows.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry).

Splunk, a Cisco company, is an Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.
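As a minimal illustration of the logging/monitoring responsibilities above, here is a Prometheus instrumentation sketch in Python; the metric names and the predict() stub are assumptions, not a real serving stack:

```python
# Sketch: instrumenting an AI model-serving path with Prometheus metrics.
# Requires: pip install prometheus-client
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("model_requests_total", "Inference requests", ["model"])
LATENCY = Histogram("model_latency_seconds", "Inference latency", ["model"])

def predict(model: str, payload: dict) -> dict:
    """Stand-in for real model inference."""
    time.sleep(0.01)
    return {"model": model, "result": "ok"}

def handle_request(model: str, payload: dict) -> dict:
    REQUESTS.labels(model=model).inc()
    with LATENCY.labels(model=model).time():  # records duration on exit
        return predict(model, payload)

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    while True:
        handle_request("demo-model", {})
```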

Posted 1 month ago

Apply

5.0 years

0 Lacs

Pune, Maharashtra, India

On-site


About Us

IAMOPS is a DevOps-focused services company helping startups and enterprises build scalable, reliable, and secure infrastructure. Our team thrives on solving complex infrastructure challenges, implementing automation, and working directly with clients to deliver value through modern DevOps practices.

Job Summary

We are seeking a highly capable and experienced Senior DevOps Engineer with 4–5 years of hands-on experience. The ideal candidate must possess deep knowledge of Linux systems, networking, scripting (Bash, Python), and automation tools, with the ability to take ownership of projects, collaborate directly with clients, and lead internal team efforts when required. You’ll play a key role in delivering DevOps solutions across various client environments while mentoring junior team members and driving technical excellence.

Key Responsibilities
- Client-Facing DevOps Delivery: Work directly with client stakeholders to gather requirements, understand their infrastructure pain points, and deliver robust DevOps solutions.
- Linux & Networking Mastery: Architect and troubleshoot systems with a strong foundation in Linux internals, process management, the network stack, routing, firewalls, etc.
- Automation & Scripting: Automate repetitive tasks using Bash and Python scripts; maintain and extend reusable automation assets (a small example follows this posting).
- Infrastructure as Code (IaC): Develop and manage infrastructure using tools like Terraform, Ansible, or similar.
- CI/CD Ownership: Build, maintain, and optimize CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI, etc.
- Containerization & Orchestration: Deploy and manage applications using Docker and Kubernetes in production environments.
- Cloud Management: Architect and manage infrastructure across AWS, Azure, or GCP; implement cost-effective and scalable cloud strategies.
- Monitoring & Logging: Implement observability stacks like Prometheus, Grafana, ELK, or cloud-native solutions.
- Mentorship: Guide and support junior engineers; contribute to knowledge-sharing, code reviews, and internal standards.

Key Requirements
- 4–5 years of hands-on DevOps experience in production environments.
- Strong fundamentals in:
  - Linux administration and troubleshooting.
  - Computer networking – firewalls, routing, DNS, load balancing, NAT, etc.
  - Scripting – Bash (required), Python (preferred).
- Experience with:
  - CI/CD tools (Jenkins, GitLab CI, GitHub Actions).
  - Docker and Kubernetes.
  - Cloud platforms – AWS (preferred), GCP, or Azure.
  - Infrastructure as Code – Terraform, Ansible, or similar.
- Ability to work independently with clients, understand business needs, and translate them into technical solutions.
- Proven experience in collaborating with or leading small teams in a fast-paced environment.

Nice to Have
- Cloud or Kubernetes certifications (AWS Certified DevOps Engineer, CKA, etc.)
- Familiarity with GitOps, Helm, and service mesh architectures.
- Exposure to monitoring tools like Datadog, New Relic, or OpenTelemetry.

Soft Skills
- Strong communication skills (written and verbal) to interact effectively with clients and team members.
- Mature problem-solver who can anticipate issues and resolve them proactively.
- Organized and self-motivated with a willingness to take ownership of projects.
- Leadership potential with a collaborative team mindset.

Why Join Us?
- Work with cutting-edge DevOps stacks and innovative startups globally.
- Be part of a collaborative, learning-focused culture.
- Opportunity to grow into technical leadership roles.
- Flexible working environment with a focus on outcomes.
Skills: GitHub Actions, GitLab CI, Jenkins, Terraform, ELK, AWS, Python, basic networking, DevOps, Linux, Grafana, automation, Bash, Ansible, Azure, GCP, Kubernetes, Docker, Prometheus, networking, infrastructure
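As a small illustration of the Bash/Python automation this role emphasizes, here is a sketch of a routine ops check in Python; the alert threshold and the systemd unit name are assumptions:

```python
# Sketch: a small automation of a routine ops check (disk + service health).
# Threshold and unit name are illustrative assumptions.
import shutil
import subprocess

def disk_usage_pct(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100

def service_active(unit: str) -> bool:
    """`systemctl is-active` exits 0 only when the unit is active."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0

if __name__ == "__main__":
    pct = disk_usage_pct("/")
    if pct > 85:  # assumed alert threshold
        print(f"WARN: root filesystem at {pct:.0f}%")
    if not service_active("nginx"):  # hypothetical unit
        print("CRIT: nginx is not active")
```

In practice a script like this would feed an alerting channel or a Prometheus textfile exporter rather than printing.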

Posted 1 month ago

Apply

5.0 years

0 Lacs

Surat, Gujarat, India

On-site


About Us

IAMOPS is a DevOps-focused services company helping startups and enterprises build scalable, reliable, and secure infrastructure. Our team thrives on solving complex infrastructure challenges, implementing automation, and working directly with clients to deliver value through modern DevOps practices.

Job Summary

We are seeking a highly capable and experienced Senior DevOps Engineer with 4–5 years of hands-on experience. The ideal candidate must possess deep knowledge of Linux systems, networking, scripting (Bash, Python), and automation tools, with the ability to take ownership of projects, collaborate directly with clients, and lead internal team efforts when required. You’ll play a key role in delivering DevOps solutions across various client environments while mentoring junior team members and driving technical excellence.

Key Responsibilities
- Client-Facing DevOps Delivery: Work directly with client stakeholders to gather requirements, understand their infrastructure pain points, and deliver robust DevOps solutions.
- Linux & Networking Mastery: Architect and troubleshoot systems with a strong foundation in Linux internals, process management, the network stack, routing, firewalls, etc.
- Automation & Scripting: Automate repetitive tasks using Bash and Python scripts; maintain and extend reusable automation assets.
- Infrastructure as Code (IaC): Develop and manage infrastructure using tools like Terraform, Ansible, or similar.
- CI/CD Ownership: Build, maintain, and optimize CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI, etc.
- Containerization & Orchestration: Deploy and manage applications using Docker and Kubernetes in production environments.
- Cloud Management: Architect and manage infrastructure across AWS, Azure, or GCP; implement cost-effective and scalable cloud strategies.
- Monitoring & Logging: Implement observability stacks like Prometheus, Grafana, ELK, or cloud-native solutions.
- Mentorship: Guide and support junior engineers; contribute to knowledge-sharing, code reviews, and internal standards.

Key Requirements
- 4–5 years of hands-on DevOps experience in production environments.
- Strong fundamentals in:
  - Linux administration and troubleshooting.
  - Computer networking – firewalls, routing, DNS, load balancing, NAT, etc.
  - Scripting – Bash (required), Python (preferred).
- Experience with:
  - CI/CD tools (Jenkins, GitLab CI, GitHub Actions).
  - Docker and Kubernetes.
  - Cloud platforms – AWS (preferred), GCP, or Azure.
  - Infrastructure as Code – Terraform, Ansible, or similar.
- Ability to work independently with clients, understand business needs, and translate them into technical solutions.
- Proven experience in collaborating with or leading small teams in a fast-paced environment.

Nice to Have
- Cloud or Kubernetes certifications (AWS Certified DevOps Engineer, CKA, etc.)
- Familiarity with GitOps, Helm, and service mesh architectures.
- Exposure to monitoring tools like Datadog, New Relic, or OpenTelemetry.

Soft Skills
- Strong communication skills (written and verbal) to interact effectively with clients and team members.
- Mature problem-solver who can anticipate issues and resolve them proactively.
- Organized and self-motivated with a willingness to take ownership of projects.
- Leadership potential with a collaborative team mindset.

Why Join Us?
- Work with cutting-edge DevOps stacks and innovative startups globally.
- Be part of a collaborative, learning-focused culture.
- Opportunity to grow into technical leadership roles.
- Flexible working environment with a focus on outcomes.
Skills: GitHub Actions, GitLab CI, Jenkins, Terraform, ELK, AWS, Python, basic networking, DevOps, Linux, Grafana, automation, Bash, Ansible, Azure, GCP, Kubernetes, Docker, Prometheus, networking, infrastructure

Posted 1 month ago

Apply

0.0 - 3.0 years

2 - 5 Lacs

Bengaluru

Work from Office


Key Responsibilities:
- Deliver engaging and interactive training sessions (24 hours total) based on structured modules.
- Teach integration of monitoring, logging, and observability tools with machine learning.
- Guide learners in real-time anomaly detection, incident management, root cause analysis, and predictive scaling (a forecasting sketch follows this posting).
- Support learners in deploying tools like Prometheus, Grafana, OpenTelemetry, Neo4j, Falco, and KEDA.
- Conduct hands-on labs using LangChain, Ollama, Prophet, and other AI/ML frameworks.
- Help participants set up smart workflows for alert classification and routing using open-source stacks.
- Prepare learners to handle security, threat detection, and runtime anomaly classification using LLMs.
- Provide post-training support and mentorship when necessary.
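As one example of the predictive-scaling material a trainer might demo with Prophet, here is a minimal sketch; the CSV schema, forecast horizon, and capacity rule are illustrative assumptions:

```python
# Sketch: Prophet-based predictive scaling, in the spirit of the hands-on labs.
# Requires: pip install prophet pandas
import pandas as pd
from prophet import Prophet

# Expected columns: ds (timestamp), y (requests per hour) -- hypothetical data file
df = pd.read_csv("requests_per_hour.csv")

model = Prophet()
model.fit(df)

future = model.make_future_dataframe(periods=24, freq="H")  # next 24 hours
forecast = model.predict(future)

# Scale ahead of the forecast's upper bound (illustrative capacity rule)
peak = forecast.tail(24)["yhat_upper"].max()
replicas = max(2, int(peak / 500))  # assume one replica handles ~500 req/h
print(f"Forecast peak ~{peak:.0f} req/h -> suggest {replicas} replicas")
```

In a KEDA-based lab, the suggested replica count would typically drive a custom scaler rather than a print statement.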

Posted 1 month ago

Apply

8.0 years

0 Lacs

Pune, Maharashtra, India

On-site


This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI storage in high performance environments." – Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management, and our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional.

Job Description

As a Staff Software Engineer – AI In-Market Engineering, you'll be the final escalation point for the most complex and critical issues affecting enterprise and hyperscale environments. This hands-on role is ideal for a deep technical expert who thrives under pressure and has a passion for solving distributed system challenges at scale. You'll collaborate with Engineering, Product Management, and Field teams to drive root cause resolutions, define architectural best practices, and continuously improve product resiliency. Leveraging AI tools and automation, you'll reduce time-to-resolution, streamline diagnostics, and elevate the support experience for strategic customers.

Key Responsibilities

Technical Expertise & Escalation Leadership
- Own critical customer case escalations end-to-end, including deep root cause analysis and mitigation strategies.
- Act as the highest technical escalation point for Infinia support incidents — especially in production-impacting scenarios.
- Lead war rooms, live incident bridges, and cross-functional response efforts with Engineering, QA, and Field teams.
- Utilize AI-powered debugging, log analysis, and system pattern recognition tools to accelerate resolution (a naive log-clustering sketch follows this posting).

Product Knowledge & Value Creation
- Become a subject-matter expert on Infinia internals: metadata handling, storage fabric interfaces, performance tuning, AI integration, etc.
- Reproduce complex customer issues and propose product improvements or workarounds.
- Author and maintain detailed runbooks, performance tuning guides, and RCA documentation.
- Feed real-world support insights back into the development cycle to improve reliability and diagnostics.

Customer Engagement & Business Enablement
- Partner with Field CTOs, Solutions Architects, and Sales Engineers to ensure customer success.
- Translate technical issues into executive-ready summaries and business impact statements.
- Participate in post-mortems and executive briefings for strategic accounts.
- Drive adoption of observability, automation, and self-healing support mechanisms using AI/ML tools.

Required Qualifications
- 8+ years in enterprise storage, distributed systems, or cloud infrastructure support/engineering.
- Deep understanding of file systems (POSIX, NFS, S3), storage performance, and Linux kernel internals.
- Proven debugging skills at system/protocol/app levels (e.g., strace, tcpdump, perf).
- Hands-on experience with AI/ML data pipelines, container orchestration (Kubernetes), and GPU-based architectures.
- Exposure to RDMA, NVMe-oF, or high-performance networking stacks.
- Exceptional communication and executive reporting skills.
- Experience using AI tools (e.g., log pattern analysis, LLM-based summarization, automated RCA tooling) to accelerate diagnostics and reduce MTTR.

Preferred Qualifications
- Experience with DDN, VAST, Weka, or similar scale-out file systems.
- Strong scripting/coding ability in Python, Bash, or Go.
- Familiarity with observability platforms: Prometheus, Grafana, ELK, OpenTelemetry.
- Knowledge of replication, consistency models, and data integrity mechanisms.
- Exposure to Sovereign AI, LLM model training environments, or autonomous system data architectures.

This position requires participation in an on-call rotation to provide after-hours support as needed.

Success Metrics – First 30 Days

Technical Ramp-Up
- Complete Infinia training, labs, and architecture deep dives.
- Stand up a fully functioning Infinia test system.
- Shadow at least 5 complex escalations and participate in 2 customer calls.

Operational Integration
- Lead one live incident response and deliver a full RCA within 48 hours.
- Propose 3+ enhancements to internal tools, AI/automation usage, or documentation.
- Establish key partnerships with Engineering and Field teams.

Strategic Insight
- Deliver a written 30-day reflection with gaps and high-impact recommendations.
- Begin identifying patterns where AI or automation can reduce MTTR or improve proactive detection.

Success Metrics – Beyond 30 Days
- MTTR on high-severity cases consistently below internal SLAs.
- Volume and quality of resolved L4 escalations.
- Strategic tooling or automation contributions adopted across the support org.
- Executive-ready RCAs that inform product improvement.
- High-impact engagements with strategic accounts (prevention, performance tuning, etc.)

DDN

DataDirect Networks (DDN) is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.
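In the spirit of the AI-assisted log analysis this role describes, here is a naive log-clustering sketch in Python; the log path is a placeholder, and real tooling (e.g., Drain-style template mining) is considerably more robust:

```python
# Sketch: cluster log lines by template to surface the dominant failure patterns
# during RCA. The path is a placeholder; the normalization is deliberately naive.
import re
from collections import Counter

HEX = re.compile(r"0x[0-9a-fA-F]+")
NUM = re.compile(r"\d+")

def template(line: str) -> str:
    """Collapse volatile tokens so similar messages share one template."""
    return NUM.sub("<N>", HEX.sub("<HEX>", line)).strip()

def top_patterns(path: str, k: int = 10) -> list:
    """Count normalized templates and return the k most frequent."""
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            counts[template(line)] += 1
    return counts.most_common(k)

if __name__ == "__main__":
    for pattern, n in top_patterns("/var/log/syslog"):  # placeholder path
        print(f"{n:6d}  {pattern}")
```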

Posted 1 month ago

Apply

5.0 years

0 Lacs

Hyderabad, Telangana, India

On-site


Who We Are

At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.

The Role

As an IT Solutions professional, you will act as the primary technical leader, guiding systems management specialists and internal teams through complex issues. You will be a reliable expert for customers and Kyndryl account teams, providing insight, technical guidance, and support during significant incidents and vital technical discussions. With your expertise, you will evaluate customers’ IT environments, identify technological gaps, and develop tailored remediation plans to enhance their operational capabilities. Your recommendations will help businesses progress and remain competitive in the digital landscape.

Your responsibilities include:
- Architecting, deploying, and optimizing the Elastic Observability stack to support comprehensive telemetry collection.
- Implementing APM, logs, metrics, and uptime monitoring using Elastic and OpenTelemetry standards.
- Designing Elastic index templates, ILM policies, ingest pipelines, and dashboards tailored to enterprise requirements (a client-library sketch follows this posting).
- Collaborating with infrastructure, application, and DevOps teams to incorporate applications and services into observability pipelines.
- Integrating Elastic with third-party tools such as Zabbix, Prometheus, and the OpenTelemetry Collector.
- Tuning performance and storage strategies for high-scale ingestion environments (50+ apps, 500+ servers).
- Creating SOPs, runbooks, and dashboards for observability operations.
- Providing guidance on cost optimization, licensing, and scaling models for Elastic deployments.

Your Future at Kyndryl

Every position at Kyndryl offers a way forward to grow your career. You’ll have access to data, hands-on learning experiences, and the chance to certify in all four major platforms. Whether you want to broaden your knowledge base or narrow your scope and specialize in a specific sector, you can find opportunities here that you won’t find anywhere else.

Who You Are

You’re good at what you do and possess the required experience to prove it. However, equally as important – you have a growth mindset; keen to drive your own personal and professional development. You are customer-focused – someone who prioritizes customer success in their work. And finally, you’re open and borderless – naturally inclusive in how you work with others.

Required Skills And Experience
- 5+ years with the Elastic Stack (Elasticsearch, Kibana, Logstash/Beats)
- Strong knowledge of Elastic APM, Fleet, and OpenTelemetry integrations
- Experience with data ingestion and transformation using Logstash, Filebeat, Metricbeat, or custom agents
- Proficient in designing dashboards, visualizations, and alerts in Kibana
- Knowledge of Kubernetes, Docker, and Linux systems
- Understanding of ILM, hot-warm-cold tiering, and Elastic security controls

Preferred Skills And Experience
- Exposure to Elastic Cloud, ECE, or ECK
- Familiarity with Dynatrace, Datadog, AppDynamics, or SigNoz for benchmarking
- OpenTelemetry Collector & Elastic integration
- Experienced at managing the expectations of business leaders in times of crisis

Being You

Diversity is a whole lot more than what we look like or where we come from; it’s how we think and who we are.
We welcome people of all cultures, backgrounds, and experiences. But we’re not doing it single-handedly: our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you – and everyone next to you – the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That’s the Kyndryl Way.

What You Can Expect

With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees, and support you and your family through the moments that matter – wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you; we want you to succeed so that together, we will all succeed.

Get Referred!

If you know someone that works at Kyndryl, when asked ‘How Did You Hear About Us’ during the application process, select ‘Employee Referral’ and enter your contact's Kyndryl email address.
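As a rough illustration of designing ILM policies and index templates, here is a sketch using the Elasticsearch 8.x Python client; the policy name, index pattern, rollover sizes, and endpoint are assumptions, not Kyndryl's actual configuration:

```python
# Sketch: create an ILM policy and a matching index template.
# Requires: pip install elasticsearch
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Hot-then-delete lifecycle: roll over at 50 GB or 1 day, delete after 30 days
es.ilm.put_lifecycle(
    name="obs-logs-policy",  # hypothetical policy name
    policy={
        "phases": {
            "hot": {"actions": {"rollover": {
                "max_primary_shard_size": "50gb",
                "max_age": "1d",
            }}},
            "delete": {"min_age": "30d", "actions": {"delete": {}}},
        }
    },
)

# Index template binding the policy to an index pattern
es.indices.put_index_template(
    name="obs-logs-template",          # hypothetical template name
    index_patterns=["obs-logs-*"],
    template={
        "settings": {
            "index.lifecycle.name": "obs-logs-policy",
            "index.lifecycle.rollover_alias": "obs-logs",
        },
        "mappings": {"properties": {"@timestamp": {"type": "date"}}},
    },
)
```

A hot-warm-cold variant would add warm/cold phases with allocation actions; this sketch keeps only hot and delete for brevity.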

Posted 1 month ago

Apply

10.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site


About The Team/Role

We are seeking a Senior Engineering Manager with a highly technical, hands-on engineering mindset to lead a team within WEX’s International Mobility Engineering organization. International Mobility is a global team operating across the UK, India, Brazil, and the US. This role requires a leader who can blend technical excellence with strong engineering leadership, fostering a DevOps mindset, Agile execution, and best-in-class software development practices. As a Senior Engineering Manager, you will be responsible for building scalable, reliable, and secure systems while driving a culture of continuous improvement, automation, and operational excellence. You will partner with engineering teams, product leaders, and business stakeholders to ensure that WEX’s International Mobility products deliver high-performance, cloud-native solutions that meet global business needs.

How you’ll make an impact
- Lead by example with hands-on involvement in architecting, coding, reviewing, testing, and deploying mission-critical software.
- Instill a DevOps culture by championing automation, CI/CD pipelines, and infrastructure-as-code to enhance system reliability and efficiency.
- Promote Agile and Lean engineering best practices, ensuring incremental delivery, iterative improvements, and continuous integration.
- Own the full development lifecycle, from system design and implementation to deployment and monitoring.
- Establish and enforce engineering excellence in code quality, test automation, and observability.
- Drive a data-driven approach to engineering productivity, measuring performance using key engineering and operational metrics.
- Lead a high-performing engineering team, setting clear goals, fostering a culture of accountability, and mentoring engineers.
- Optimize team efficiency by eliminating bottlenecks, improving developer experience, and streamlining workflows.
- Build scalable, fault-tolerant, and secure distributed systems that meet business SLAs and regulatory requirements.
- Collaborate with Product and Business teams to align technical execution with strategic priorities.
- Own and drive technical decisions, balancing trade-offs between speed, scalability, security, and cost.
- Champion engineering best practices for cloud computing, microservices architecture, API-first development, and event-driven systems (a consumer sketch follows this posting).
- Ensure system reliability and uptime, leveraging SRE principles, observability tools, and incident management frameworks.

Experience you’ll bring
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 10+ years of experience in software engineering, with at least 3+ years in an engineering leadership role.
- Deep technical expertise in cloud computing platforms (AWS, Azure, or GCP) and cloud-native architectures.
- Strong coding and architectural skills in modern programming languages (Java, Kotlin, Go, Python, or similar).
- Experience leading Agile engineering teams, with a proven track record of delivering software using Scrum/Kanban methodologies.
- Hands-on experience with CI/CD pipelines, Kubernetes, container orchestration, and DevOps automation.
- Proven ability to scale engineering teams and optimize development workflows.
- Expertise in building distributed systems, microservices, and event-driven architectures.
- Deep understanding of API-first development, RESTful and GraphQL APIs, and secure API gateways.
- Strong experience in monitoring, logging, and observability tools (e.g., Prometheus, Datadog, ELK, Grafana, OpenTelemetry).
- Experience working in a fast-paced, high-growth, globally distributed environment.
- Ability to communicate complex technical topics to both engineering and non-engineering stakeholders.
- Experience in financial technology, mobility, or fleet management systems. (Preferred)
- Familiarity with global compliance and regulatory standards for payments and mobility. (Preferred)
- Background in real-time data processing, analytics, or AI-driven optimizations. (Preferred)
- Experience integrating machine learning models or AI-driven decision-making into engineering workflows. (Preferred)
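As a minimal illustration of the event-driven patterns this role calls for, here is a Python consumer sketch using the kafka-python library; the topic, broker address, and event schema are invented for illustration:

```python
# Sketch: a minimal event-driven consumer with manual offset commits,
# giving at-least-once processing. All names are illustrative assumptions.
# Requires: pip install kafka-python
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "mobility.payments",                 # hypothetical topic
    bootstrap_servers="localhost:9092",  # placeholder broker
    group_id="payments-projector",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    enable_auto_commit=False,            # commit only after handling succeeds
)

def handle(event: dict) -> None:
    """Stand-in projection/side effect for each event."""
    print(f"processed payment {event.get('id')}")

for message in consumer:
    handle(message.value)
    consumer.commit()  # at-least-once: a crash before commit means a replay
```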

Posted 1 month ago

Apply

12.0 years

0 Lacs

Thiruvananthapuram, Kerala, India

Remote


About The Company

Armada is an edge computing startup that provides computing infrastructure to remote areas where connectivity and cloud infrastructure are limited, as well as areas where data needs to be processed locally for real-time analytics and AI at the edge. We’re looking to bring on the most brilliant minds to help further our mission of bridging the digital divide with advanced technology infrastructure that can be rapidly deployed anywhere.

About The Role

We are looking for a highly experienced and visionary Lead Golang Engineer to spearhead the architecture, design, and implementation of scalable backend systems. The ideal candidate has extensive experience with Golang, distributed systems, and microservices, along with proven leadership skills. You will lead a team of engineers, influence strategic technical decisions, and contribute to the success of key initiatives.

Location

This role is office-based at our Trivandrum, Kerala office.

What You'll Do (Key Responsibilities)
- Lead the design and development of complex, high-performance backend services using Golang.
- Define system architecture and best practices for scalable, secure, and maintainable code.
- Mentor and guide a team of backend engineers, conducting code reviews and promoting engineering excellence.
- Collaborate with product managers, architects, and other stakeholders to align engineering goals with business objectives.
- Drive DevOps best practices including CI/CD, observability, and incident management.
- Proactively identify technical risks and implement effective mitigation strategies.
- Stay current with industry trends and apply new technologies to improve system performance and developer productivity.

Required Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 12+ years of software development experience, with at least 5+ years of hands-on Golang development.
- Proven track record in building large-scale, distributed backend systems.
- Strong knowledge of microservices architecture, API design, and cloud-native applications.
- Experience with Docker, Kubernetes, and cloud platforms (AWS, GCP, or Azure).
- Proficiency with relational and NoSQL databases.
- Deep understanding of concurrency, performance optimization, and systems design.
- Strong communication skills and the ability to work cross-functionally.

Preferred Experience And Skills
- Experience in leading geographically distributed engineering teams.
- Knowledge of event-driven architecture and tools like Kafka or RabbitMQ.
- Familiarity with observability tools like Prometheus, Grafana, and OpenTelemetry.
- Contributions to open-source Golang projects or community initiatives.

Compensation & Benefits

For India-based candidates: We offer a competitive base salary along with equity options, providing an opportunity to share in the success and growth of Armada.

You're a Great Fit if You're
- A go-getter with a growth mindset. You're intellectually curious, have strong business acumen, and actively seek opportunities to build relevant skills and knowledge.
- A detail-oriented problem-solver. You can independently gather information, solve problems efficiently, and deliver results with a "get-it-done" attitude.
- Someone who thrives in a fast-paced environment. You're energized by an entrepreneurial spirit, capable of working quickly, and excited to contribute to a growing company.
- A collaborative team player. You focus on business success and are motivated by team accomplishment vs personal agenda.
- Highly organized and results-driven.
Strong prioritization skills and a dedicated work ethic are essential for you.

Equal Opportunity Statement

At Armada, we are committed to fostering a work environment where everyone is given equal opportunities to thrive. As an equal opportunity employer, we strictly prohibit discrimination or harassment based on race, color, gender, religion, sexual orientation, national origin, disability, genetic information, pregnancy, or any other characteristic protected by law. This policy applies to all employment decisions, including hiring, promotions, and compensation. Our hiring is guided by qualifications, merit, and the business needs at the time.

Posted 1 month ago

Apply

15.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site


Splunk, a Cisco company, is building a safer and more resilient digital world with an end-to-end, full-stack platform made for a hybrid, multi-cloud world. Leading enterprises use our unified security and observability platform to keep their digital systems secure and reliable. Our customers love our technology, but it's our caring employees that make Splunk stand out as an amazing career destination. No matter where in the world or what level of the organization, we approach our work with kindness. So bring your work experience, problem-solving skills and talent, of course, but also bring your joy, your passion and all the things that make you, you. Come help organizations be their best, while you reach new heights with a team that has your back.

Role Summary

We’re looking for an outstanding Senior Principal Engineer to lead and drive technical strategy and execution for Splunk’s AppDynamics on-prem product line. In this role, you will work across teams to define architectural direction, solve sophisticated challenges, and deliver best-in-class solutions for our customers.

Meet the Products and Technology Team

Want to build security and observability products people love AND work with people as smart (and humble) as you are? Our team delivers digital resilience at enterprise scale with a self-service Splunk portfolio that offers unified security analytics, full-stack observability and real-time visibility of streaming data. Learn more about the team, meet our leaders, and hear from Splunk technologists and engineers at splunk.com/careers/products-and-technology.

What you'll get to do
- Lead the architectural vision and technical strategy for an industry-leading observability solution.
- Drive innovation and influence the technical direction of key product initiatives.
- Collaborate with multi-functional teams, including product management, UX, and operations, to define requirements and deliver solutions.
- Serve as a mentor and technical leader, guiding teams and fostering a culture of engineering excellence.
- Optimize system performance, scalability, and reliability to meet customer needs.
- Engage with customers and stakeholders to understand use cases and feedback, translating them into actionable insights.
- Stay current with industry trends, identifying opportunities to incorporate groundbreaking technologies.

Must-have Qualifications
- Strong fundamentals in software engineering: data structures, algorithms, distributed concurrency control, consistency models, etc.
- Demonstrated expertise in designing and building enterprise-grade on-prem software solutions.
- Proficiency in programming languages such as Java, C++, or Go.
- Proven experience with application performance monitoring (APM) tools or similar observability technologies.
- Strong understanding of infrastructure, networking, and security concerns in on-prem environments.
- Track record of leading cross-functional teams and influencing stakeholders at all levels.
- Excellent communication and collaboration skills.
- Bachelor of Science in Computer Science with 15+ years of related experience, or a Master’s and 12+ years of related experience, or a PhD and 8+ years of related experience.

Nice-to-have Qualifications

We’ve taken special care to separate the must-have qualifications from the nice-to-haves. “Nice-to-have” means just that: Nice. To. Have. So, don’t worry if you can’t check off every box. We’re not hiring a list of bullet points – we’re interested in the whole you.
- Experience with AppDynamics or similar APM platforms.
- Familiarity with cloud-native and hybrid deployment models.
- Knowledge of containerization technologies (e.g., Docker, Kubernetes).
- Hands-on experience with observability tools like OpenTelemetry (a minimal tracing sketch follows this posting).
- Background in designing products for high-compliance industries (e.g., finance, healthcare).
- Contributions to open-source projects or technical community involvement.

Splunk is an Equal Opportunity Employer

Splunk, a Cisco company, is an Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.
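To ground the OpenTelemetry item above, here is a minimal tracing setup with the OpenTelemetry Python SDK; the console exporter and span names are illustrative, and production would export to a collector instead:

```python
# Sketch: minimal OpenTelemetry tracing setup with nested spans.
# Requires: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire a provider with a console exporter (a real setup would use OTLP to a collector)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo.service")  # hypothetical instrumentation name

with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("http.route", "/checkout")  # example attribute
    with tracer.start_as_current_span("db-query"):
        pass  # nested child span; real work would go here
```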

Posted 1 month ago

Apply

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.

Featured Companies