1.0 - 6.0 years
18 - 33 Lacs
Gurugram
Hybrid
RESPONSIBILITIES:
- Develop, productionize, and deploy scalable, resilient software solutions for operationalizing AI and ML.
- Deploy machine learning (ML) models and large language models (LLMs) securely and efficiently, both in the cloud and on-premises, using state-of-the-art platforms, tools, and techniques.
- Provide effective model observability, monitoring, and metrics by instrumenting logging, dashboards, alerts, etc.
- In collaboration with data engineers, design and build pipelines for extraction, transformation, and loading of data from a variety of sources for AI and ML models, as well as RAG architectures for LLMs.
- Enable data scientists to work more efficiently by providing tools for experiment tracking and test automation (a minimal tracking sketch follows this posting).
- Ensure scalability of built solutions by developing and running rigorous load tests.
- Facilitate integration of AI and ML capabilities into the user experience by building APIs, UIs, etc.
- Stay current on new developments in AI and ML frameworks, tools, techniques, and architectures available for solution development, both proprietary and open source.
- Coach data scientists and data engineers on software development best practices to write scalable, maintainable, well-designed code.

Agile Project Work
- Work in cross-functional agile teams of highly skilled software/machine learning engineers, data scientists, DevOps engineers, designers, product managers, technical delivery teams, and others to continuously innovate AI and MLOps solutions.
- Act as a positive champion for the broader organization to develop a stronger understanding of software design patterns that deliver scalable, maintainable, well-designed analytics solutions.
- Advocate for security and responsible-AI best practices and tools.
- Act as an expert on complex technical topics that require cross-functional consultation.
- Perform other duties as required.

QUALIFICATIONS:
- Experience applying continuous integration/continuous delivery (CI/CD) best practices, including version control, trunk-based development, release management, and test-driven development.
- Experience with popular MLOps tools (e.g., Domino Data Labs, Dataiku, MLflow, AzureML, SageMaker) and frameworks (e.g., TensorFlow, Keras, Theano, PyTorch, Caffe).
- Experience with LLM platforms (OpenAI, Bedrock, NVAIE) and frameworks (LangChain, LangFuse, vLLM, etc.).
- Experience with programming languages common to data science, such as Python and SQL.
- Understanding of LLMs and supporting concepts (tokenization, guardrails, chunking, retrieval-augmented generation, etc.).
- Knowledge of the ML lifecycle (data wrangling, model selection, model training, model validation, and deployment at scale) and experience working with data scientists.
- Familiarity with at least one major cloud provider (Azure, AWS, GCP), including resource provisioning, connectivity, security, autoscaling, and infrastructure as code (IaC).
- Familiarity with cloud data warehousing solutions such as Snowflake, Fabric, etc.
- Experience with Agile and DevOps software development principles/methodologies and working on teams focused on delivering business value.
- Experience influencing and building mindshare convincingly with any audience; confident and experienced in public speaking; able to communicate complex ideas concisely; fluent with popular diagramming and presentation software.
- Demonstrated experience in teaching and/or mentoring professionals.
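Experiment tracking of the kind mentioned above is typically wired directly into training code through a tracking library such as MLflow (named in the qualifications). The following is a minimal, illustrative sketch only, using a toy scikit-learn model; the tracking URI and experiment name are hypothetical placeholders, not part of the posting.

# Minimal MLflow experiment-tracking sketch (illustrative only).
# Assumes an MLflow tracking server is reachable at the URI below (hypothetical).
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # hypothetical endpoint
mlflow.set_experiment("demo-baseline")                  # hypothetical experiment name

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))

    mlflow.log_params(params)                 # record hyperparameters
    mlflow.log_metric("accuracy", acc)        # record evaluation metric
    mlflow.sklearn.log_model(model, "model")  # persist the trained model artifact

Each run recorded this way shows up in the tracking UI with its parameters, metrics, and model artifact, which is the basic workflow the "experiment tracking" responsibility refers to.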
Posted 3 weeks ago
8.0 - 13.0 years
20 - 35 Lacs
Bengaluru
Hybrid
Summary
Penguin Computing is seeking a software engineer with a background in software automation to join our Software group. Penguin Computing's Scyld Software products are used in the deployment, provisioning, management, and monitoring of some of the largest computational systems in the world. In this role, you will collaborate closely with technical architects, software engineers, product owners and managers, and services engineering teams to develop a new product that delivers software automation capabilities and all phases of infrastructure management to end customers, particularly in the AI space. We intend to take infrastructure-as-code principles to their fullest potential. As part of a talented and high-performing agile team, you will have the opportunity to make lasting impacts on our software and our customers.

The ideal candidate has an excellent understanding of the computer infrastructure lifecycle, from bare metal through to fully operational and ready for users. You will understand the challenges faced when scaling complex systems and networks. You will be a creative thinker, willing to experiment while always maintaining the highest engineering rigor. The team is distributed; we are looking for team members who perform well given a high degree of independence and autonomy and can communicate effectively asynchronously.

Essential Duties and Responsibilities
- Demonstrate a solid command of at least one programming language such as Java, Python, C, or C++
- Create, maintain, and improve Ansible playbooks and other code that manage Linux-based high-performance computing (HPC) and artificial intelligence (AI) environments (a minimal playbook dry-run sketch follows this posting)
- Write well-formulated, highly readable code with supporting tests and documentation
- Participate in team workflow: stand-ups, code reviews, design discussions, research, and report-backs
- Evaluate new business requirements and write technical specifications
- Work within the team on continuous improvement: mentoring junior engineers, knowledge sharing, and improving our internal processes
- Partner with field engineers on troubleshooting and remediation
- Keep abreast of developments on the infrastructure management frontier

Job Knowledge, Skills, and Abilities
- Bachelor's degree in computer science/engineering or a similar discipline, or equivalent experience
- Deep understanding of and experience in software automation
- Experience with bare-metal provisioning: PXE and kickstart
- Experience with monitoring tools and strategies
- Excellent understanding of Linux-based systems, including system administration
- Deep understanding of and experience with configuration management tooling and processes such as Ansible
- Solid coding skills, including at least one scripting language, and a solid understanding of data structures
- Experience with Git and CI/CD tooling and practices
- Knowledge of security best practices and technologies
- Knowledge of the NVIDIA GPU ecosystem (architecture, drivers, etc.)
- Practical knowledge of HPC technologies, including cluster management and the software stack
- Ability to communicate technical designs and concepts clearly and effectively
- Understanding of network technologies, architectures, and protocols
- Experience with virtualization architecture and platforms is preferred
- Experience with container-based software deployment and orchestration using Kubernetes
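Ansible-driven management of HPC and AI nodes as described above generally follows a review-then-apply loop. Below is a minimal, illustrative Python sketch of a wrapper that dry-runs a playbook with --check --diff before applying it; the inventory and playbook paths are hypothetical placeholders, not Penguin Computing's actual tooling.

# Illustrative wrapper: dry-run an Ansible playbook in check mode before applying it.
# The inventory and playbook paths below are hypothetical placeholders.
import subprocess
import sys

INVENTORY = "inventories/hpc-cluster.ini"   # hypothetical inventory of HPC nodes
PLAYBOOK = "playbooks/gpu-node-setup.yml"   # hypothetical playbook

def run(extra_args):
    # Invoke the standard ansible-playbook CLI and surface its exit code.
    cmd = ["ansible-playbook", "-i", INVENTORY, PLAYBOOK, *extra_args]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    # First pass: --check --diff reports what would change without touching the nodes.
    if run(["--check", "--diff"]) != 0:
        sys.exit("Dry run reported failures; aborting before applying changes.")
    # Second pass: apply for real only if the dry run succeeded.
    sys.exit(run([]))

The same pattern slots naturally into a CI/CD pipeline: the check pass runs on every change, and the apply pass runs only after review.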
Posted 1 month ago