Posted: 2 weeks ago
On-site | Contractual
Overview: We are seeking an engineer to build and optimize high-throughput, low-latency LLM inference infrastructure using open-source models (Qwen, LLaMA, Mixtral) on multi-GPU systems (A100/H100). You will own performance tuning, model hosting, routing logic, speculative decoding, and cost-efficiency tooling. (A minimal sketch of this stack follows the skills list below.)

Must-Have Skills:
- Deep experience with vLLM, tensor/pipeline parallelism, and KV cache management
- Strong grasp of CUDA-level inference bottlenecks, FlashAttention-2, and quantization
- Familiarity with FP8, INT4, and speculative decoding (e.g., TwinPilots, PowerInfer)
- Proven ability to scale LLMs across multi-GPU nodes (TP, DDP, inference routing)
- Python (systems-level), containerized deployments (Docker, GCP/AWS), load testing (Locust)

Bonus:
- Experience with any-to-any model routing (e.g., text2sql, speech2text)
- Exposure to LangGraph, Triton kernels, or custom inference engines
- Has tuned models for inference at under $0.50 per million tokens at scale

Highlight: A very competitive rate card for the best candidate fit.
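For context on the stack named above, here is a minimal sketch of serving an open-source model with vLLM using tensor parallelism and FP8 quantization. The model checkpoint, parallel degree, and sampling settings are illustrative assumptions, not requirements from this posting:

    # Minimal vLLM serving sketch (illustrative; all settings are assumptions).
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen2.5-7B-Instruct",  # placeholder open-source checkpoint (Qwen/LLaMA/Mixtral)
        tensor_parallel_size=2,            # shard weights across 2 GPUs (e.g., A100/H100)
        quantization="fp8",                # FP8 quantization, where the model/hardware support it
        gpu_memory_utilization=0.90,       # leave headroom for the KV cache
    )

    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(["Explain speculative decoding in one sentence."], params)
    print(outputs[0].outputs[0].text)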
Constient Global Solutions
Chennai, Tamil Nadu, India
Experience: Not specified
Salary: Not disclosed