Job Description
Location: Gurugram (Work From Office)
Job Type: Full-Time

Job Description:
As a Senior Machine Learning Engineer, you will be responsible for designing, developing, and deploying cutting-edge models for end-to-end content generation, including AI-driven image/video generation, lipsyncing, and multimodal AI systems. You will work on the latest advancements in deep generative modeling to create highly realistic and controllable AI-generated media.

Responsibilities:
Research & Develop: Design and implement state-of-the-art generative models, including Diffusion Models, 3D VAEs, and GANs, for AI-powered media synthesis.
End-to-End Content Generation: Build and optimize AI pipelines for high-fidelity image/video generation and lipsyncing using diffusion and autoencoder models.
Speech & Video Synchronization: Develop advanced lipsyncing and multimodal generation models that integrate speech, video, and facial animation for hyper-realistic AI-driven content.
Real-Time AI Systems: Implement and optimize models for real-time content generation and interactive AI applications using efficient model architectures and acceleration techniques.
Scaling & Production Deployment: Work closely with software engineers to deploy models efficiently on cloud-based architectures (AWS, GCP, or Azure).
Collaboration & Research: Stay ahead of the latest trends in deep generative models, diffusion models, and transformer-based vision systems to enhance AI-generated content quality.
Experimentation & Validation: Design and conduct experiments to evaluate model performance; improve fidelity, realism, and computational efficiency; and refine model architectures.
Code Quality & Best Practices: Participate in code reviews, improve model efficiency, and document research findings to enhance team knowledge-sharing and product development.

Qualifications:
Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field.
3+ years of experience working with deep generative models, including Diffusion Models, 3D VAEs, GANs, and autoregressive models.
Strong proficiency in Python and deep learning frameworks such as PyTorch.
Expertise in multimodal AI, including text-to-image, image-to-video, and audio-to-lipsync generation.
Strong understanding of machine learning principles and statistical methods.
Good to have: experience in real-time inference optimization, cloud deployment, and distributed training.
Strong problem-solving abilities and a research-oriented mindset to stay updated with the latest AI advancements.
Familiarity with generative adversarial techniques, reinforcement learning for generative models, and large-scale AI model training.

Preferred Qualifications:
Experience with transformers and vision-language models (e.g., CLIP, BLIP, GPT-4V).
Background in text-to-video generation, lipsync generation, and real-time synthetic media applications.
Experience in cloud-based AI pipelines (AWS, Google Cloud, or Azure) and model compression techniques (quantization, pruning, distillation).
Contributions to open-source projects or published research in AI-generated content, speech synthesis, or video synthesis.