We are building a next-generation AI Compiler and Runtime Stack targeting a QEMU-emulated RISC-V-based NPU architecture.
You will work with cutting-edge open-source technologies such as MLIR, IREE, and LLVM, designing advanced compiler flows for the scalar, vector (RVV), and matrix-multiplication cores that power real-world Automotive, Robotics, and AR/VR AI workloads.
This is a rare opportunity to work on an end-to-end AI system from model ingestion (PyTorch, TensorFlow, ONNX) through optimized codegen and runtime deployment.
Why Join Us?
Build a full-stack AI platform from the ground up.
Work on next-generation RISC-V AI accelerators.
Contribute upstream to MLIR, LLVM, and IREE.
Responsibilities:
Develop custom MLIR dialects, passes, and transformations for optimizing AI models.
Extend the LLVM RISC-V backend to support new instructions.
Integrate hand-optimized microkernels (ukernels) into IREE for critical AI operations (matmul, convolution, reductions).
Build compiler optimizations, including loop unrolling, fusion, and vectorization, as well as SRAM-aware memory access optimization.
Lower AI models from high-level ML frameworks through MLIR to LLVM IR, generating RISC-V assembly.
Enhance IREE's codegen and runtime scheduling for scalar, vector, and matrix cores.
Collaborate with hardware architects, runtime developers, and QEMU platform engineers.
Qualifications:
Prior work with AI/ML workloads (Vision Transformers, Object Detection, ASR, etc.).
Familiarity with ping-pong buffering in SRAM and with memory-bandwidth optimization.
Contributions to open-source projects such as MLIR, LLVM, or IREE.
Understanding of AI model optimization techniques (quantization, tiling, scheduling).
Experience with profiling and debugging low-level systems.