MOHIT

i'm mohit. i like building things from scratch and work on llm inference, mostly with cuda, triton, and gpu kernel programming.

i've built a few projects along the way: triton-based speculative decoding for qwen3 on amd mi300x, a cuda inference engine for gpt models from scratch, dynamic vision transformer inference, and more.

Experience

deep learning engineer, offtoning (jan 2026 to mar 2026)

Built a custom deep learning framework from scratch in Python with NumPy and CuPy backends supporting tensor ops, autograd, and GPU-accelerated computation
Implemented neural network modules, optimizers, mathematical kernels, and training pipelines for image and video editing models
Developed end-to-end model training, inference, and experimentation infrastructure for rapid deployment workflows