i'm mohit. i like to build things from scratch and work on llm inference, mostly with cuda, triton, and gpu kernel programming.
i've built a few projects along the way: triton-based speculative decoding for qwen3 on amd mi300x, a cuda inference engine for gpt models from scratch, dynamic vision transformer inference, and more.