Projects
Pluto
Built a no-code, web-based machine learning trainer that lets users upload CSVs, select algorithms and visualizations, and train models end-to-end.
Adaptive-ViT
An adaptive Vision Transformer inference system that avoids unnecessary high-resolution computation, achieving ~3× faster inference than static high-res ViT by selectively escalating only when needed.
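A minimal sketch of the selective-escalation idea, in pure Python. The description above does not say how the system decides when to escalate, so the confidence threshold used here is an assumption, and `adaptive_predict`, `low_res_model`, `high_res_model`, and `threshold` are hypothetical names for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical model type: takes an image and returns raw class logits.
Model = Callable[[object], Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_predict(
    image: object,
    low_res_model: Model,
    high_res_model: Model,
    threshold: float = 0.8,  # assumed escalation rule, not taken from the project
) -> Tuple[int, str]:
    """Return (predicted class, which path answered)."""
    # Cheap pass first: accept its answer when it is confident enough.
    probs = softmax(low_res_model(image))
    if max(probs) >= threshold:
        return probs.index(max(probs)), "low"
    # Otherwise pay for the expensive high-resolution pass.
    probs = softmax(high_res_model(image))
    return probs.index(max(probs)), "high"
```

With this kind of rule, the speedup depends on how often the cheap pass is confident enough to skip the expensive one.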
Kernel-Fusion
A CUDA Python experiment demonstrating kernel fusion: ReLU and LayerNorm are combined into a single CUDA RawKernel that runs in one GPU pass, which is 5.8× faster than the unfused multi-kernel pipeline.
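As a rough CPU-side illustration of what fusion saves, the sketch below computes the same result two ways in pure Python. It is a reference for the arithmetic, not the project's CUDA kernel. It assumes ReLU is applied before LayerNorm, no learned scale or shift, and `eps = 1e-5`; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def relu_then_layernorm_unfused(row: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Unfused pipeline: ReLU materializes an intermediate row, then LayerNorm reads it back."""
    activated = [max(0.0, x) for x in row]  # separate "kernel": writes an intermediate
    n = len(activated)
    mean = sum(activated) / n
    var = sum((a - mean) ** 2 for a in activated) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(a - mean) * inv_std for a in activated]


def relu_layernorm_fused(row: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Fused version: ReLU is applied on the fly, so no intermediate row is stored."""
    n = len(row)
    total = 0.0
    total_sq = 0.0
    # Statistics pass: accumulate sum and sum of squares of relu(x).
    for x in row:
        a = x if x > 0.0 else 0.0
        total += a
        total_sq += a * a
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)  # clamp tiny negative rounding error
    inv_std = 1.0 / math.sqrt(var + eps)
    # Output pass: recompute relu(x) instead of loading a stored intermediate.
    return [((x if x > 0.0 else 0.0) - mean) * inv_std for x in row]
```

On a GPU the benefit comes from avoiding the extra kernel launch and the round trip of the intermediate tensor through global memory; the Python version only shows that both paths produce the same values.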
FlashAttention-CuPy
Flash Attention implemented from scratch: a tiled CUDA forward kernel, online softmax with a running max and correction factor, the recomputation trick in the backward pass, and O(N) memory. The full forward and backward passes are verified against PyTorch autograd to 1e-6.
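The online softmax with a running max and correction factor can be sketched in pure Python for a single query vector. This is a CPU illustration of the arithmetic only, not the project's CUDA kernel; `online_attention`, `naive_attention`, and `block_size` are hypothetical names, and the `1/sqrt(d)` score scaling is assumed.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values) -> List[float]:
    """Reference: softmax over all scores at once, then a weighted sum of values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [_dot(q, k) * scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    denom = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / denom
            for d in range(len(values[0]))]


def online_attention(q, keys, values, block_size: int = 2) -> List[float]:
    """Process keys/values block by block, never holding all scores at once."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf          # largest score seen so far
    running_sum = 0.0                # softmax denominator relative to running_max
    acc = [0.0] * len(values[0])     # unnormalized output accumulator
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [_dot(q, k) * scale for k in k_blk]
        new_max = max(running_max, max(scores))
        # Correction factor rescales earlier partial results to the new max.
        # On the first block this is exp(-inf) == 0.0, which is harmless.
        correction = math.exp(running_max - new_max)
        probs = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(probs)
        acc = [a * correction + sum(p * v[d] for p, v in zip(probs, v_blk))
               for d, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Because only the running max, running sum, and accumulator are carried between blocks, the full score row never needs to be stored, which is what keeps memory linear in sequence length.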
VecEngine
A lightweight vector database, retrieval engine, and custom indexer, all built from scratch.
customGPT-RAG
A custom RAG system that uses local document embeddings and generative AI to provide accurate, context-aware answers from private knowledge.
Autograd-NN
Implemented an autograd engine and a fully connected neural network from scratch in pure Python.
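For a sense of what a scalar autograd engine involves, here is a minimal sketch in the style of micrograd. It is an illustration under assumptions, not this project's code: the `Value` class and its attributes are hypothetical names, and only addition, multiplication, and ReLU are covered.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def relu(self):
        out = Value(max(0.0, self.data), (self,))

        def _backward():
            # Gradient passes through only where the input was positive.
            self.grad += (1.0 if self.data > 0.0 else 0.0) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()
```

Gradients are accumulated with `+=` so that a value used more than once in an expression, such as `x * x`, receives the sum of its contributions.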
micrograd-in-cpp
A C++ port of Andrej Karpathy's Python micrograd implementation.
Deep Learning Framework
Building a custom deep learning framework from scratch.
Work in progress.