Built a no-code, web-based machine learning trainer that lets users upload CSVs, select an algorithm, visualize the data, and train models end-to-end.
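The upload-select-train flow can be sketched in pure Python. This is a hypothetical illustration of the core loop, not the app's actual code: the algorithm names ("mean", "linreg") and the registry layout are assumptions.

```python
import csv
import io
import statistics

def parse_csv(text):
    """Parse uploaded CSV text into a header and float rows."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    data = [[float(v) for v in r] for r in rows[1:]]
    return header, data

def train_mean(xs, ys):
    """Baseline: always predict the mean of the targets."""
    mu = statistics.fmean(ys)
    return lambda x: mu

def train_linreg(xs, ys, lr=0.05, steps=2000):
    """1-D linear regression fit by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        dw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w, b = w - lr * dw, b - lr * db
    return lambda x: w * x + b

# Registry mapping user-selected algorithm names to trainers.
ALGORITHMS = {"mean": train_mean, "linreg": train_linreg}

def train(csv_text, algorithm):
    """End-to-end: parse the upload, dispatch to the chosen trainer."""
    header, data = parse_csv(csv_text)
    xs, ys = [r[0] for r in data], [r[1] for r in data]
    return ALGORITHMS[algorithm](xs, ys)
```

The returned model is a plain callable, which keeps the trainer UI decoupled from any particular algorithm.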
An adaptive Vision Transformer inference system that avoids unnecessary high-resolution computation, achieving ~3× faster inference than a static high-resolution ViT by escalating to the expensive high-resolution pass only when needed.
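The escalation policy can be sketched with a confidence threshold: run the cheap low-resolution pass first and only fall back to the full-resolution pass when the prediction is uncertain. The two "passes" and the 0.8 threshold below are toy stand-ins, not the system's real models.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy stand-ins: the real system runs a low-resolution ViT first and the
# full-resolution ViT on escalation. Here "class 0 vs class 1" is just
# whether the first or second half of the input carries more signal.
def low_res_logits(x):
    half = len(x) // 2
    return np.array([x[:half].mean(), x[half:].mean()])  # coarse evidence

def high_res_logits(x):
    half = len(x) // 2
    return np.array([x[:half].sum(), x[half:].sum()])    # full evidence

def predict(x, threshold=0.8):
    p = softmax(low_res_logits(x))
    if p.max() >= threshold:              # confident: early exit
        return int(p.argmax()), "low-res"
    p = softmax(high_res_logits(x))       # uncertain: escalate
    return int(p.argmax()), "high-res"
```

The speedup comes from the fraction of inputs that exit at the cheap pass; the threshold trades accuracy against latency.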
Fused ReLU + LayerNorm into a single CUDA RawKernel that is 5.8× faster than running them separately: a CUDA Python experiment demonstrating kernel fusion by combining both operations into one GPU pass and comparing it against the unfused multi-kernel pipeline.
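A CPU analogy of the fusion (a sketch, not the CUDA RawKernel itself): the unfused pipeline materializes the ReLU result and reads it back for LayerNorm, while the fused version does both in one pass per row, which is what eliminating the intermediate global-memory round trip buys on a GPU.

```python
import numpy as np

def unfused(x, eps=1e-5):
    """Two 'kernels': ReLU writes an intermediate, LayerNorm reads it."""
    y = np.maximum(x, 0.0)                   # kernel 1: ReLU
    mu = y.mean(axis=-1, keepdims=True)      # kernel 2: LayerNorm
    var = y.var(axis=-1, keepdims=True)
    return (y - mu) / np.sqrt(var + eps)

def fused(x, eps=1e-5):
    """One pass per row: ReLU stays 'in registers', no intermediate array."""
    out = np.empty_like(x)
    for i, row in enumerate(x):
        r = np.maximum(row, 0.0)
        mu, var = r.mean(), r.var()
        out[i] = (r - mu) / np.sqrt(var + eps)
    return out
```

On a GPU the two versions differ in memory traffic, not in the result, so correctness can be checked by comparing outputs elementwise.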
Flash Attention from scratch: a tiled CUDA forward kernel, online softmax with a running max and correction factor, the recomputation trick in the backward pass, and O(N) memory; the full forward and backward passes are verified against PyTorch autograd to 1e-6.
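The online softmax is the heart of the tiling. A minimal NumPy sketch of the same recurrence: process scores tile by tile, keep a running max `m` and running denominator `l`, and rescale everything accumulated so far by `exp(m_old - m_new)` whenever the max grows. Here the softmax-weighted sum of a value matrix stands in for the attention output `P @ V`.

```python
import numpy as np

def online_weighted_sum(scores, values, tile=4):
    """Compute softmax(scores) @ values tile-by-tile, never
    materializing the full probability vector."""
    m, l = float("-inf"), 0.0
    acc = np.zeros(values.shape[1])
    for i in range(0, len(scores), tile):
        s, v = scores[i:i + tile], values[i:i + tile]
        m_new = max(m, float(s.max()))
        corr = np.exp(m - m_new)        # correction factor for prior tiles
        p = np.exp(s - m_new)           # tile probabilities (unnormalized)
        l = l * corr + p.sum()          # running denominator
        acc = acc * corr + p @ v        # running numerator
        m = m_new
    return acc / l
```

Because each tile only rescales the running accumulators, memory stays proportional to the tile size regardless of sequence length, which is the O(N) (vs O(N²)) memory claim above.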
A lightweight vector database, retrieval engine, and custom indexer, built entirely from scratch.
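The core of such an engine is a flat cosine-similarity index. This is a minimal sketch of the idea, not the project's actual indexer (which presumably adds persistence and a smarter index structure): normalize vectors at insert time so search reduces to a single matrix-vector product.

```python
import numpy as np

class VectorIndex:
    """Minimal flat index: cosine similarity over unit-normalized vectors."""

    def __init__(self, dim):
        self.dim = dim
        self.vecs = np.empty((0, dim))
        self.ids = []

    def add(self, key, vec):
        v = np.asarray(vec, dtype=float)
        v = v / np.linalg.norm(v)            # normalize once at insert time
        self.vecs = np.vstack([self.vecs, v])
        self.ids.append(key)

    def search(self, query, k=3):
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        sims = self.vecs @ q                 # cosine = dot of unit vectors
        top = np.argsort(-sims)[:k]
        return [(self.ids[i], float(sims[i])) for i in top]
```

A flat scan is exact and simple; approximate structures (e.g. IVF or HNSW) trade that exactness for sublinear search time.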
A custom RAG system that combines local document embeddings with a generative model to provide accurate, context-aware answers from a private knowledge base.
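The retrieve-then-generate flow can be sketched end to end. The bag-of-words "embedding" and the prompt template below are toy stand-ins for the system's learned embeddings and LLM call; only the control flow is representative.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, docs, k=2):
    """Rank documents by similarity to the question; keep the top k."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, docs):
    """Assemble retrieved context into a grounded prompt for the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
```

Constraining the model to answer from retrieved context is what keeps answers grounded in the private knowledge base rather than the model's parametric memory.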
Implemented an autograd engine and a fully connected neural network from scratch in pure Python.
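The engine's core idea can be sketched in a few dozen lines (micrograd-style; this is an illustrative minimum, not the project's full implementation): each `Value` records its parents and a local backward rule, and `backward()` applies the chain rule in reverse topological order.

```python
class Value:
    """Scalar node in a dynamically built computation graph."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None   # local chain-rule step

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():                # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():                # product rule
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        """Chain rule over the graph in reverse topological order."""
        order, seen = [], set()
        def topo(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    topo(p)
                order.append(v)
        topo(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()
```

A fully connected network is then just neurons built from these `Value` operations, trained by calling `backward()` on the loss and stepping each parameter against its `grad`.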
A C++ port of Andrej Karpathy's micrograd (originally implemented in Python).