Deep Dive into Fine-Tuning a LoRA Reranker on Phi-3
I fine-tuned Phi-3 as a pairwise reranker with LoRA and logged every gradient. Early layers changed 200x more than late layers, but ranking representations o...
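Per-layer gradient logging like this can be done by grouping parameter gradients by layer index and comparing L2 norms. The sketch below is a minimal, framework-free illustration: it assumes Phi-3's `model.layers.<i>.` naming convention, and the gradient values are toy numbers, not measured ones.

```python
import math
from collections import defaultdict

def layer_grad_norms(named_grads):
    """Aggregate per-parameter gradients into one L2 norm per layer.

    named_grads maps parameter names such as
    "model.layers.3.self_attn.q_proj.lora_A" to flat lists of gradient
    values. Names outside the "model.layers.<i>." pattern are ignored.
    """
    sq_sums = defaultdict(float)
    for name, grad in named_grads.items():
        parts = name.split(".")
        if len(parts) > 2 and parts[0] == "model" and parts[1] == "layers":
            layer = int(parts[2])
            sq_sums[layer] += sum(g * g for g in grad)
    return {layer: math.sqrt(s) for layer, s in sq_sums.items()}

# Toy gradients: an "early" layer with large updates and a "late" layer
# with small ones (illustrative values only).
grads = {
    "model.layers.0.self_attn.q_proj.lora_A": [0.3, -0.4],
    "model.layers.31.self_attn.q_proj.lora_A": [0.003, -0.004],
}
norms = layer_grad_norms(grads)
early_to_late_ratio = norms[0] / norms[31]
print(early_to_late_ratio)
```

In a real training loop you would feed this function the `.grad` tensors from `named_parameters()` after each backward pass and log the ratio over time.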
Autoregressive generation is sequential, one forward pass per token, while diffusion-based text generation needs far fewer passes.
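The contrast can be sketched as a pass-counting exercise. This assumes the simple model that an autoregressive decoder spends one forward pass per generated token, while a text diffusion model runs a fixed number of denoising steps over the whole sequence in parallel; the step count of 16 is an illustrative assumption, not a property of any specific model.

```python
def autoregressive_passes(num_tokens: int) -> int:
    # One forward pass per generated token: cost grows with length.
    return num_tokens

def diffusion_passes(num_steps: int = 16) -> int:
    # A text diffusion model denoises the full sequence in parallel,
    # so the pass count is the fixed number of denoising steps,
    # independent of sequence length.
    return num_steps

for n in (16, 256, 1024):
    print(f"{n} tokens: AR={autoregressive_passes(n)} passes, "
          f"diffusion={diffusion_passes()} passes")
```

At 1024 tokens the sequential decoder pays 1024 passes while the diffusion model still pays its fixed step budget, which is where the speedup claim comes from.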
Cross-entropy loss isn't a heuristic: it is maximum likelihood estimation with a sign flip, and the same math powers GPT training.
Understanding the basics of RLHF vs. RLAIF vs. RLVR: comparing where the feedback signal comes from.
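The key difference between the three is the reward source: human preference labels (RLHF), AI-generated preference labels (RLAIF), or a programmatic verifier (RLVR). The toy verifier below illustrates the RLVR side only; it is a hypothetical exact-match check on a final answer, not any specific paper's implementation.

```python
def rlvr_reward(response: str, reference_answer: str) -> float:
    """RLVR-style reward: a verifiable programmatic check (here,
    exact-match on the final line) replaces the learned reward model
    that RLHF and RLAIF train from preference labels. Toy example."""
    # Convention assumed here: the model puts its final answer on
    # the last line of the response.
    final = response.strip().splitlines()[-1].strip()
    return 1.0 if final == reference_answer else 0.0

print(rlvr_reward("Work: 6 * 7 = 42\n42", "42"))  # prints 1.0
print(rlvr_reward("I think it's 41\n41", "42"))   # prints 0.0
```

Because the reward is computed, not learned, RLVR avoids reward-model drift, but it only applies to tasks where correctness is checkable.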