RL4LLM: 1. A Brief Talk on DPO

This blog post records my understanding of Direct Preference Optimization (DPO) and the mathematical derivation behind it.

Jul-12-2025 · Last updated on Oct-11-2025 · 2 min · 721 words · Kosmo CHE