RL4LLM: 1. A Brief Talk on DPO
This blog post records my understanding of the DPO (Direct Preference Optimization) algorithm and the mathematical derivation behind it.