RL4LLM: 1. A Brief Talk on DPO

This blog post records my understanding of Direct Preference Optimization (DPO) and the mathematical derivation behind it.