
RL4LLM

RL4LLM: 1. A Brief Talk on DPO

This blog post records my understanding of the DPO (Direct Preference Optimization) algorithm and the mathematical derivation behind it.

Jul-12-2025 · Last updated on Aug-26-2025 · 2 min · 721 words · Kosmo CHE

© 2024-2025 Kosmo CHE · Powered by Hugo & PaperMod