RL4LLM: 2. PPO Algorithm and Implementation Details
This blog post introduces the RLHF-PPO algorithm along with a code implementation.
This blog post notes how to use Zotero with iCloud as the storage.
This blog post notes my understanding of the DPO (Direct Preference Optimization) algorithm and the mathematical derivation behind it.