Large Language Model

RL4LLM: 3. Information Theory in Reasoning

This blog post discusses the role of information theory in LLM reasoning.

This blog post introduces the RLHF-PPO algorithm with a code implementation.

This blog post records my understanding of Direct Preference Optimization and the mathematical derivation behind it.