RL4LLM: 3. Information Theory in Reasoning
This blog post discusses the role of information theory in LLM reasoning.
This blog post introduces the RLHF-PPO algorithm with a code implementation.
This blog post records my understanding of Direct Preference Optimization and the mathematical derivation behind it.