RL4LLM: 3. Information Theory in Reasoning

This blog post discusses the role of information theory in LLM reasoning.

Oct-10-2025 · Last updated on Oct-11-2025 · 1 min · 170 words · Kosmo CHE

RL4LLM: 2. PPO Algorithm and Implementation Details

This blog post introduces the RLHF-PPO algorithm with a code implementation.

Jul-21-2025 · Last updated on Oct-11-2025 · 2 min · 703 words · Kosmo CHE

Zotero

This blog post notes how to use Zotero with iCloud as the storage.

Jul-13-2025 · Last updated on Oct-11-2025 · 1 min · 45 words · Kosmo CHE

RL4LLM: 1. A Brief Talk on DPO

This blog post notes my understanding of Direct Preference Optimization and the mathematical derivation behind it.

Jul-12-2025 · Last updated on Oct-11-2025 · 2 min · 721 words · Kosmo CHE