1. Introduction

Recently there has been a growing interest in the dynamic features in the reasoning process of Large Language Models (LLMs). While LLMs have demonstrated impressive capabilities in various tasks, their reasoning abilities often lack the depth and adaptability seen in human cognition. This blog post explores the recent advancements in understanding and enhancing the reasoning processes of LLMs, with a particular focus on the application of information theory especially the concept of entropy.

2.1. Information Content and Entropy

For a discrete random variable $X$ with possible outcomes $\{x_1, x_2, \ldots, x_n\}$ and corresponding probabilities $P(X = x_i) = p_i$, the Information Content $I(x_i)$ of an outcome $x_i$ is defined as:

$$ I(x_i) = -\log(p_i) $$

The Entropy $H(X)$ of the random variable $X$ is defined as the expected value of the Information Content:

$$ \begin{aligned} H(X) = \mathbb{E}[I(X)] = -\sum_{i=1}^{n} p_i \log(p_i) \end{aligned} $$

It quantifies the uncertainty or unpredictability associated with the random variable $X$. Higher entropy indicates greater uncertainty, while lower entropy indicates more predictability.

And the Varentropy $V(X)$ of the random variable $X$ is defined as the variance of the Information Content:

$$ \begin{aligned} V(X) &= \mathbb{E}[I(X)^2] - (\mathbb{E}[I(X)])^2 \\ &= \sum_{i=1}^{n} p_i \left( -\log(p_i) \right)^2 - \left( -\sum_{i=1}^{n} p_i \log(p_i) \right)^2 \end{aligned} $$

Varentropy measures the variability or spread of the information content around its mean (entropy). A higher varentropy indicates that the information content of different outcomes varies significantly, while a lower varentropy suggests that the information content is more consistent across outcomes.