The Connection Between Information Theory and the Geometry of High Dimensions

Information Theory · Geometry of High Dimensions · Data Science

Why Gaussian points in high dimensions typically lie at distance √d from the origin

Let us revisit the geometry of objects in higher dimensions.

Suppose we want to understand how points behave in a d-dimensional space.
One useful way to study this is through Gaussian distributions, which play a central role in probability, statistics, and information theory.

Consider a vector

x = (x_1, x_2, \dots, x_d)

where each component is drawn independently from a standard Gaussian distribution

x_i \sim \mathcal{N}(0,1)

This means every coordinate has mean 0 and variance 1.


Distance from the origin

To understand where these points lie, we examine the distance of the vector from the origin.
The Euclidean norm of x is

\|x\| = \sqrt{\sum_{i=1}^{d} x_i^2}

Working with the squared norm makes the analysis simpler:

\|x\|^2 = \sum_{i=1}^{d} x_i^2

Now let us compute its expectation.

Because expectation is linear,

\mathbb{E}[\|x\|^2] = \mathbb{E}\left[\sum_{i=1}^{d} x_i^2\right] = \sum_{i=1}^{d} \mathbb{E}[x_i^2]

For a standard Gaussian variable, the mean is 0 and the variance is 1, so

\mathbb{E}[x_i^2] = \mathrm{Var}(x_i) + (\mathbb{E}[x_i])^2 = 1 + 0 = 1

Therefore,

\mathbb{E}[\|x\|^2] = d

This means the expected squared distance from the origin grows linearly with the dimension.

Taking the square root suggests that the typical distance of a randomly generated point is roughly

\|x\| \approx \sqrt{d}
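We can verify this numerically. The sketch below samples standard Gaussian vectors and compares the average norm to \sqrt{d}; it assumes NumPy is available, and the dimension and sample count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 1000  # dimension (arbitrary choice)
n = 5000  # number of sample points (arbitrary choice)

# Each row is one point x with i.i.d. N(0, 1) coordinates
x = rng.standard_normal((n, d))

# Euclidean norm of each sample point
norms = np.linalg.norm(x, axis=1)

print(np.mean(norms**2))  # close to d
print(np.mean(norms))     # close to sqrt(d)
```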

A surprising phenomenon

At first glance this may not seem surprising.
However, something deeper happens in high dimensions.

As the dimension increases, the probability mass of the Gaussian distribution concentrates in a thin shell around radius \sqrt{d}.

In other words, most randomly generated points satisfy

\|x\| \in [\sqrt{d} - O(1), \sqrt{d} + O(1)]

with very high probability.

So even though the space becomes larger as d grows, the points do not spread everywhere.
Instead, they gather in a narrow band at distance about \sqrt{d} from the origin.

This phenomenon is known as concentration of measure.
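This concentration is easy to see in simulation: as d grows, the mean of \|x\| tracks \sqrt{d}, while the standard deviation of \|x\| stays roughly constant (it tends to 1/\sqrt{2}). A minimal sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)

for d in [10, 100, 1000, 10000]:
    # 2000 Gaussian points in dimension d (sample size is an arbitrary choice)
    norms = np.linalg.norm(rng.standard_normal((2000, d)), axis=1)
    # The shell center grows like sqrt(d), but its width stays O(1)
    print(f"d = {d:6d}  mean ≈ {norms.mean():8.2f}  std ≈ {norms.std():.3f}")
```

The last column barely changes across four orders of magnitude in d, which is exactly the "thin shell" picture: the fluctuation around \sqrt{d} does not grow with the dimension.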


Relation to high-dimensional geometry

Now consider the unit sphere

\|x\| = 1

in d dimensions.

Compared to the Gaussian shell around \sqrt{d}, this sphere lies extremely close to the origin: almost none of the Gaussian mass falls anywhere near it.

As the dimension increases, the volume of the unit ball

B_d = \{x : \|x\| \le 1\}

shrinks rapidly.
In fact, the volume is

V_d = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}

and it approaches zero as d \to \infty.
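The formula above can be evaluated directly with the standard-library Gamma function; this small sketch prints V_d for a few dimensions to show how quickly the volume collapses:

```python
import math

def unit_ball_volume(d):
    # V_d = pi^(d/2) / Gamma(d/2 + 1)
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

for d in [1, 2, 3, 10, 50, 100]:
    print(f"V_{d} = {unit_ball_volume(d):.3e}")
```

The familiar low-dimensional values appear (V_1 = 2, V_2 = \pi, V_3 = 4\pi/3), the volume peaks around d = 5, and by d = 100 it is vanishingly small.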

This illustrates one of the most striking aspects of high-dimensional geometry:
most of the “space” moves away from the center.


Why this matters in information theory

Gaussian distributions are central in information theory because they maximize entropy under variance constraints.
As a result, many high-dimensional probabilistic systems behave similarly to Gaussian vectors.

The shell phenomenon around \sqrt{d} also appears in the concept of typical sets, where most probability mass concentrates in a small region of the space.

Understanding this geometric structure helps explain why high-dimensional probability behaves so differently from our low-dimensional intuition.


High-dimensional spaces often appear mysterious at first.
Yet through simple probabilistic arguments, we begin to see an elegant geometric pattern emerge:
as the dimension grows, randomness organizes itself into surprisingly structured forms.