Prerequisites:

- Elements of asymptotic analysis
- The limit inferior and the limit superior of a sequence
- Properties of asymptotic orders
- The functions W and V
- The integral mean value and the absolute error function R
- A consequence of Selberg’s Theorem, in integral form

With this post we’ll conclude the main part of the proof of the Prime Number Theorem, which is based on the relationship between the constants \alpha and \beta introduced in the post The integral mean value and the absolute error function R. They are, respectively, the limit superior of the absolute value of the function V and the limit superior of the integral mean of that absolute value. By Lemma N.7, in order to prove the Prime Number Theorem it’s sufficient to prove that \alpha = 0: in this post we’ll take a major step towards this result.

## The two parts of the Prime Number Theorem proof

First of all, let’s recall the definitions of the constants \alpha and \beta (Definition N.14):

\alpha := \limsup_{x \to +\infty} |V(\log x)| \qquad \beta := \limsup_{x \to +\infty} \frac{1}{\log x} \int_0^{\log x} |V(u)|\ du

where x, as is always implied in our posts, is a variable that assumes only integer values (in this case only positive integer values).

As usually happens in number theory, the techniques we’ll use will require passing from integers to real numbers. So we’ll define two new constants, which we’ll call \alpha^{\prime} and \beta^{\prime}, corresponding to \alpha and \beta, where the variable of the limit superior assumes values inside a real interval:

Constants \alpha^{\prime} and \beta^{\prime}

Given the variable \xi, which can assume values inside the real interval [1, +\infty), we define the following constants:

\alpha^{\prime} := \limsup_{\xi \to +\infty} |V(\log \xi)| \qquad \beta^{\prime} := \limsup_{\xi \to +\infty} \frac{1}{\log \xi} \int_0^{\log \xi} |V(u)|\ du

where for the real variable we used the symbol \xi, the Greek “x”, as a reminder that it corresponds to the integer variable x in the definition of \alpha and \beta.

So, differently from \alpha, in order to compute \alpha^{\prime} all the values of the function f(\xi) := |V(\log \xi)| for \xi \in [1, +\infty) need to be considered, not just those for integer values of \xi (that are the values of x), i.e. not only |V(\log 1)|, |V(\log 2)|, |V(\log 3)|, \ldots. Similarly, in order to compute \beta^{\prime} all the values of the function g(\xi) := \frac{1}{\log \xi} \int_0^{\log \xi} |V(u)|\ du for \xi \in [1, +\infty) need to be considered, not only \frac{1}{\log 2} \int_0^{\log 2} |V(u)|\ du, \frac{1}{\log 3} \int_0^{\log 3} |V(u)|\ du, \ldots. So we are computing limit superiors not of sequences, but of functions.

In one of our previous posts we saw the idea behind the definition of the limit superior of a sequence, and we stated its formal definition. Now we could do the same for the limit superior of a function, showing the differences with respect to the case of a sequence; however, we’ll state just one essential property, which will let us go on and achieve our goals. The property is the following: if we have a real function h defined on a right unbounded real interval I, and if we consider a sequence obtained by choosing a countable subset of the function values, that is a sequence of the kind h(a_1), h(a_2), \ldots, h(a_n), \ldots, with a_1, a_2, \ldots, a_n, \ldots \in I, then the limit superior of the function is greater than or equal to that of the sequence.

Applying this property to the case of \alpha^{\prime}, we have I = [1, +\infty), h(\xi) = f(\xi) = |V(\log \xi)|, and we choose all the function values obtained for integer values of \xi (i.e. h(a_1), h(a_2), \ldots, h(a_n), \ldots = h(1), h(2), \ldots, h(n), \ldots = |V(\log 1)|, |V(\log 2)|, \ldots, |V(\log n)|, \ldots). According to this property, the limit superior of the function f(\xi) = |V(\log \xi)| is greater than or equal to that of the sequence |V(\log 1)|, |V(\log 2)|, \ldots, |V(\log n)|, \ldots.
But we called \alpha^{\prime} the former, while we called \alpha the latter, hence:

\alpha^{\prime} \geq \alpha

You can note that we haven’t said what the limit superior of a function is; we just said that it’s greater than or equal to the limit superior of any sequence obtained by calculating the function on a countable subset of its domain (by the way, such sequences are called *subsequences*). Indeed, from here to the definition it’s a short step, because the limit superior of a function can be defined as *the smallest* real number which is greater than or equal to the limit superiors of *all* the subsequences of the function. But for the moment this definition is of little interest for us, because we only need to know that \alpha^{\prime} \geq \alpha.
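
This property, though we won’t prove it, can be illustrated numerically. The sketch below uses a hypothetical function unrelated to V, f(\xi) = |\sin(\pi \xi)|: at integer arguments the subsequence is identically zero, so its limit superior is 0, while over real arguments the function keeps returning to 1, so its limit superior is 1, which is indeed greater than or equal to 0.

```python
import math

# Hypothetical example: f(xi) = |sin(pi * xi)| on [1, +infinity).
f = lambda xi: abs(math.sin(math.pi * xi))

# Sample the integer subsequence f(n) far out:
# every value is (numerically) zero, so its limsup is 0.
sub_max = max(f(n) for n in range(10_000, 10_100))

# Sample the function on a fine real grid over the same stretch:
# the maximum is (numerically) 1, matching the function's limsup.
fun_max = max(f(10_000 + k / 1000) for k in range(100_000))

print(sub_max, fun_max)  # limsup of function >= limsup of subsequence
```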

Of course we can repeat the same argument for \beta and \beta^{\prime}, obtaining that:

\beta^{\prime} \geq \beta

Looking at these properties from the perspective of \alpha and \beta, we can state the following Property:

Relationship between \alpha and \alpha^{\prime}, and between \beta and \beta^{\prime}

With reference to Definition N.23 (Constants \alpha^{\prime} and \beta^{\prime}):

\alpha \leq \alpha^{\prime} \qquad \beta \leq \beta^{\prime}

In this post we’ll prove the following Proposition:

Relationship between \alpha and \beta^{\prime}

With reference to Definitions N.23 and N.14:

\alpha \leq \beta^{\prime}

We have to remember that our goal, by Lemma N.7, is to prove that \alpha = 0. In the next posts we’ll use Proposition N.23 to prove it by contradiction: we’ll suppose that \alpha \gt 0 and from that we’ll deduce that \beta^{\prime} \lt \alpha; but this is in contradiction with Proposition N.23, so the hypothesis that \alpha \gt 0 must be false, i.e. \alpha must be zero (it cannot be negative, since it’s the limit superior of a non-negative function).

But why complicate things with \beta^{\prime}, instead of proving that \alpha = 0 directly? Indeed, as we’ll see in the next posts, it’s relatively simple to prove the implication \alpha \gt 0 \Rightarrow \beta^{\prime} \lt \alpha, but so far nobody has been able to prove that \alpha = 0 in a more direct way, without resorting to \beta^{\prime}. The introduction of \beta^{\prime} is one of the key ideas of the whole proof of the Prime Number Theorem, not only because it’s in fact the only known way of proving that \alpha = 0, but also because it lets us break the proof into two main parts:

- The proof that \alpha \leq \beta^{\prime}: it’s the long and complicated part which ends with this post;
- The proof of the implication \alpha \gt 0 \Rightarrow \beta^{\prime} \lt \alpha: it’s the relatively simple part which we’ll treat in the next posts.

In the next section we’ll see the proof of Proposition N.23.

Taking into account also \alpha^{\prime}, and remembering that \alpha \leq \alpha^{\prime}, starting from Proposition N.23 we might conjecture that:

\alpha^{\prime} \leq \beta^{\prime}

Or alternatively, remembering that \beta \leq \beta^{\prime}, we might conjecture that:

\alpha \leq \beta

So, while keeping the goal of the second part of the proof unchanged, that is \alpha \gt 0 \Rightarrow \beta^{\prime} \lt \alpha, two alternative goals for the first part might be to prove that \alpha^{\prime} \leq \beta^{\prime}, or respectively that \alpha \leq \beta. We don’t know whether these relationships are true but, since proving them is not necessary for proving the Prime Number Theorem, this is not a big problem.

You could think that the definition of \beta is substantially useless, because the constants of interest are after all \alpha and \beta^{\prime}: it would have been sufficient to define them directly from the beginning.

This remark is correct; however, we initially introduced \beta because we prefer to work with integers as far as possible: this way we can reach certain points where, in order to get past them, we need to pass to real numbers. We think these situations are very interesting, because they offer the chance to appreciate the value of real analysis, answering a question that people often ask: “why is it sometimes necessary to use sophisticated mathematical techniques to prove seemingly simple things?”.

## The proof that \alpha \leq \beta^{\prime}

Let’s start from Proposition N.12, which we proved in the previous post:

|R(x)| \log^2 x \leq 2 \int_1^x \left| R\left(\frac{x}{t}\right) \right| \log t\ dt + O(x \log x) \tag{1}

Let’s compare this inequality with our goal, \alpha \leq \beta^{\prime}, which we can write more explicitly using Definitions N.23 and N.14:

\limsup_{x \to +\infty} |V(\log x)| \leq \limsup_{\xi \to +\infty} \frac{1}{\log \xi} \int_0^{\log \xi} |V(u)|\ du \tag{2}

Clearly, in order to pass from (1) to (2), we have to replace somehow the function R with the function V. As we noted in the post The integral mean value and the absolute error function R, near formula (11), the two functions are related to each other by the following relationship, which holds for all u \in [0, +\infty):

V(u) = \frac{R(e^u)}{e^u} \tag{3}

We can let this function appear inside the integral on the right-hand side of (1), by means of the substitution t := \frac{x}{e^u}:

2 \int_1^x \left| R\left(\frac{x}{t}\right) \right| \log t\ dt \overset{(*)}{=} 2x \int_0^{\log x} |V(u)| (\log x - u)\ du \overset{(**)}{=} 2x \int_0^{\log x} \int_0^{v} |V(u)|\ du\ dv \tag{4}

First of all, we can note that the substitution t := \frac{x}{e^u} is suitable because, applying it to the expression R\left(\frac{x}{t}\right) which appears inside the integral, we’ll obtain R\left(x / \frac{x}{e^u}\right) = R\left(\frac{x e^u}{x}\right) = R(e^u), which is just the numerator of (3). So this substitution lets us pass from the function R to the function V.

However, when doing so, we have to consider that the new integration variable is u, hence we have to express dt as a function of du and we have to recalculate the limits of integration:

- From t := \frac{x}{e^u} we can obtain dt = d\left(\frac{x}{e^u}\right) = x d\left(\frac{1}{e^u}\right) = x \left(- \frac{1}{e^{2u}} d(e^u) \right) = x \left(- \frac{1}{e^{2u}} e^u du \right) = - x \frac{1}{e^{u}} du
- If t = 1, we’ll have that \frac{x}{e^u} = t = 1. Since x \neq 0, we can divide by x, obtaining \frac{1}{e^u} = \frac{1}{x}. Since both members are different from zero, we’ll have e^u = x, that is u = \log x. So the new lower limit of integration will be \log x.
- If t = x, we’ll have that \frac{x}{e^u} = t = x, hence \frac{1}{e^u} = 1 \Rightarrow e^u = 1 \Rightarrow u = 0. So the new upper limit of integration will be 0.

Thus we’ll have:

\int_1^x \left| R\left(\frac{x}{t}\right) \right| \log t\ dt = \int_{\log x}^0 |R(e^u)| (\log x - u) \left( -\frac{x}{e^u} \right) du

The constant x can be brought outside of the integral, while the minus sign can be cancelled by exchanging the limits of integration (thus making the lower limit, as usual, smaller than the upper one):

\int_{\log x}^0 |R(e^u)| (\log x - u) \left( -\frac{x}{e^u} \right) du = x \int_0^{\log x} \frac{|R(e^u)|}{e^u} (\log x - u)\ du = x \int_0^{\log x} |V(u)| (\log x - u)\ du

Joining the two previous formulas we’ll obtain the passage (*).
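
As a sanity check outside the proof, the change of variables can be verified numerically with a stand-in integrand. In the sketch below, R is an arbitrary smooth function, not the real error function: the check only concerns the mechanics of the substitution, i.e. that \int_1^x R(x/t)\ dt = x \int_0^{\log x} R(e^u) e^{-u}\ du.

```python
from math import log, exp, sin

def midpoint(f, a, b, n=200_000):
    # composite midpoint rule, accurate enough for this check
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Arbitrary smooth stand-in for R (NOT the real error function):
# the change of variables holds for any continuous integrand.
R = lambda y: sin(y) / (1 + y)

x = 50.0
lhs = midpoint(lambda t: R(x / t), 1.0, x)                      # integral in t
rhs = x * midpoint(lambda u: R(exp(u)) * exp(-u), 0.0, log(x))  # integral in u

print(lhs, rhs)  # the two values agree
```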

First of all we can note that \log x - u = \int_u^{\log x} dv. This way we can convert the initial integral into a double integral:

\int_0^{\log x} |V(u)| (\log x - u)\ du = \int_0^{\log x} |V(u)| \left( \int_u^{\log x} dv \right) du

Since the term |V(u)| does not depend on v, we can bring it inside the inner integral:

\int_0^{\log x} |V(u)| \left( \int_u^{\log x} dv \right) du = \int_0^{\log x} \int_u^{\log x} |V(u)|\ dv\ du

Now let’s change the order of integration. We’ll adopt an intuitive approach, based on the idea of treating the integral as a summation (this idea is not so far from reality, because the integral is defined as the limit of a summation, but it would need to be formalized).

The integral on the right can be read as follows: “Let u vary between 0 and \log x; for each value of u, let v vary between u and \log x; for each resulting pair (u, v) compute |V(u)|; finally sum up everything”. Now let’s focus on the possible pairs (u, v). We said how to generate these pairs by fixing first a value for u and then a value for v, but what if we did the converse?

We said that v varies between u and \log x, meaning that it assumes, for each value of u, all the values between u and \log x (from now on, this is always what we’ll mean by the expression “varies between”). But u varies between 0 and \log x, so in particular, if u = 0, v varies between 0 and \log x (for values of u greater than zero, v will still assume values in this range, but it won’t assume all of them, because the minimum value assumed will be u \gt 0). This remark is important because it tells us that, for any c \in [0, \log x], there exists at least one pair (u, v) with that value for v: it’s sufficient to set u := 0 and v := c. To check this, it suffices to note that the pair (u, v) := (0, c) satisfies the constraints of the integration variables, i.e. 0 \leq u \leq \log x and u \leq v \leq \log x.

Now let’s make a further step. After fixing a value for the variable v \in [0, \log x], let’s see which are *all and only* the values of u such that the pair (u, v) occurs (given that, as we have just seen, u = 0 is one of such values). In general, since v varies between u and \log x, we must have u \leq v: so v is the highest possible value for u. The minimum one is certainly 0 because, as we have seen previously, u := 0 is a possible value for any value of v, and u cannot be negative. So, after fixing a value for v, the range of possible values for u is [0, v]: there are no pairs of the kind (u, v) such that u is outside this range. So, if we consider the integral as a sum extended to all the pairs (u, v), we can rephrase it as follows: “Let v vary between 0 and \log x; for each value of v, let u vary between 0 and v; for each resulting pair (u, v) compute |V(u)|; finally sum up everything”. This corresponds to the following formula:

\int_0^{\log x} \int_0^{v} |V(u)|\ du\ dv

But the set of the pairs (u, v) considered this way is the same as before, because for each possible value of v we have found *all and only* the corresponding values of u, so we have simply looked at the same pairs in a different order. By this principle (which is rigorous for summations, while for integrals it’s only an intuitive explanation), we’ll have that:

\int_0^{\log x} \int_u^{\log x} |V(u)|\ dv\ du = \int_0^{\log x} \int_0^{v} |V(u)|\ du\ dv

This formula, along with the previous ones, explains the passage (**).
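
Although we justified the exchange of the order of integration only intuitively, it can be checked numerically on a stand-in integrand. In the sketch below, absV is an arbitrary continuous non-negative function (not the real |V|), and we compare the single integral carrying the factor (\log x - u) with the swapped iterated integral.

```python
from math import log, cos

def midpoint(f, a, b, n=1000):
    # composite midpoint rule
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Arbitrary continuous non-negative stand-in for |V|:
absV = lambda u: abs(cos(u)) / (1 + u)

L = log(100.0)  # plays the role of log x

# single integral carrying the factor (log x - u)
single = midpoint(lambda u: absV(u) * (L - u), 0.0, L)

# iterated integral after swapping the order: outer in v, inner in u
double = midpoint(lambda v: midpoint(absV, 0.0, v), 0.0, L)

print(single, double)  # the two values agree
```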

The last expression inside parentheses is formally very similar to the argument of the second \limsup in (2); so, in order to reduce that expression to (2), we have to introduce the \limsup. Before doing that, we have to know that, in general, a function f with positive real values, defined on a right unbounded set, can be bounded above by means of the relationship f \leq (\limsup f) + o(1), i.e. there exists a function h such that f \leq (\limsup f) + h, and h \to 0. We won’t prove this property, but we can understand it intuitively, at least in the case of sequences: using the terminology of the post The limit inferior and the limit superior of a sequence, the function h tells how much the values of the function exceed its limit superior (the difference between them and the limit superior must tend to zero because, if it weren’t so, those values would be enough to generate a limit superior greater than the actual one, which would be absurd). Applying this concept to the function f(v) := \frac{1}{v} \int_0^v |V(u)|\ du, we can continue the development of (4) as follows:

In the last limit superior, the variable v assumes all the real values in the range [0, +\infty). Hence, if we set \xi := e^v (from which v = \log \xi), we’ll have that \xi assumes all the real values in the range [1, +\infty), exactly like the \xi of Definition N.23; in addition, if v = \log \xi \to \infty, then also \xi \to \infty. So:

\limsup_{v \to +\infty} \frac{1}{v} \int_0^v |V(u)|\ du = \limsup_{\xi \to +\infty} \frac{1}{\log \xi} \int_0^{\log \xi} |V(u)|\ du = \beta^{\prime}

Thus, substituting into (5), we’ll have:

where in the last passage we have applied Corollary 1 of Property A.8, which can be proved for small ohs as well as for big Ohs.
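
The bound f \leq (\limsup f) + o(1) used above can itself be seen numerically on a hypothetical example unrelated to V: f(v) = (1 + 1/v)\,|\sin v| has limit superior 1, and the amount by which f exceeds 1 shrinks as v grows, just as the o(1) term predicts.

```python
import math

# Hypothetical example: f has limsup 1 as v -> +infinity.
f = lambda v: (1 + 1 / v) * abs(math.sin(v))

def max_excess(v_min, v_max, step=0.01):
    # largest sampled value of f(v) - limsup f on [v_min, v_max]
    worst, v = 0.0, v_min
    while v < v_max:
        worst = max(worst, f(v) - 1.0)
        v += step
    return worst

# The excess over the limsup tends to 0 (this is the o(1) term):
print(max_excess(10, 20), max_excess(1000, 1010))
```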

Doing some calculations, the last expression of (6) can be simplified as follows:

By the properties of integrals:

Let’s develop the first integral:

The second integral can be developed in a similar fashion, but first we have to apply Property A.15 in order to bring the small oh symbol outside of the integral (the cited Property was formulated for big Ohs, but it’s also valid for small ohs):

where in the last passage we applied Property A.8A, which was formulated for big Ohs but is also true for small ohs.

Substituting (b) and (c) into (a), we’ll have that:

hence:

Formula (7) follows from the equality x o(\log^2 x) = o(x \log^2 x), a consequence of Corollary 1 of Property A.8 in the case of small ohs.

Ultimately, joining the formulas (4), (5), (6) and (7), we’ll obtain:

Substituting into (1), we’ll obtain:

and finally, by simplifying asymptotic orders:

Since x \log x = o(x \log^2 x), we’ll have that O(x \log x) = O(o(x \log^2 x)) = o(x \log^2 x), where in the last equality we applied Property A.16. So o(x \log^2 x) + O(x \log x) = o(x \log^2 x) + o(x \log^2 x) = o(x \log^2 x), where in the last passage we applied Corollary of Property A.9, which can be proved similarly for small ohs.

The last equality can be developed as follows:

- In the passage (*) we divided everything by \log^2 x, assuming that x \gt 1 and applying Property A.8A concerning the small oh;
- In the passage (**) we used the fact that, if a sequence is less than or equal to another one, then its limit superior is also less than or equal to the limit superior of the other sequence. We’ll not prove this property, but we think it’s rather intuitive;
- In the passage (***) we considered that the limit of a constant is the constant itself and that, by definition, the limit of an o(1) is zero, i.e. \lim_{x \to +\infty} \beta^{\prime} = \beta^{\prime} and \lim_{x \to +\infty} o(1) = 0, hence \lim_{x \to +\infty} \left( \beta^{\prime} + o(1) \right) = \beta^{\prime}; so, by Property A.3, also \limsup_{x \to +\infty} \left( \beta^{\prime} + o(1) \right) = \beta^{\prime}.

## Final remarks

We’ll conclude this post with a graphical representation of the relationship \alpha \leq \beta^{\prime} that we have just proved. We know that it’s a relationship between two limit superiors, because \alpha is defined as the limit superior of the function |V(\log x)| and \beta^{\prime} is defined as the limit superior of the function \frac{1}{\log \xi} \int_0^{\log \xi} |V(u)|\ du (the first function defined on positive integers, the second one on real numbers greater than or equal to 1). The two functions are shown in the following picture:

From the picture it’s clear that the function |V(\log x)| tends to zero, hence \limsup_{x \to +\infty} |V(\log x)| = \alpha = 0. This relationship, remembering Lemma N.7, substantially coincides with the statement of the Prime Number Theorem. Considering the integer values x = \xi = 1, 2, 3, \ldots, the relationship |V(\log x)| \leq \frac{1}{\log \xi} \int_0^{\log \xi} |V(u)|\ du is satisfied for all the individual values of x and \xi shown in the graph, but we have to consider that the descent of the function \frac{1}{\log \xi} \int_0^{\log \xi} |V(u)|\ du is very slow, because \log \xi increases very slowly as \xi increases, hence we cannot exclude that there are some values of x such that |V(\log x)| \gt \frac{1}{\log x} \int_0^{\log x} |V(u)|\ du. However, even if such values of x existed, that would not be in contradiction with the inequality \alpha \leq \beta^{\prime}; for example, the sequence shown in Figure 3 of the post The limit inferior and the limit superior of a sequence has limit superior 1 even though infinitely many of its values are greater than 1 (but they get closer and closer to it as we move towards the right in the graph).
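
For readers who want to experiment, the two functions in the picture can be approximated numerically. The sketch below assumes, as in the earlier posts of this series, that R(x) = \psi(x) - x, where \psi is the Chebyshev function, and that V(u) = R(e^u)/e^u; the integral mean is computed with a simple midpoint rule, so the figures are rough approximations.

```python
from math import log, exp

N = 200_000  # sieve limit; e^u must stay below this

# Smallest-prime-factor sieve, used to compute Lambda(n).
spf = list(range(N + 1))
for p in range(2, int(N ** 0.5) + 1):
    if spf[p] == p:  # p is prime
        for m in range(p * p, N + 1, p):
            if spf[m] == m:
                spf[m] = p

# psi_table[n] = psi(n) = sum of Lambda(m) for m <= n
psi_table = [0.0] * (N + 1)
for n in range(2, N + 1):
    p, q = spf[n], n
    while q % p == 0:
        q //= p
    lam = log(p) if q == 1 else 0.0  # Lambda(n) = log p iff n = p^k
    psi_table[n] = psi_table[n - 1] + lam

def absV(u):
    # |V(u)| = |R(e^u)| / e^u, assuming R(x) = psi(x) - x
    y = exp(u)
    return abs(psi_table[int(y)] - y) / y

def integral_mean(x, n=2000):
    # (1 / log x) * integral of |V(u)| over [0, log x], midpoint rule
    L = log(x)
    h = L / n
    return sum(absV((k + 0.5) * h) for k in range(n)) * h / L

for x in (10**3, 10**4, 10**5):
    print(x, absV(log(x)), integral_mean(x))
```

As in the picture, |V(\log x)| is already very small at these scales, while the integral mean decreases much more slowly, since the large values of |V| near u = 0 keep weighing on the average.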