Prerequirements:

- The Prime Number Theorem: history and statement
- Chebyshev’s Theorem (strong version)
- From real numbers to integer numbers
- Chebyshev’s Theorem (weak version)

In the previous post we saw that the only difference between the statement of the strong version of Chebyshev’s Theorem and the one of the Prime Number Theorem, is that the first theorem does not assert the existence of the limit

whereas the second one does. In fact, it’s not by chance that the general idea of the Prime Number Theorem proof consists in starting from the proof of Chebyshev’s Theorem (strong version), reusing the main parts of it in order to prove the remaining part of the statement, that is the existence of the limit (1). So we can see Chebyshev’s Theorem as an intermediate leg, a source of inspiration for achieving the Prime Number Theorem.

Reading again the post about the strong version of Chebyshev’s Theorem, we can see that the key passage — the one that let us say that the limit (1), if it exists, is 1 — is the following equation:

This will be our starting point. Our goal will be to understand something more about the function \psi (and about its simple extension \overline{\psi}), finally proving that:

In fact, if we prove this, we can get to the Prime Number Theorem in a single passage, given that, by Corollary of Theorem N.5 (Asimptotic equivalence between \pi(x) and \frac{\psi(x)}{\log x}), \pi(x) \sim \frac{\psi(x)}{\log x}, and so by transitivity we’ll have:

Indeed instead of proving (3) we’ll prove a stronger version of it, in which both the function \psi and the identity function of \mathbb{N}^{\star} (\mathrm{id}_{\mathbb{N}^{\star}}, defined by \mathrm{id}_{\mathbb{N}^{\star}}(x) := x), have been extended to real numbers, obtaining respectively the functions \overline{\psi} and \mathrm{id}_{[1, +\infty)}:

The equivalence (3′) implies (3), hence, after the first one has been proved, the second one turns out to be automatically proved too. Hereinafter we’ll prove (3′), by using the tools of real analysis.

The equivalence (3′) implies (3) due to a general principle: if two real functions defined on an interval I \subseteq \mathbb{R} are asymptotically equivalent, also their restrictions to any upper unbounded subset of I are asymptotically equivalent. In this case the two functions are \overline{\psi}: [1, +\infty) \rightarrow \mathbb{R} and the identity function of [1, +\infty); if they are asymptotically equivalent, then also their restrictions on \mathbb{N}^{\star}, that is an upper unbounded subset of [1, +\infty), are asymptotically equivalent. These restrictions are respectively \psi: \mathbb{N}^{\star} \rightarrow \mathbb{R} and the identity function of \mathbb{N}^{\star}. The fact that \mathbb{N}^{\star} is an upper unbounded subset of [1, +\infty), is an essential requirement in order to compute the limit for x \to +\infty, which is implicit in the asymptotic relationships (3) and (3′). So, substantially, in asymptotic relationships it’s not important what points x (integer) or t (real) are considered in the limit computation; the important thing is that the set of these points is upper unbounded, in order to let the limit computation possible when the value of the variable (either x or t) tends to infinity.

Now that the starting point and the goal are clear, we can actually start with the proof. It will be long and complex, but we’ll try to highlight the key passages, for avoiding to lose the thread. Now we have to remember formulas (2) e (3′) that are respectively the initial one and the final one; in the next posts we’ll see what other important formulas there are in the middle.

The first passage of the proof, which is the subject of this post, is a technical artifice: it consists in rewriting (2) in a simpler form, eliminating the square at the denominator and the logarithm. By appropriate passages we can get to the following form:

Starting from formula (2), first of all we can note that \log x = \int_1^x \frac{1}{t} dt; in fact, calculating this integral, we’ll obtain that \int_1^x \frac{1}{t} dt = [\log t]_1^x = \log x - \log 1 = \log x - 0 = \log x. By rewriting the logarithm this way and using the properties of the algebraic sum of integrals, we’ll obtain:

Now, in order to eliminate the factor \frac{1}{t} inside the integral, it’s possible to use the substitution t := e^u, hence dt = e^u du. As a result of this substitution, the integral ends change because, if t = 1, then e^u = 1, and so u = 0, that becomes the new lower end; similarly, if t = x then e^u = x and so u = \log x, that becomes the new upper end. With these substitutions, we can obtain formula (4):

The integrand function has a very simple structure, being expressible as the composition of the function \frac{\overline{\psi}(t) - t}{t}, with the exponential function. From now on, we’ll call the integrand function V and its first component function W:

Functions W and V

We define the following functions W: [1, +\infty) \rightarrow \mathbb{R} and V: \mathbb{R}_{+} \rightarrow \mathbb{R}:

From Definition N.13 it follows immediately that

for all u \in \mathbb{R}^{+}.

The graphs of the functions W and V are shown respectively in Figures 1 and 2.

In the final part of the proof we’ll study these graphs in depth; for now we’ll limit ourselves to note two simple things:

- The graph of W has some discontinuities in correspondence with the integers for which the function \overline{\psi} changes value. For example, for t=1, \ldots, 6 we’ll have that \overline{\psi}(t) = \psi(t) = \log 1, \log 2, \log 6, \log 12, \log 60, \log 60, so \overline{\psi} changes value in the passage from 1 to 2, from 2 to 3, from 3 to 4, from 4 to 5, but it remains constant when passing from 5 to 6. In the graph (Figure 1), you can see this because there are discontinuities in the points 2, 3, 4, and 5, but not in the point 6. In fact the points 5 and 6 graphically lay on the same continuous segment.
- The graph of V between two integers n and n + 1 reproduce the graph of W between e^n and e^{n + 1}. Figure 3 shows this for n = 1.

Definition N.13 lets us rewrite (4) like this:

Now let’s analyze the functions W and V in a greater depth.

The function W computes a relative error. In fact the numerator \overline{\psi}(t) - t expresses the difference between the number \overline{\psi}(t) that we have to estimate, and the asymptotic estimate t we want to obtain, by (3′). The function W does not express this difference in absolute terms, but relative to the value of t. This is a general principle which is applied when a relative error is computed. For example, making an error of a unity in one thousand is different from making an error of a unity in one million: the relative error is one thousandth in the first case, one millionth in the second case.

So the relative error is a function that indicates rather intuitively the similarity between two values, in this case \overline{\psi}(t) and t. By (3′), we have to prove that there is a certain similarity between these two functions (that is the asymptotic equivalence) and so, intuitively, we can expect the relative error W(t) = \frac{\overline{\psi}(t) - t}{t} to tend to zero. Indeed, the two things are equivalent:

So we rephrased further our goal, that from (3′) can be rewritten equivalently like this:

But we can still make a further step, considering the function V and noting that:

If W(t) tends to zero as t tends to infinity, as values of t we can consider in particular the ones of the kind t = e^u with u that also tends to infinity. In fact, if t \to +\infty then also u = \log t \to +\infty, because the function \log t is increasing with respect to t. So, since the limit of a function does not depend on the particular values chosen for the variable (provided that they tend to infinity), being V(u) = W(e^u) = W(t) and having proved that t and u both tend to infinity, if \lim_{t \to +\infty} W(t) = 0 it must be also that \lim_{u \to +\infty} V(u) = 0.

Conversely, if V(u) \to 0 as u \to +\infty, we can set t := e^u, hence u = \log t and so V(u) = W(e^{\log t}) = W(t). Now, if u \to +\infty, also t = e^u \to +\infty, because the function e^t is increasing with respect to t. So, reasoning like above, being W(t) = V(u) and since t and u both tend to infinity, if \lim_{u \to +\infty} V(u) = 0 then it must be also that \lim_{t \to +\infty} W(t) = 0.

Thus we have obtained a further reformulation of our goal:

Summarizing, we have the following chain of implications (that are actually double implications) that bring us towards the Prime Number Theorem:

So we reduced the proof of the Prime Number Theorem to the problem of proving that \lim_{u \to +\infty} V(u) = 0. Careful, because in this statement it’s implied that the limit exists, besides being 0; the limit existence is always part of the thesis to be proved. So, how to prove that the limit of a function exists and is 0?

If instead of a function we had a sequence, the most immediate method would be applying Proposition A.2 (*Every real sequence having \liminf equal to \limsup has a limit*). Now, it can be proved that this result can be extended to functions too, hence in our case:

That is, for proving that the limit of V exists and is 0, it should be proved that the limit inferior and the limit superior of the function are equal to zero too. The advantage of this technique is that the limit superior and the limit inferior, unlike the limit, always exist by definition; they just have to be computed. So (5) lets us establishing the numerical value and the existence of a single object (the limit of V) on the basis of the numerical value of two distinct objects (its limit superior and inferior), of which however we have not to care about the existence. We can say that it’s a good simplification. The drawback is that now we have two equations that we should study separately: \limsup_{u \to +\infty} V(u) = 0 and \liminf_{u \to +\infty} V(u) = 0. In principle, after having proved one of the two, the proof should be restarted from the beginning in order to prove also the other one. But here it’s possible to simplify again, joining the two equations into a single one. In fact, it can be proved that:

>In this detail we’ll denote for the sake of brevity \limsup_{u \to +\infty} V(u) and \liminf_{u \to +\infty} V(u) simply as \limsup V(u) and \liminf V(u). Let’s start to prove the implication towards right: \limsup V(u) = \liminf V(u) = 0 \Rightarrow \limsup |V(u)| = 0.

Getting back to formula (5), we have that, if \limsup V(u) = \liminf V(u) = 0, than the function V tends to zero. But if V tends to zero, its opposite -V tends to zero too (by the definition of limit, or because \lim -V = -\lim V = -0 = 0), so its absolute value tends also to zero, by the so called “Squeeze Theorem“, being always -V(u) \leq |V(u)| \leq V(u). Summarizing, 0 = \lim V(u) = \lim -V(u) = \lim |V(u)|. But if \lim |V(u)| = 0, then, by the aforementioned Proposition A.2, \limsup |V(u)| = \liminf |V(u)| = 0; in particular \limsup |V(u)| = 0, that is what we wanted to prove.

Conversely, if \limsup |V(u)| = 0, being always V(u) \leq |V(u)|, by the definition of limit superior (and as it’s easy to realize), we’ll have that \limsup V(u) \leq \limsup |V(u)|, hence:

We can note that, being |V(u)| = |-V(u)|, we can put -V in place of V in all the previous passages, up to formula (7), obtaining:

Now we can employ a general property that connects the limit superior and the limit inferior, which in our case translates into the equality \limsup -V(u) = -\liminf V(u), from which, by (7′), -\liminf V(u) \leq 0, hence:

Joining (7) and (8), we’ll have that \limsup V(u) \leq 0 \leq \liminf V(u), so \limsup V(u) \leq \liminf V(u). But we know that the inverse relationship is generally true, that is \liminf V(u) \leq \limsup V(u); so in this case, being both relationships true, we must have that \liminf V(u) = \limsup V(u). In addition, again by (7) and (8), we can extend this inequality, obtaining 0 \leq \liminf V(u) = \limsup V(u) \leq 0, and so \liminf V(u) = \limsup V(u) = 0, that is what we had to prove.

So, provided that the absolute value has been introduced, it’s possible to come back to a single equation, like when there was the limit, but with the advantage of the \limsup, that consists in existing always. Of course, the absolute value is not easy to work with, but we’ll be able to develop most of the proof without it, and we’ll introduce it only when necessary.

Now, since we made several backward steps in order to obtain simpler and simpler formulas, it’s worth stopping for a while for summarizing, travelling through the passages again onwards. Concerning our goal, in order to achieve the Prime Number Theorem formula we travelled through the following chain of implications (which are indeed double implications, but this is not important for the proof):

This chain of implications proves the following Lemma, according to which in the following posts we’ll be allowed to forget the passages above, in order to focus our attention on the proof of the formula \limsup_{u \to +\infty} |V(u)| = 0:

Sufficient condition for the validity of the Prime Number Theorem

The following is a sufficient condition for the validity of the Prime Number Theorem:

where the function V is defined in Definition N.13.

Our starting point was also the object of some backward passages, which let us introduce the function V and can be summarized as follows:

So, substantially, if we prove the implication:

we would prove the Prime Number Theorem. In fact, the previous passages and formula (2) prove the left part, while Lemma N.7 guarantees the sufficiency of the right part; what’s missing is to prove the implication.

In the next post we’ll see how, starting from (9), is possible to sketch out the continuation of the proof.