Prerequisites:

- The second part of the prime number Theorem proof: the basic ideas
- End of the first part of the proof: the relationship between α and β’
- The functions W and V
- Properties of asymptotic orders

In this post we’ll complete the proof of the prime number Theorem, applying the fundamental ideas described in the previous post. We’ll split this last part of the proof into five parts:

- Our starting point will be a simple overestimation of the integral of |V| in any interval, obtained by applying Property A.18;
- We’ll improve the result of the previous point supposing that within the interval the function V has at least one zero;
- We’ll improve the result of the first point also for the intervals in which the function V has no zeroes;
- We’ll compute how wide a generic interval must be, i.e. which values of its width \delta make it possible to apply, depending on the situation, either of the two previous overestimations;
- Finally, we’ll apply the previous results to the interval [0, \log \xi] and we’ll obtain the final result by letting \xi tend to +\infty.

## Overestimation by limit superior

As we saw in the previous post, the main problem of the second part of the proof of the prime number Theorem consists in overestimating the integral of |V| in the interval [0, \log \xi] (where \xi is the variable used in the computation of limit superiors, for example in the definition of \alpha^{\prime} and \beta^{\prime}). As we saw in the second idea, to do this it’s useful to divide the interval into smaller intervals. So, for the moment, we’ll set aside the original interval (which is not fixed, since it depends on \xi) and we’ll start from a generic interval [a, a + \delta], with \delta \gt 0 and a \geq 0. We’ll try to find a value for \delta that allows a good overestimation, and we’ll see that such an overestimation depends only on \delta, not on a.

A simple way to overestimate the integral of |V| in the interval [a, a + \delta] consists in applying Property A.18, according to which:

But the term inside the parentheses is by definition \alpha^{\prime}, so:

Applying this inequality to the integral, we’ll have that:

The last passage can be explained as follows:

where in the passage (*) we applied Property A.18 and in the last passage we applied Corollary 1 of Property A.8.

Below we’ll call (1) “overestimation by limit superior”. It’s not a sophisticated overestimation, because it’s based on a property which is valid generally for real functions (Property A.18), not specific for the function |V|. In the following sections we’ll see how, using instead some properties of the function |V|, we’ll be able to obtain a better estimate.

## Intervals in which V has a zero

Let’s suppose that in the interval [a, a + \delta] the function V has at least one zero (it’s a possible case because the function V has some zeroes, as it’s quite clear from Figure 2 of the previous post). If we indicate this zero with t, by Proposition N.25 we’ll have that, for all h \geq 0:

So, by the third idea of the previous post, we can use this inequality to overestimate the integral of the function |V| between t and t + v, where v is an arbitrary positive real number:

After some passages, we can prove that:

First of all, let’s split the integral in two parts:

The first part can be developed as follows:

As for the second part, note that the function inside the big Oh has t as its variable, so it’s constant with respect to the integration variable, which is h. So we can bring the asymptotic order outside the integral, obtaining \int_0^v O\left(\frac{1}{t}\right)\ dh = O\left(\frac{1}{t}\right) \int_0^v 1\ dh. Now, the integral \int_0^v 1\ dh is equal to v, which is constant with respect to t, so as an asymptotic order it’s O(1) by the Corollary of Property A.7. So we have an O\left(\frac{1}{t}\right) multiplied by an O(1), which gives, by Property A.8, another O\left(\frac{1}{t}\right). But this asymptotic order is an o(1) by Property A.16, because \frac{1}{t} = o(1) by Definition A.4. Summarizing:

Substituting (b) and (c) into (a) we’ll have:

Thus we have proved the first passage. The second one is a consequence of the inequality v + \frac{1}{e^v} - 1 \leq \frac{v^2}{2}, which we’ll prove now. First of all we can note that this inequality is true for v \geq 2. In this case, in fact, \frac{1}{e^v} \lt 1, hence v + \frac{1}{e^v} - 1 \leq v; moreover v \leq \frac{v^2}{2}, because \frac{v^2}{2} - v = \frac{v^2 - 2v}{2} = \frac{v(v - 2)}{2}, which is a non-negative quantity if v \geq 2. So let’s see what happens for v \lt 2. To this end, let’s recall the Maclaurin series of the exponential function:

e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}

from which, setting x := -v and moving the first two terms 1 - v to the left member:

v + \frac{1}{e^v} - 1 = \sum_{k=2}^{\infty} (-1)^k \frac{v^k}{k!} \tag{d}

We have to remember that, by definition of series, the right member is the limit of the sequence of the following partial sums (for simplicity we’ll use 2 as initial index):

The limit of this sequence, by (d), exists and it’s equal to v + \frac{1}{e^v} - 1.

Now let’s consider the subsequence b_j obtained by taking only the terms with an even index:

Let’s analyze the differences between each term and the next one:

Now, for v \lt 2, all the factors 1 - \frac{v}{4}, 1 - \frac{v}{6}, \ldots, 1 - \frac{v}{2j}, \ldots are positive; so, since also v \gt 0, all the differences b_2 - b_3, b_3 - b_4, \ldots, b_j - b_{j + 1}, \ldots are positive too. This means that for v \lt 2 the sequence (b_j) is strictly decreasing. So all the terms of the sequence are overestimated by the first one, and this property can be extended also to the limit:

But the sequence (b_j) has been extracted from (a_i), so it must have the same limit, which is equal to the sum of the series (d):

Joining formulas (d), (e) and (f), we’ll obtain finally:

Joining the two previous formulas, we’ll have:

In this inequality there is the asymptotic term o(1), which for the moment we can think of as an error term, negligible because it tends to zero. We’ll carry this error term along until the end, when we’ll eliminate it thanks to the limit operation.
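The inequality v + \frac{1}{e^v} - 1 \leq \frac{v^2}{2} and the monotonicity argument used to prove it can be sanity-checked numerically. The following Python sketch is only an illustration, not part of the proof: the function names are ours, and the indexing b_j = a_{2j - 2} (for j \geq 2) is an assumption deduced from the factors 1 - \frac{v}{2j} mentioned above.

```python
import math

def lhs(v):
    """Left member of the inequality: v + 1/e^v - 1."""
    return v + math.exp(-v) - 1.0

def partial_sum(v, i):
    """a_i = sum for k = 2..i of (-1)^k v^k / k!, a partial sum of the series (d)."""
    return sum((-1) ** k * v ** k / math.factorial(k) for k in range(2, i + 1))

def even_subsequence(v, count):
    """First `count` terms of b_j = a_(2j-2), j = 2, 3, ... (assumed indexing)."""
    return [partial_sum(v, 2 * j - 2) for j in range(2, 2 + count)]

# Check the inequality on a grid of positive values, on both sides of v = 2.
grid = [0.01 * n for n in range(1, 1001)]
assert all(lhs(v) <= v * v / 2 for v in grid)

# For a sample v < 2 the even-indexed partial sums decrease from v^2/2
# towards the sum of the series, i.e. towards v + 1/e^v - 1.
v = 1.5
b = even_subsequence(v, 8)
print("first term  :", b[0], "(v^2/2 =", v * v / 2, ")")
print("last term   :", b[-1])
print("series limit:", lhs(v))
```

The grid check covers only finitely many values of v, so it can confirm the statement but of course it cannot replace the argument with the subsequence (b_j).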

Our goal is to make the integral of (2) as small as possible. So, reasoning in the same way as we did for formulas (3) and (7) of the previous post, we should have that:

where \epsilon is a fixed positive real number, as small as possible. In order to let the last inequality be true, by (2) it must be that:

So, for example, v may be just equal to \epsilon. With this choice we’ll obtain, substituting into (2):

Thus, we have overestimated the integral of the function in an interval of length \epsilon, starting from a zero of V, with the area of a right triangle having both height and base \epsilon, or equivalently with the area of a rectangle having base \epsilon and height \frac{\epsilon}{2}:

So far we have overestimated the integral of |V| in the interval [t, t + \epsilon], where t is a zero of V, but we wanted an overestimation in the initial interval [a, a + \delta]. We can achieve this goal by making two suppositions:

- We’ll suppose that \alpha^{\prime} \gt 0 and we’ll set \epsilon := \alpha^{\prime}
- We’ll suppose that [t, t + \alpha^{\prime}] \subseteq [a, a + \delta], that is a \leq t \lt t + \alpha^{\prime} \leq a + \delta

The second supposition is equivalent to the pair of inequalities a \leq t and t + \alpha^{\prime} \leq a + \delta, hence a \leq t \leq a + \delta - \alpha^{\prime}. So the zero t not only must belong to the interval [a, a + \delta], as we supposed initially, but in particular it must belong to the sub-interval [a, a + \delta - \alpha^{\prime}]. So the second supposition makes the initial hypothesis redundant (if V has at least one zero in [a, a + \delta - \alpha^{\prime}], then it has at least one zero in [a, a + \delta]). Clearly V may have other zeroes in [a + \delta - \alpha^{\prime}, a + \delta], but we don’t need to know that; the important thing is that it has one in [a, a + \delta - \alpha^{\prime}].

Another consequence of the second supposition is that \delta \geq \alpha^{\prime}. In fact, if the interval [t, t + \alpha^{\prime}] is contained in the interval [a, a + \delta], then the length of the latter, that is \delta, must be at least equal to the length of the former, that is \alpha^{\prime}.

With these suppositions, reasoning according to the fourth idea of the previous post, we can extend the overestimation of the integral from the interval [t, t + \alpha^{\prime}] to the interval [a, a + \delta]. Applying the overestimation by limit superior and formula (4) with \epsilon := \alpha^{\prime}, we can prove the following Lemma:

Overestimation of the integral of |V| better than the one by limit superior, first case

Suppose that \alpha^{\prime} \gt 0. Let \delta and a be such that \delta \geq \alpha^{\prime} and a \geq 0. If in the interval [a, a + \delta - \alpha^{\prime}] the function |V| has at least one zero, and if there is a constant h \lt 1 such that (\delta - \alpha^{\prime}) \alpha^{\prime} + \frac{(\alpha^{\prime})^2}{2} = \delta h \alpha^{\prime},

then:

Developing the left term of the inequality, we’ll obtain:

where in the passage (*) we applied the overestimation by limit superior in the intervals [a, t] and [t + \alpha^{\prime}, a + \delta], while in the interval [t, t + \alpha^{\prime}] we applied (4) with \epsilon := \alpha^{\prime}.

We can note that:

Joining the two previous formulas (g) and (h) we’ll obtain:

With respect to (1), here there is the “less than” symbol instead of the “less than or equal” one, so this overestimation is indeed better. However, we cannot be satisfied with this slight difference. In fact at the end we’ll pass to the limit but, as is well known in analysis, if f \lt g then \lim f \leq \lim g (of course under suitable hypotheses that make the notation meaningful): the limit turns \lt into \leq, so we would be back at the starting point. But the solution is simple. Looking closely at formula (h), we can note that the “less than” sign does not actually concern the whole expression, but only the part without the asymptotic order:

Equivalently, we can say that there is a constant h \lt 1 such that:

The proof is now finished, because (5) is the union of (g) and (j). Before getting this over with, though, let’s see the relationship that is obtained by eliminating the intermediate passage:

This relationship is preferable to (i), because, changing it into a limit, the symbol \leq is kept, as well as the constant \delta h \alpha^{\prime}, but \delta h \alpha^{\prime} \lt \delta \alpha^{\prime}; so, thanks to this technique, we’ll be able to keep the “less than” sign also after the passage to the limit. We’ll see that in detail in the last section.

Now we have overestimated the integral of the function in [a, a + \delta] with the area of a rectangle with base \delta and height h \alpha^{\prime}:

We can note that h \alpha^{\prime} \lt \alpha^{\prime}, so this overestimation is better than the one by limit superior given by formula (1): in the latter case the height of the rectangle would have been \alpha^{\prime}.
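To make the comparison between the two rectangles concrete, the following Python sketch computes the first-case bound and the corresponding constant h from the hypothesis (\delta - \alpha^{\prime}) \alpha^{\prime} + \frac{(\alpha^{\prime})^2}{2} = \delta h \alpha^{\prime}, ignoring the o(1) term. The function name and the sample values of \alpha^{\prime} and \delta are hypothetical, chosen only for illustration.

```python
def first_case_bound(alpha, delta):
    """Return (area, h) for the first-case overestimation, without the o(1) term.

    area = (delta - alpha) * alpha + alpha^2 / 2   (two rectangles plus a triangle)
    h    = area / (delta * alpha)                  (so that area = delta * h * alpha)
    """
    if not (delta >= alpha > 0):
        raise ValueError("the Lemma requires delta >= alpha' > 0")
    area = (delta - alpha) * alpha + alpha ** 2 / 2
    h = area / (delta * alpha)
    return area, h

# Hypothetical sample values, for illustration only.
alpha, delta = 0.5, 2.0
area, h = first_case_bound(alpha, delta)
print("bound by limit superior:", delta * alpha)
print("first-case bound       :", area)
print("height ratio h         :", h)
```

With these sample values the height of the rectangle is reduced by the factor h, which is smaller than 1 whenever \delta \geq \alpha^{\prime} \gt 0.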

## Intervals in which V has no zeroes

Now, while keeping the supposition that \delta \geq \alpha^{\prime} \gt 0, let’s analyse the complementary case with respect to the previous one: let’s suppose that V has no zeroes in [a, a + \delta - \alpha^{\prime}]. In order to compare the two cases better, we’ll choose again a point t such that [t, t + \alpha^{\prime}] \subseteq [a, a + \delta]. Clearly this time t is not a zero of V, but it can be an arbitrary point. We’ll choose t so that the right ends of the intervals [t, t + \alpha^{\prime}] and [a, a + \delta] coincide, that is t + \alpha^{\prime} = a + \delta. This way the interval [a, a + \delta] is divided into two parts (if we had not made the interval ends coincide, the parts would have been three and the calculations would have been a bit longer):

In the interval [t, t + \alpha^{\prime}] we can apply the overestimation by limit superior. Substituting \delta := \alpha^{\prime} into (1), we’ll obtain:

In the previous case, when t was a zero, we obtained a better approximation between t and t + \alpha^{\prime} (formula (4) with \epsilon := \alpha^{\prime}); in this case, instead, we can improve the overestimation of the integral in the interval [a, t]. Previously we applied the overestimation by limit superior to this interval, whereas now we can reason in another way, because we know that V has no zeroes in [a, t]: by hypothesis it has no zeroes in [a, a + \delta - \alpha^{\prime}], and [a, t] = [a, a + \delta - \alpha^{\prime}] because t = a + \delta - \alpha^{\prime}. In particular we can apply the following Lemma:

In an interval, if V has no zeroes it changes sign at most once

Let I be a real interval in which V has no zeroes. Then V changes sign at most once in I.

Reasoning by contradiction, we’ll suppose that V changes sign at least twice in I: we have to prove that V has at least one zero in I, thus arriving at a contradiction with the hypothesis.

If the function changes sign at least twice, then there are four points a, b, c, d \in I such that a \lt b \lt c \lt d and one of the following possibilities happens:

- case 1: V \geq 0 in [a, b], V \leq 0 in [b, c], V \geq 0 in [c, d]
- case 2: V \leq 0 in [a, b], V \geq 0 in [b, c], V \leq 0 in [c, d]

In both cases, there are three points e, f, g \in I such that e \lt f \lt g and V \geq 0 in [e, f], V \leq 0 in [f, g] (in the first case [e, f] = [a, b] and [f, g] = [b, c], in the second case [e, f] = [b, c] and [f, g] = [c, d]). This means that the function V is continuous at f. Let’s see why. If we look at the function graph (Figure 2 of the post The functions W and V), we can note that the discontinuity points of V correspond to jumps from lower values to higher values, never the converse. Let’s take this property for granted for now; we’ll prove it later. As a consequence, if f were a discontinuity point of V, the values assumed by V in a right neighbourhood of f would be greater than those assumed in a left neighbourhood. But this contradicts the fact that V is positive or zero at the left of f and negative or zero at the right (it should rather be the converse); so V must be continuous at f.

Now, by definition of continuity, both the right limit and the left limit of V(u) for u \to f exist and are equal to V(f). But the left limit must be greater than or equal to zero, because V \geq 0 at the left of f; similarly the right limit must be less than or equal to zero, because V \leq 0 at the right of f. So the only possibility, by the continuity of V at f, is that \lim_{u \to f^+} V(u) = \lim_{u \to f^-} V(u) = 0 = V(f). Then V has a zero in I (and this zero is f).

In order to complete the proof, we have to prove the property we took for granted relying on the function graph, i.e. that the discontinuity points of V correspond to jumps from lower values to greater ones. This property can be proved starting from the remark after the definition of V, according to which for all u \in \mathbb{R}^+:

hence

Clearly the discontinuity points of this function come from those of \overline{\psi}, because all the other functions involved are continuous. But \psi is an increasing function, so this is true also for its simple extension \overline{\psi}. So, if h is a discontinuity point of \overline{\psi}, this function will make a jump from lower values to greater ones at that point. More formally, the value assumed by the function at h will be greater than those assumed in a left neighbourhood of it. Then, indicating with I_s such a left neighbourhood, there is a constant \delta \gt 0 such that:

hence

Now, letting k tend to h, at some point the values of \overline{\psi}(k) will be constant, because we’ll be at one of the constant segments of the simple extension, before the jump at h. So for a left neighbourhood I_s^{\prime} \subseteq I_s and for a constant C, considering that k \leq h, we’ll have:

So, since \delta \gt 0:

hence, by (k), we’ll have:

so the function, at \log h, makes a jump from smaller values to greater ones. But, as we noted before, the discontinuity points of V come from the ones of \overline{\psi}: in particular \log h is a discontinuity point of V if and only if h is a discontinuity point of \overline{\psi}. So our argument is valid for all the discontinuity points of V.
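The direction of the jumps can also be observed numerically. The following Python sketch relies on an assumption that must be checked against the post The functions W and V: that V(\log x) = \frac{\overline{\psi}(x)}{x} - 1, with \overline{\psi}(x) = \psi(\lfloor x \rfloor) and \psi the Chebyshev function. All function names are ours. Under this assumption, the jump of V at \log n should be about \frac{\Lambda(n)}{n} when n is a prime power and about zero otherwise.

```python
import math

def von_mangoldt(n):
    """Lambda(n): log p if n is a power of a prime p, otherwise 0."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            # n was a prime power exactly when nothing is left after removing p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)  # no divisor found: n itself is prime

def psi(n):
    """Chebyshev function psi(n) = sum of Lambda(k) for k <= n."""
    return sum(von_mangoldt(k) for k in range(2, n + 1))

def V_at_log(x):
    """ASSUMED form: V(log x) = psi_bar(x)/x - 1, with psi_bar(x) = psi(floor(x))."""
    return psi(math.floor(x)) / x - 1.0

def jump_at(n, eps=1e-9):
    """Value of V at log n minus its value just before log n."""
    return V_at_log(n) - V_at_log(n - eps)

jumps = {n: jump_at(n) for n in range(2, 60)}
for n in (2, 3, 4, 6, 7, 8, 9, 10):
    print(n, jumps[n])
```

In this range the jumps at prime powers are all positive, while at the other integers the difference is negligible, in agreement with the property just proved.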

So we can be certain that V changes sign at most once in the interval [a, t]. Then there are only four possibilities:

- V is always positive in [a, t]
- V is always negative in [a, t]
- V is always positive in [a, m] and always negative in [m, t], where m is such that a \lt m \lt t
- V is always negative in [a, m] and always positive in [m, t], where m is such that a \lt m \lt t

In the first two cases, by the first and the second idea of the previous post, in particular by Lemma N.14, we can overestimate the integral of |V| as follows:

where A is a positive constant (whose value we don’t need to know).

In the other two cases we can repeat the same argument in the two intervals [a, m] and [m, t], obtaining:

But, since 2A \lt 4A, we can unify the two previous formulas into a single formula which is true for all cases:

So, by (6), (7) and (8), we’ll have that:

Now let’s ask ourselves: is this approximation good? Is it better than (1), which is the one we would obtain by means of the overestimation by limit superior? If it were so, the following relationship would be true:

Equivalently, using the notation of Lemma N.15, for the reasons explained in its proof, we can say that:

Of course, if \delta is fixed, this relationship can be true or false depending on the value of A, which we don’t know. So, it’s convenient to reason in the opposite way: since the value of A is fixed, we can find a value of \delta in order that (10) is true. Thus, for all the intervals of such length, the overestimation (9) would be better than (1). This way we have obtained the following Lemma:

Overestimation of the integral of |V| better than the one by limit superior, second case

Suppose that \alpha^{\prime} \gt 0. Let A be the constant of Lemma N.14, let \delta \geq \alpha^{\prime} and a \geq 0. If in the interval [a, a + \delta - \alpha^{\prime}] the function |V| has no zeroes, and if there is a constant k \lt 1 such that 4A + (\alpha^{\prime})^2 = \delta k \alpha^{\prime},

then:

## Calculation of the interval length

Up to now we have found two ways of obtaining an overestimation of the integral of |V| in the interval [a, a + \delta]: first when V has at least one zero in the interval, then when it has no zeroes. In both cases we found an approximation better than the one by limit superior. The problem is that in each case we set some constraints on \delta. If we want to make a general argument, we have to make sure that there exists a value of \delta which satisfies the constraints of both cases: that way, for any interval of length \delta, we would always be sure to have a better approximation than the one by limit superior, no matter which of the two cases occurs.

First of all, in both the previous cases we supposed that \delta \geq \alpha^{\prime}. Moreover, in each case we made a different additional hypothesis:

- In the first case we supposed that there is a constant h \lt 1 such that (\delta - \alpha^{\prime}) \alpha^{\prime} + \frac{(\alpha^{\prime})^2}{2} = \delta h \alpha^{\prime}
- In the second case we supposed that there is a constant k \lt 1 such that 4A + (\alpha^{\prime})^2 = \delta k \alpha^{\prime}

So we can say that a number \delta satisfies the hypotheses of both cases, i.e. both of Lemma N.15 and Lemma N.17, if it’s a solution of the system:

After some algebraic calculations, we can obtain the following class of solutions (there are more, but we are not interested in finding them all):

Since we don’t need to find all the solutions, but it’s sufficient to find even one of them, we can suppose that k \leq h. We’ll see what solutions there are in this case and we’ll ignore the ones which can be obtained when k \gt h:

Now the second to last condition has become redundant, being a consequence of the previous two conditions (if k \leq h and h \lt 1, then k \lt 1):

Now if we apply the last inequality to the right member of the second one, we’ll obtain \alpha^{\prime} k \delta \leq \alpha^{\prime} h \delta; but, after that, the last inequality becomes redundant, because it can be obtained just from \alpha^{\prime} k \delta \leq \alpha^{\prime} h \delta:

Now the second row is at the same time an equality and an inequality. Let’s separate the equality part by extracting k:

Thus we have got rid of k and we can focus on the other two variables, h and \delta.

Dividing the first equation by \alpha^{\prime}, we’ll obtain (\delta - \alpha^{\prime}) + \frac{\alpha^{\prime}}{2} = h \delta, hence \delta - \frac{\alpha^{\prime}}{2} = h \delta and h = 1 - \frac{\alpha^{\prime}}{2 \delta} (the last passage assumes that \delta \neq 0, which is true because one of the inequalities of the system is \delta \geq \alpha^{\prime} and by hypothesis \alpha^{\prime} \gt 0). This value of h makes the fourth condition redundant because, being \delta \geq \alpha^{\prime} \gt 0, 1 - \frac{\alpha^{\prime}}{2 \delta} \lt 1. So we’ll obtain:

Substituting the value of h in the second formula of the system, we’ll obtain 4A + (\alpha^{\prime})^2 \leq \alpha^{\prime} \left( 1 - \frac{\alpha^{\prime}}{2 \delta} \right) \delta, hence 4A + (\alpha^{\prime})^2 \leq \alpha^{\prime} \delta - \frac{(\alpha^{\prime})^2}{2} \Rightarrow 4A + \frac{3 (\alpha^{\prime})^2}{2} \leq \alpha^{\prime} \delta \Rightarrow \delta \geq \frac{4A}{\alpha^{\prime}} + \frac{3 \alpha^{\prime}}{2}. Clearly this value is greater than \alpha^{\prime}, so the third condition is redundant. Thus we’ll obtain the final solution:

So it’s sufficient to fix a value for \delta which is greater than or equal to \frac{4A}{\alpha^{\prime}} + \frac{3 \alpha^{\prime}}{2} and put it into the equations of (11), in order to obtain the corresponding values of h and k and so a particular solution of the system. Such a solution satisfies the hypotheses of both Lemma N.15 and Lemma N.17, so:

hence, since k \leq h:

Summarizing, we can state the following Proposition:

Overestimation of the integral of |V| better than the one by limit superior, in intervals of appropriate length

Suppose that \alpha^{\prime} \gt 0. Let \delta \geq \frac{4A}{\alpha^{\prime}} + \frac{3 \alpha^{\prime}}{2}, where A is the constant of Lemma N.14. Then there exists a constant h \lt 1 such that, for all a \geq 0, \int_a^{a + \delta} |V(u)|\ du \leq \delta h \alpha^{\prime} + o(1).
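The threshold on \delta and the constants h and k obtained in this section can be computed with a few lines of Python. This is only a sketch: the function name is ours, and the values of A and \alpha^{\prime} are hypothetical, because the proof never needs the actual value of A. The formulas used are h = 1 - \frac{\alpha^{\prime}}{2 \delta}, derived above, and k = \frac{4A + (\alpha^{\prime})^2}{\delta \alpha^{\prime}}, obtained by solving 4A + (\alpha^{\prime})^2 = \delta k \alpha^{\prime} for k.

```python
def interval_constants(A, alpha, delta=None):
    """Return (delta, h, k) satisfying the constraints of both cases.

    delta defaults to the smallest admissible value 4A/alpha + 3*alpha/2.
    """
    if A <= 0 or alpha <= 0:
        raise ValueError("A and alpha' must be positive")
    delta_min = 4 * A / alpha + 1.5 * alpha
    if delta is None:
        delta = delta_min
    if delta < delta_min:
        raise ValueError("delta must be at least 4A/alpha' + 3*alpha'/2")
    h = 1 - alpha / (2 * delta)                 # from the first-case equality
    k = (4 * A + alpha ** 2) / (alpha * delta)  # from the second-case equality
    return delta, h, k

# Hypothetical sample values, for illustration only.
A, alpha = 1.0, 0.5
delta, h, k = interval_constants(A, alpha)
print("minimal delta:", delta, " h:", h, " k:", k)

# With a larger delta both constants stay below 1 and k <= h still holds.
delta2, h2, k2 = interval_constants(A, alpha, delta=20.0)
print("delta:", delta2, " h:", h2, " k:", k2)
```

At the minimal value of \delta the two constants coincide, while for larger values k becomes strictly smaller than h; in both situations h \lt 1, which is what matters for the rest of the proof.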

## End of the proof

Now let’s go back to the initial interval [0, \log \xi]. We can note that, if \log \xi \geq \frac{4A}{\alpha^{\prime}} + \frac{3 \alpha^{\prime}}{2}, we could apply Proposition N.27 directly with \delta := \log \xi and a := 0. So, supposing that \alpha^{\prime} \gt 0, there exists h \lt 1 such that:

hence, since \log \xi \gt 0 and by Corollary 1 of Property A.8:

Now let’s pass to the limit superior:

Formula (12) is not true for all \xi: as we supposed before, it’s true only if \log \xi \geq \frac{4A}{\alpha^{\prime}} + \frac{3 \alpha^{\prime}}{2}. However, we are not interested in what happens for smaller values of \log \xi, because we want to know what happens when \xi \to +\infty: the limit superior (like the limit) for \xi \to +\infty is not affected by the behaviour of the involved functions for values of \xi smaller than a fixed value. So formula (13) holds without restrictions, because \xi has changed meaning from (12) to (13): from being a fixed real number, it has become the variable of the limit superior.

The left member of (13) by definition is equal to \beta^{\prime} (see Definition N.23), while the right member is equal to h \alpha^{\prime} (because h and \alpha^{\prime} do not depend on \xi, and o\left( \frac{1}{\log \xi} \right) tends to zero). So:

But, since h \lt 1:

But this relationship contradicts Proposition N.26. So our assumption that \alpha^{\prime} \gt 0, the only arbitrary assumption still standing, must be wrong. Then, remembering that \alpha^{\prime} \geq 0, we can say that:

By Lemma N.7, this equation is a sufficient condition for the prime number Theorem to be true, so the proof is finished.
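As a final illustration, we can look at the mean value \frac{1}{\log \xi} \int_0^{\log \xi} |V(u)|\ du for some values of \xi. The following Python sketch relies on an assumption that must be checked against the post The functions W and V: that V(u) = \frac{\overline{\psi}(e^u)}{e^u} - 1, with \overline{\psi}(x) = \psi(\lfloor x \rfloor). With the substitution x = e^u the integral becomes a sum of integrals of \left| \frac{c}{x} - 1 \right| \frac{1}{x} over the intervals [n, n + 1], where c = \psi(n) is constant, so each piece can be computed exactly. All function names are ours.

```python
import math

def chebyshev_psi_table(N):
    """psi[n] = sum of log p over the prime powers p^k <= n, for n = 0..N."""
    lam = [0.0] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(p * p, N + 1, p):
                is_prime[m] = False
            q = p
            while q <= N:          # Lambda(p^k) = log p
                lam[q] = math.log(p)
                q *= p
    psi = [0.0] * (N + 1)
    for n in range(1, N + 1):
        psi[n] = psi[n - 1] + lam[n]
    return psi

def abs_V_integral(psi, xi):
    """Integral of |V(u)| for u in [0, log xi], xi integer >= 2 (ASSUMED form of V)."""
    total = 0.0
    for n in range(1, xi):
        c = psi[n]
        F = lambda x: -c / x - math.log(x)  # antiderivative of c/x^2 - 1/x
        lo, hi = float(n), float(n + 1)
        if c >= hi:      # c/x - 1 >= 0 on the whole piece
            total += F(hi) - F(lo)
        elif c <= lo:    # c/x - 1 <= 0 on the whole piece
            total += F(lo) - F(hi)
        else:            # sign change at x = c
            total += (F(c) - F(lo)) + (F(c) - F(hi))
    return total

psi_table = chebyshev_psi_table(10_000)
averages = {xi: abs_V_integral(psi_table, xi) / math.log(xi)
            for xi in (100, 1_000, 10_000)}
for xi, value in averages.items():
    print(xi, value)
```

A finite computation like this cannot prove anything about the limit superior, but it shows the mean value getting smaller as \xi grows, which is consistent with the result just proved.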