Prerequisites:

The goal we establish here is to prove the Bertrand’s postulate, proposed in 1845 by the French mathematician Joseph Louis François Bertrand as a conjecture, and later proved by the Russian Pafnutij L’vovič Čebyšëv:

Bertrand’s postulate

*For all n > 0, there exists a prime number between n + 1 and 2n.*

The statement of the theorem can be illustrated as follows:

We have already told about the prime numbers between n + 1 and 2n, subject of the theorem, when we analyzed the product of the first prime numbers. In that occasion we began with the example of the product of primes between 11 and 20: we saw that an overestimation of this product is the binomial \binom{20}{10}; in fact in this expression the numbers at the denominator are cancelled with several factors at the numerator, leaving finally the product we are interested in, multiplied by some additional factors:

Then we developed this technique, until we obtained an estimation of the product of all primes up to a certain number x. Now we start from the same point but, analyzing the matter in a different way, we will get to the prove of the Bertrand’s postulate.

By generalizing (1), we note that we started from the binomial \binom{2n}{n} and we obtained, finally, the product of the prime numbers between n + 1 and 2n, multiplied by some other smaller factor. Bertrand’s postulate states that there exists at least one prime number between n + 1 and 2n: that means that in the final passage, after the cancellation of passage (2), we have to find at least one prime factor greater than n. But, getting back to (1), we see that the final passage is none other than the decomposition into prime factors of the starting number, the binomial \binom{2n}{n}: so we are saying that this number, that henceforth we’ll call N, has at least one prime factor greater than n. Therefore we can see the Bertrand’s postulate as a theorem about the factorization of the number N. Since we are interested in what prime factors appear in this factorization, and what do not, we can write passage (3) by introducing, with zero exponent, also the missing prime factors:

At this point we can call k_p the exponent of the prime factor p in this particulat factorization, in which there are all primes up to 2n, that is:

So, riformulating the Bertrand’s postulate, we can say that at least one prime q greater than n has esponent k_q > 0.

Let’s analyze in detail this number k_p: what can we say about it, given p? With an interesting, but rather involved, reasoning, we can prove the following Lemma, that establishes an upper threshold:

Factorization of the binomial \binom{2n}{n}

*In the factorization of the binomial \binom{2n}{n}, for n > 0, a prime number p cannot have an exponent greater than \log_p 2n.*

Let’s start with an example: let’s see why, for p = 2, k_2 = 2.

The number 2 is a very common prime factor, also in very small numbers like those ones of our example: between 1 and 10 there are exactly 5 multiples of 2, but there are also numbers that are multiples of a power of 2, containing the factor two more than once, like 8 = 2^3 or 12 = 3^2 \cdot 3. Let’s write, for all numbers between 1 and 20, how many times they contain the prime factor 2, that is the exponent of 2 in their factorization:

i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Exponent of 2 in the factorization | 0 | 1 | 0 | 2 | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 2 | 0 | 1 | 1 | 4 | 0 | 1 | 0 | 2 |

Shown like this, the exponents of 2 seem to follow a chaotic law, but all appears much clearer if we distinguish 2 from its powers. Instead of asking ourselves how many times the number 2 is present in the factorization of the number i, we can ask four different questions:

- 2 \mid i?
- 4 \mid i?
- 8 \mid i?
- 16 \mid i?

To each of these questions we answer 0 in negative case; 1 in positive case. Finally, if we sum up all the four answers, we obtain precisely the exponent of 2 in the factorization of i. For example, for i = 8 we have:

- 2 \mid 8? 1
- 4 \mid 8? 1
- 8 \mid 8? 1
- 16 \mid 8? 0

Finally we obtain 1 + 1 + 1 + 0 = 3, that is precisely the exponent of 2 in the factorization of 8, in fact 8 = 2^3. Of course we could go ahead endlessly with these questions, considering increasing powers of 2, but in this case i \leq 20 and 16 is the greatest power of 2 less than or equal to 20, so it does not make sense going ahead: even if we did, the answers would be always negative, and we would have a sum of infinite zeros. Let’s rewrite Table 1 this way:

i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

2 \mid i? | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |

4 \mid i? | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |

8 \mid i? | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |

16 \mid i? | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |

As a consequence of what we said before, the single row of Table 1 is equal to the sum of the rows of Table 2, as it’s easy to verify.

Looking at Table 2 we can finally see a regularity: on each row, the zeros and the ones alternate themselves regularly: a zero followed by a one on the first row, 3 zeros followed by a one on the second row, and so on. This regularity will turn out to be very useful in a while.

Who is familiar with dashed lines can recognize in Table 2 the representation of a linear dashed line. In particular the representation of this dashed line coincides with the first four rows of the table obtained with the dashed line viewer with n = 20 selecting the option “Prime numbers less than or equal to n and their powers”.

Let’s come back to the binomial (1). We can rewrite it as follows, focusing our attention on the number 2:

The number of 2s that finally remain, after the cancellation, is the number of 2s at the numerator, minus the number of 2s at the denominator. But now we can count how many they are on the basis of Table 2. For counting the 2s at the numerator, for example, we have to sum up all the columns from number 11 to 20, obtaining:

Analogously we can count the 2s at the denominator by summing up the columns from 1 to 10:

Therefore at the end, after the cancellation, the number of 2s that will remain, which we called k_2, will be 10 - 8 = 2.

So far we just expressed in greater detail, concerning the number 2, what happens in passage (3), but now we can make a difference in quality. The same result, in fact, can be obtained by summing up Table 2 not by columns, but by rows. This means that, for computing the number of 2s at the numerator, we sum up the portions of the rows of Table 2, that are between column 11 and column 20:

Likewise, for computing the number of 2s at the denominator, we sum up the portions of the rows of Table 2, that are between column 1 and column 10:

So the difference between these two numbers, k_2, can be obtained by subtracting member by member (5) from (4):

This means, in practice, subtracting by rows the left side of Table 2 (columns 1-10) from the right side (columns 11-20). But we know the positions of 1s in Table 2: in the first row, the 1s are located in the columns which are multiples of 2, in the second row they are located in columns multiple of 4, etcetera. So the expression (6) can be rewritten as:

Both sets \{1, \dots, 10\} and \{11, \dots, 20\} are made up of 10 consecutive integers. In general, the number of multiples of a certain number m (not necessarily prime) in a set of h consecutive numbers is subject to a precise limitation:

In particular, if the interval is of the kind \{1, \dots, h\}, the left equality holds:

The proof of these formulas would deserve a separate speech, but for the moment we just apply them.

By subtracting member by member (8) from (9) we have that:

From this, since \left \lceil \frac{h}{m} \right \rceil - \left \lfloor \frac{h}{m} \right \rfloor \leq 1, we obtain:

The central quantity is perfectly overlapping with each addend of (7), by putting h = n = 10 and m = a power of p = 2. Putting this into (7) we have so:

Here we are counting a 1 for each row of Table 2, so the sum we have obtained is the number of the rows of the table. It has 4 rows, because 2^4 is the greatest power of 2 (the prime p we have chosen) less than or equal to 20 (that is 2n). From 2^4 \leq 20 we obtain, computing the logarithm on both sides, that 4 \leq \log_2 20. Summarizing, we have that

The argument carried on so far for p = 2 and 2n = 20 can be easily generalized for all p and for all n, obtaining Lemma N.1.

Having understood what exponents the prime factors of \binom{2n}{n} can be, we have to better understand what these factors can be. Like we observed initially, they are all less than or equal to 2n, so we can split them into two subsets, calling P those ones between n+1 and 2n, and Q those ones smaller. But in the post The product of the first prime numbers we saw that *all* the prime numbers between n+1 and 2n divide N, hence:

This is a fundamental equality that links prime numbers to the prime factors of a particulat number, N. According to the Bertrand’s postulate, the set P cannot be empty, that is, by (11), at least one prime factor of N must be greater than n.

What can we say instead about the set

?

Going back to the proof of Lemma N.1, we can prove that these prime factors, for n \geq 3, don’t exceed the two thirds of n:

Let’s consider a prime factor q \in Q, that is q \leq n and k_q \geq 1 (the latter inequality says that q appears indeed, with exponent at least 1, in the factorization of N). Adapting (7) and taking into account Lemma N.1, we know that

Hence if k_q \geq 1, it means that at least one of the addends must be 1 (in fact we know, from (10), that they are all 0 or 1). Let’s suppose that the first addend is 1. Note that the term (\text{number of multiples of } q \text{ between 1 and } n) is at least 1, because q \leq n, so if the whole difference (13) is at least 1, it means that the first term must be at least 2:

So there would exist (by supposing the first addend to be 1) at least 2 multiples of q between n + 1 and 2n, and another between 1 and n: in total 3 multiples of q less than or equal to 2n, so

If we supposed the other addends to be 1, we would arrive to the same conclusion. In fact, in order to obtain 1, the right side terms like (\text{number of multiples of } q^2 \text{ between } n + 1 \text{ and } 2n) must be at least 1, and in this case it would be q^2 \leq 2n, q^3 \leq 2n, etcetera. All these conditions imply that 3q \leq 2n for q \geq 3, since 3q \leq q^2 \leq q^3 \leq \dots. For q = 2 it’s not true that 3q \leq q^2, but in this case we can easily satisfy (14) by supposing n \geq 3: it’s not a problem because we are interested in understanding what happens when n is big; for small n, any property can be verified directly.

Hence (12), for n \geq 3, can be rewritten as (12′).

Now, having split the prime factors of N into the sets P and Q, we can write:

Starting from this formula the last part of the proof is developed. The main point is that, if Bertrand’s postulate was false, the set P would be empty, so only the first product would remain:

But, having found an upper threshold for the elements of Q with (12′), then also their product, \prod_{q \in Q} q^{k_q} = N, cannot exceed a certain threshold. Making the calculations, you can see that for n \geq 512 the number N is greater than the threshold of which it should be lower, by the previous argument. So for n \geq 512 the fact we supposed Bertrand’s postulate false leaded to a contradiction: so the postulate is true for n \geq 512.

In (15) we can further split the factors in Q into those ones having k_q = 1 and those ones having k_q > 1:

For better understanding this mechanism, avoiding useless symbolical complications, let’s consider another example. The example of n = 10, that we have carried on so far, isn’t enough anymore, because in this case Q = \{2\} and this is a too trivial case for giving the idea of what happens in general. So let’s consider n = 20. In this case the passages (1) – (3) become:

Let’s separate, as in (16), the primes greater than n = 20 from the others and, among the latter, those ones with exponent 1 from those ones with a greater exponent:

The first prodct, by (12′), is a product of primes less than or equal to \frac{2n}{3} = \frac{40}{3} = 13.3\dots, that is less than or equal to 13 (in this case the example fits perfectly with the prediction). Then we can majorize it with the product of *all* primes up to 13, that is \omega^{\star}(13), that by Proposition N.1, is less than or equal to 2^{2 \cdot 12} = 2^{24}:

The next product is instead the product of all prime factors with exponent at least 2; in the example there is just the factor 3. In general, if a prime q appears in the factorization with exponent at least 2, it means that at least two terms of (16) are 1, so at least one of them contains an exponent of q between 2 and \lfloor log_q 40 \rfloor. Let’s suppose, for fixing ideas, the term with exponent 2 to be 1:

Then the right member must be at least 1, that is 21 \leq q^2 \leq 40, in particular it’s important that q^2 \leq 40. This would be even more true if, for example, the term with exponent 3 was 1, so we would obtain, with the above argument, q^3 \leq 40 and so again q^2 < q^3 \leq 40. Therefore q \leq \sqrt{40} = 6.3\dots. In practice q \in {2, 3, 4, 5, 6}: we can say, without splitting hairs, as if we didn’t know which the primes are, that there are up to 5 possible values for q. Moreover, each of these q contributes to the factorization with q^{k_q}, that by Lemma N.1 is less than or equal to q^{log_q 40} = 40. Summarizing, our overestimation of the second product is the following:

On the other side, by Property N.2, we also know that \binom{40}{20} \geq \frac{2^{40}}{40}.

By putting all these estimations into (17), we obtain:

At this point, if the Bertrand’s postulate was false and so the third product missed, we would remain with:

This equality, doing the calculations, is satisfied: we haven’t found the contraddiction we were looking for. But this isn’t over yet: we have to make an idea of how our estimates behave when n increases. Generalizing (18), on the basis of the argument we carried on for n = 20, we obtain:

Well, this inequality is generally false. Proving this is just a matter of calculation, that we try to do as simply as possible. First of all, let’s eliminate the integer parts and the constant terms that do not give much contribution, by observing that:

So let’s study the inequality

We remove the denominator on the left by multiplying both members by 2n:

We remove the denominator in the central exponent by raising all to the cube:

Now we can simplify dividing by 2^{4n}:

In order to let the expression be more numerically manageable, we apply the base-2 logarithm to both members:

Although this relation is true for n = 3, 4, 5, 6, \dots (we have to start from 3 because from a certain point on we supposed n \geq 3), analytic methods let us state that the left side increases more quickly than the right side: that means that sooner or later, from a certain n on, the inequality is no more satisfied. Making the calculations you can see that this “critical” n is 512. For this choice of n we obtain 2n = 2 \cdot 512 = 1024, while 3 (1 + \sqrt{2n}) \log_2{2n} \leq 3 (1 + 32) 10 = 990.

Summarizing, we saw that, if in (16) we exclude the product \prod_{p \in P} p^{k_p}, because we suppose P to be empty, we obtain a contraddiction for n \geq 512. Therefore, for such n, P cannot be empty: this proves the Bertrand’s postulate for n \geq 512.

For the others n, of course, it’s enough to see the prime number tables in order to realize that there exists always a prime between n + 1 and 2n. This completes the proof.