Prerequisites:
We have seen that, thanks to the Selberg’s sieve, we can calculate an estimate of the sieve function S(A, \mathbb{P}, z), where, as usual, A is a finite set of natural numbers, \mathbb{P} is the set of prime numbers and z \gt 1 is a real number. This estimate is of the type:
where:
and P(z) is the product of the prime numbers smaller than z.
In the previous article we calculated the \lambda_d parameters that make the T estimate as low as possible. Now all that remains is to replace these values of \lambda_d in the function R which represents the error term of our estimate: this will be the aim of this article. We will proceed in subsequent steps.
Step 1: increasing the parameters \lambda_d
The expression of the \lambda_d parameters that we found in the last article is quite complicated, so we’ll avoid putting it in the expression of R and we’ll adopt a simpler approach. The fundamental issue is that it’s not necessary to calculate the error exactly, but it is sufficient to overestimate it, or rather, in order to treat excess and defect errors in the same way, it is sufficient to overestimate its absolute value, i.e. establish how much |R| can be at most. Consequently, even for the \lambda_d parameters, their exact value is not so important; rather it’s important to know how much |\lambda_d| can be at most. In particular, we can prove the following property:
Let’s take the formula for \lambda_d that we saw in the previous article (step 6, formula 10):
where (cited article, step 7, formula 11):
So, combining the two formulas:
Let’s first observe that:
In fact:
- The fraction \frac{d^{\prime}}{d} is actually an integer, where both the numerator and denominator are products of distinct prime factors; if we indicate the number of distinct prime factors of the numerator and those of the denominator respectively with the symbols M and m, the number of distinct prime factors of the ratio is equal to M - m, i.e. the number of factors of the numerator that are not simplified in the fraction. At this point there are various cases:
- If M and m are both even, then by definition \mu(d^{\prime}) = 1, \mu(d) = 1 and \mu\left(\frac{d^{\prime}}{d} \right) = 1 because also M - m is even;
- If M is even and m is odd, then by definition \mu(d^{\prime}) = 1, \mu(d) = -1 and \mu\left(\frac{d^{\prime}}{d} \right) = -1 because M - m is odd;
- If M is odd and m is even, then by definition \mu(d^{\prime}) = -1, \mu(d) = 1 and \mu\left(\frac{d^{\prime}}{d} \right) = -1 because M - m is odd;
- If M and m are both odd, then by definition \mu(d^{\prime}) = -1, \mu(d) = -1 and \mu\left(\frac{d^{\prime}}{d} \right) = 1 because M - m is even;
As you can see, in all cases \mu\left(\frac{d^{\prime}}{d} \right) = \frac{\mu(d^{\prime})}{\mu (d)} (for example, in the last case, \mu\left(\frac{d^{\prime}}{d} \right) = 1 = \frac{-1} {-1} = \frac{\mu(d^{\prime})}{\mu(d)}).
- The integer \mu(d^{\prime}) can only be 1 or -1, because d^{\prime} is a product of distinct prime factors; in both cases \mu(d^{\prime})^2 = 1
- Like \mu(d^{\prime}), \mu(d) can also be 1 or -1; if it is 1 then \frac{1}{\mu(d)} = \frac{1}{1} = 1, while if it’s -1 then \frac{1}{\mu(d)} = \frac{1}{-1} = -1: in both cases the result of the fraction is equal to \mu(d)
So the formula of \lambda_d can be simplified as follows:
where in the last step we grouped the factors that do not depend on the summation variable, which is d^{\prime}. Let’s now focus on the last summation we have obtained. For greater clarity, let’s see an example.
If P(z) = 2 \cdot 3 \cdot 5 \cdot 7 and d = 2 \cdot 3, we’ll have:
To simplify this last expression we need to use an important property of the \phi function that we’ll prove in a specific article: it’s multiplicative. By such property, if a and b are two positive coprime integers, then \phi(ab) = \phi(a) \cdot \phi(b). For example, in our case \phi(2 \cdot 3 \cdot 5) = \phi(2 \cdot 3) \cdot \phi(5), \phi(2 \cdot 3 \cdot 7) = \phi(2 \cdot 3) \cdot \phi(7) and \phi(2 \cdot 3 \cdot 5 \cdot 7) = \phi(2 \cdot 3) \cdot \phi(5 \cdot 7). This allows us to put aside the first term in (b):
Reasoning in general terms, we’ll have that:
Substituting into (a), we’ll have:
Let’s now focus on the ratio \frac{d}{\phi(d)}. Also in this case, for clarity, let’s see an example.
If d = 2 \cdot 3, remembering that \phi is multiplicative, we can express the ratio as follows:
Applying our definition of \phi (Selberg’s sieve: study of the parameters \lambda, step 2):
In fact, the only two pairs of integer numbers d^{\prime}, d^{\prime \prime} having 2 as their product are (1, 2) and (2, 1). But the same property also holds if we replace 2 with any other prime number p, so in general \phi(p) = p - 1, from which p = \phi(p) + 1. In our case:
Generally:
Substituting into (e), we’ll have:
Multiplying the two summations and remembering once again that the function \phi is multiplicative, we’ll have:
In the last step we set c := b \cdot a, hence, b being a divisor of d and a a divisor of \frac{P(z)}{d}, certainly their product c is a divisor of d \cdot \frac{P(z)}{d} = P(z), so each term present in the penultimate summation is also present in the last one. However, the converse is also valid, because, given any divisor c of P(z), it is always possible to decompose c into the product of its common prime factors with d – the product of which we can call b – multiplied by the product of its prime factors in common with \frac{P(z)}{d} – the product of which we can call a (note that such decomposition is possible because d and \frac{P(z)}{d} have no divisors in common). For this reason every element of the last summation can also be found in the penultimate one, and the last equality follows.
But we already know the last summation: it’s just the definition of V(z) (Selberg’s sieve: study of the parameters \lambda, formula 12); so we can simplify formula (f) as follows:
which is precisely what we observed in the final example of the previous article.
Now, remembering that \mu(d) = \pm 1 since d has no squares, the conclusion is simple:
from which, in particular, we have the required statement:
For the proof just concluded, we could have stated the stronger result |\lambda_d| = 1 instead of settling for inequality, but we didn’t do this because the equality is no longer true if additional hypotheses are added, as we’ll do at the end of this article.
Substituting into (2), we obtain a fairly simple expression of the upper bound for |R|:
The next step is to simplify this relationship.
Step 2: Group by least common multiple
An important aspect can be observed regarding formula (2′): if d_1 and d_2 are divisors of P(z), their least common multiple must also be a divisor of P(z).
Why must the least common multiple between d_1 and d_2 divide P(z)?
The reason lies precisely in the definition of least common multiple: if d_1 and d_2 are divisors of P(z), it means that P(z) is a common multiple of d_1 and d_2; but by definition the least common multiple divides any common multiple, so it also divides P(z). For example, 6 and 4 are divisors of 24, so 24 is a common multiple for both; so the least common multiple between 4 and 6 must divide 24 (in fact, the least common multiple is 12, which divides 24).
So the possible values of \mathrm{lcm}(d_1, d_2), in (2′), are the divisors of P(z), but it’s possible that several (d_1, d_2) pairs have the same least common multiple, so there will be repeated values of r(\mathrm{lcm}(d_1, d_2)). We can therefore think of grouping equal values of r(\mathrm{lcm}(d_1, d_2)), multiplying each by the number of times it appears in the summation, i.e. by the number of pairs (d_1, d_2) having the least common multiple indicated in the subscript:
Suppose that P(z) = 2 \cdot 3 \cdot 5 = 30. Then the divisors of P(z) are 2, 3, 5, 6, 10, 15 and 30. We can develop (2′), setting for brevity r(\mathrm{m}(x, y)) := f(x, y), as follows:
As you can see, equal values of r(\mathrm{lcm}(d_1, d_2)) are repeated several times; so, by counting the number of repetitions of each value, the summation can be written more briefly as:
As you can see, the number of repetitions of each |r(d)| is always a power of 3: this is no coincidence, as we’ll see shortly.
It therefore remains to calculate, for each d, the cardinality of the set \{d_1, d_2 \mid P(z) \textrm{\ such that\ } \mathrm{lcm}(d_1, d_2) = d\}, i.e. how many pairs of divisors of P(z) have d as least common multiple.
First we can observe that the condition d_1, d_2 \mid P(z) is superfluous: in fact it can be obtained from the condition \mathrm{lcm}(d_1, d_2) = d, remembering that d in turn a divisor of P(z); therefore we can simply count how many pairs of integers have least common multiple d without any particular constraints, and the condition d_1, d_2 \mid P(z) will be automatically satisfied. In other words, the following equality holds:
The counting of the number of pairs is given by Proposition N.29, of which for convenience we report the statement (following the previous link you can also find the proof and an example):
Number of integers pairs having a fixed square-free least common multiple
Let d be a square-free positive integer. The number of ordered pairs of positive integers having least common multiple d is equal to:
Applying Proposition N.29 to our case, we’ll have that:
So, combining formulas (4), (5) and (6), we’ll obtain:
Finally, by combining this last formula with (2′), we’ll obtain the overestimate we were looking for:
As you can see, this function depends only on z, which is one of the parameters set initially, so the result will be a number.
In the first example, with P(z) = 2 \cdot 3 \cdot 5 = 30, we saw that:
From (8) we know that the numbers that multiply r(d) can be expressed as 3^{\omega(d)}, for example the 9 which multiplies |r(15)| is given by 3^{\omega(15)}, as established in the previous example.
Now let’s carry out the calculations to obtain a concrete overestimate of R. In this regard, remember that the function r is defined (Theorem C.4) by the relationship r(d) := |A_d| - \frac{|A|}{d}, where |A_d| is the number of multiples of d in the set A. If we let A be the set of all the integers from 1 to a certain fixed integer N, as we have seen in previous articles, the formula becomes r(d) = \left\lfloor \frac{N}{d} \right\rfloor - \frac{N}{d}. In particular, if we set N := 50, we’ll have:
- |r(1)| = \left| \left\lfloor \frac{N}{1} \right\rfloor - \frac{N}{1} \right| = \left| \left\lfloor \frac{50}{1} \right\rfloor - \frac{50}{1} \right| = |50 - 50| = 0
- |r(2)| = \left| \left\lfloor \frac{N}{2} \right\rfloor - \frac{N}{2} \right| = \left| \left\lfloor \frac{50}{2} \right\rfloor - \frac{50}{2} \right| = |25 - 25| = 0
- |r(3)| = \left| \left\lfloor \frac{N}{3} \right\rfloor - \frac{N}{3} \right| = \left| \left\lfloor \frac{50}{3} \right\rfloor - \frac{50}{3} \right| = \left|16 - \frac{50}{3}\right| = \left|- \frac{2}{3}\right| = \frac{2}{3}
- |r(5)| = \left| \left\lfloor \frac{N}{5} \right\rfloor - \frac{N}{5} \right| = \left| \left\lfloor \frac{50}{5} \right\rfloor - \frac{50}{5} \right| = |10 - 10| = 0
- |r(6)| = \left| \left\lfloor \frac{N}{6} \right\rfloor - \frac{N}{6} \right| = \left| \left\lfloor \frac{50}{6} \right\rfloor - \frac{50}{6} \right| = \left|8 - \frac{50}{6}\right| = \left|- \frac{2}{3}\right| = \frac{2}{3}
- |r(10)| = \left| \left\lfloor \frac{N}{10} \right\rfloor - \frac{N}{10} \right| = \left| \left\lfloor \frac{50}{10} \right\rfloor - \frac{50}{10} \right| = |5 - 5| = 0
- |r(15)| = \left| \left\lfloor \frac{N}{15} \right\rfloor - \frac{N}{15} \right| = \left| \left\lfloor \frac{50}{15} \right\rfloor - \frac{50}{15} \right| = \left|3 - \frac{50}{15}\right| = \left|- \frac{5}{15}\right| = \frac{1}{3}
- |r(30)| = \left| \left\lfloor \frac{N}{30} \right\rfloor - \frac{N}{30} \right| = \left| \left\lfloor \frac{50}{30} \right\rfloor - \frac{50}{30} \right| = \left|1 - \frac{50}{30}\right| = \left|- \frac{20}{30}\right| = \frac{2}{3}
So, substituting, we’ll get:
Even just limiting ourselves to this example, we can see that this error estimate is quite high: we are estimating the count of numbers not divisible by 2, 3 and 5 less than 50, and to be wrong by 29 is really a lot, considering that the numbers up to 50 that are not divisible by 2, 3 and 5 are much less than 29 (namely, they are: 1, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, e 47). In the next paragraph we’ll see how to improve this aspect.
Step 3: Limit d to make the error small
As we saw in the previous example, formula (8) tends to generate very large error estimates. This is not surprising, as the summation is extended to all divisors of P(z), which grow exponentially with respect to P(z). Furthermore, the number of prime factors of each divisor (\omega(d)) appears as an exponent of 3, potentially generating very large values; so not only does the summation have many terms, but each term is also potentially very large.
Making the terms of the summation smaller is difficult, because it would mean thinking in a completely different way than what has been done so far, but it’s possible to reduce the number of terms by changing only some details. To do this, however, we need to take a step back. It’s possible to prove that the overestimation
continues to be valid if T and R are expressed by the following formulas, to be used instead of (1) and (2):
Compared to the previous formulas, we have imposed an additional condition on d_1 and d_2, establishing that they must be less than z. But how is it possible to make a change of this magnitude without distorting the reasoning?
The reason is that we have full freedom in choosing the \lambda_d parameters. In fact, remember that (Selberg’s sieve: statement and beginning of the proof) they can be any real numbers, except \lambda_1 which must be equal to 1. So we can choose them arbitrarily, and nothing prevents us from putting:
This way, equations (1) and (2) would be equivalent to (1′) and (2′) respectively; in fact in all terms with d_1 \gt z or with d_2 \gt z we would have that \lambda_{d_1} \lambda_{d_2} = 0, therefore such terms would be equal to zero.
This modification can be introduced from the beginning and, apart from a single exception, does not modify the logical and algebraic steps of the previous articles: it only adds one more condition in each formula (depending on the case, d \lt z, d_1 \lt z, d_2 \lt z or d^{\prime} \lt z). For example, formulas (10) and (12) of the article Selberg’s sieve: study of the parameters λ would respectively become:
In this article, however, the meaning of the symbol d is a little different: it is the least common multiple of d_1 and d_2, which in turn are both less than z. In this regard, being d_1, d_2 \lt z \Rightarrow d_1 d_2 \lt z^2, and \mathrm{lcm}(d_1, d_2) \leq d_1 d_2, the following property holds:
which however is not valid in the other sense, as it can be easily checked (if for example z = 10, \mathrm{lcm}(50, 50) = 50 \lt 100 but 50 is not less than 10). So, being d = \mathrm{lcm}(d_1, d_2), we can say that:
In fact, the summation on the right will have at least the same terms as the one on the left, because the lowest common multiples smaller than z^2 are at least those obtainable starting from two numbers smaller than z. The \leq on the right is therefore less stringent than the one on the left (and consequently the increase is less precise); however, the formula on the right is still preferred because the other one is less explicit, given that d_1 and d_2 don’t appear as variables in the summation; only d appears, which is derived from them.
What is the only exception where (9) modifies the logical steps of the proof?
The only point of the proof that is substantially modified by hypothesis (9) is the proof that |\lambda_d| \leq 1. As we saw above, without that hypothesis we can prove that \lambda_d = \mu(d), from which |\lambda_d| = 1, while this is no longer true by adding the additional assumption. What changes is this single step of the proof of (3):
which, after considering the new hypothesis in the previous formulas (quite mechanical steps that we’ll neglect), becomes:
where we have an inequality in place of the previous equality.
An example can help us clarify the reason for the inequality.
Suppose that z = 20, from which P(z) = 2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 \cdot 17 \cdot 19; let’s also assume that d = 6. Then in the left summation of (g’) we’ll have that:
- b \mid d \Rightarrow b \mid 6 \Rightarrow b \in \{1, 2, 3, 6\}
- a \mid \frac{P(z)}{d} \Rightarrow a \mid 5 \cdot 7 \cdot 11 \cdot 13 \cdot 17 \cdot 19, from which there are 2^6 possibilities for a, as many as the possible divisors of a product of six distinct integers
- da \lt z \Rightarrow 6a \lt 20 \Rightarrow a \in \{1, 2, 3\}
- From the last two points it follows that a = 1, because 1 is the only number that satisfies both conditions
Ultimately we therefore have 4 possibilities for b and one possibility for a, so there will be 4 \cdot 1 = 4 possible terms of the type \frac{1}{\phi(b \cdot a)}: \frac{1}{\phi(1)}, \frac{1}{\phi(2)}, \frac{1}{\phi(3)} and \frac{1}{\phi(6)}.
In the right summation we’ll have that:
- c \mid P(z) \Rightarrow c \mid 2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 \cdot 17 \cdot 19, from which there are 2^8 possibilities for b, as many as the possible divisors of a product of eight distinct integers
- c \lt z \Rightarrow c \lt 20
The values of c that satisfy both conditions are the following: 1, 2, 3, 5, 6, 7, 10, 11, 13, 14, 15, 17, 19: this time there are 13 possibilities, i.e. 13 possible terms of the type \frac{1}{\phi(c)}, where c is one of the values just listed. It’s therefore evident that the right summation contains all the terms of the left summation plus others, therefore the relation of \leq holds, which in the end is inherited from the relation |\lambda_d| \leq 1.
If, however, we did not consider the additional hypothesis, i.e. in (g), both summations would have the same terms, of the type \frac{1}{\phi(c)}, where c is one of the 2^8 divisors of P(z); in the left summation in particular we would have the same 4 possibilities for b and 2^6 possibilities for a, for a total of 2^8 possibilities.
Our final estimate of the error will therefore be the following:
This restriction on d greatly lowers the overestimate of R; in fact, as z increases, the divisors of P(z) less than z^2 become an increasingly smaller fraction of the total, hence the condition tends to exclude most of the possible terms, making the increase much closer to the real value. To understand the extent of this effect, it’s useful to compare the graphs of the overestimates (8) and (8′). We’ll do this by taking the example of the article Selberg’s sieve: statement and beginning of the proof, where we have used the parameters of the sieve of Eratosthenes, i.e. N \gt 1, A = \{1, 2, \ldots, N\} and z = \sqrt {N + 1}:
As you can see, the two increases have very different orders of magnitude. The graphs are also qualitatively different, because the one of the error function without the limitation on d shows some jumps, which occur when, from one value of N to another, a new prime number is added in the product P(z): this doubles the number of ds considered in the summation (because the number of divisors of a product of k+1 prime numbers is double that of a product of k prime numbers), so the final value is suddently much larger than the previous one.
You already know the algorithmic sieve using all odd numbers is correct, and must always output a value less than the actual amount of Goldbach pairs for any given number, and must always be greater than 1 for any number over 16. Why try so hard to find a more complicated method, just to call it your own? It is unlikely to provide a valid proof.
We are trying hard to find a more complicated method because we have not found (yet?) a simpler way for proving Goldbach’s Conjecture. An algorithmic sieve can only return a partial solution, that is:
The major problem is in the “could” in the last sentence. We only know that the list contains a Goldbach pair for some even numbers, but proving Goldbach’s Conjecture means proving that a Goldbach pair exists for all even numbers greater than 4.