Prerequisites:
- Proof strategy based on spaces
- Proof strategy based on spaces: method based on the approximate calculation of spaces
In the previous article we traced the proof strategy based on spaces back to the problem of underestimating the number of spaces in the first m columns of a linear dashed line. We saw that one way to solve this problem is to minimize complex modular expressions, whose number of terms increases exponentially with the order of the dashed line (if the order of the dashed line is k, the corresponding modular expression has 2^k - 1 terms). For example, this is the expression to minimize for the fourth order:
We have seen that, as the order of the dashed line increases, the minimum value that this expression can assume when m varies is negative and seems to grow exponentially in absolute value.
However, to obtain this minimum value, which we called \mathrm{min\_err\_s}, we varied m considering all the possible values that it can assume in the dashed line, that is, from 1 to the length of the period (going beyond would not make sense because, due to the presence of the moduli, the values of the function are repeated at intervals equal to the length of the period). We looked for the minimum value this way because initially we wanted to study the function \mathrm{err\_s} in itself, regardless of the use we want to make of it in our proof strategy, but, as we have seen, the results were not satisfactory. It is therefore worth studying the function \mathrm{err\_s} in relation to the use we want to make of it, which can be summarized in two points:
- The dashed lines in which we are most interested in finding spaces are those of type T_k, the components of which are the consecutive prime numbers from 2 onwards (later we'll also want to analyze double dashed lines);
- In these dashed lines, we need to look for spaces only inside the range of validity; in fact, finding a space outside it would not be useful for the proof of Goldbach’s Conjecture, since we could not know whether the space found is a prime number.
For our purposes, it’s therefore sufficient to define a function similar to \mathrm{min\_err\_s} which, instead of looking for the minimum of \mathrm{err\_s} over an entire period of the dashed line, looks for it only within the validity interval. As we know, this interval extends from the last component of the dashed line, excluded, to the square of the prime number following it, also excluded. To simplify, however, we’ll stop the search for the minimum at the square of the last component of the dashed line (excluded), since it’s complicated to calculate the next prime number exactly. This way we consider an interval contained in the validity interval, so finding a space in this interval is enough to be able to say that we have also found one in the validity interval. We’ll call the new minimum search function \mathrm{min\_err\_s\_q}, where the “q” stands for “quadratic”, as a reminder that the width of the search interval grows quadratically with the last component of the dashed line.
This is the definition of \mathrm{min\_err\_s\_q} for a generic dashed line T:
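In code, these definitions can be sketched as follows. This is a hedged reconstruction: it assumes that \mathrm{err\_s}(T, m) is the inclusion–exclusion error term \sum_{J} (-1)^{|J|+1} \frac{m \mathrm{\ mod\ } N_J}{N_J}, where N_J runs over the products of distinct components (the 2^k - 1 moduli), so that the exact space count satisfies S(T, m) = m \prod_i (1 - 1/n_i) + \mathrm{err\_s}(T, m); the function names mirror the text, but the exact formulas in the original are not reproduced here.

```python
from itertools import combinations
from math import prod

def err_s(T, m):
    """Inclusion-exclusion error of the linear approximation of the space
    count: sum over non-empty subsets J of (-1)^(|J|+1) * (m mod N_J) / N_J
    (assumed reconstruction of the definition from the previous article)."""
    return sum((-1) ** (r + 1) * (m % prod(J)) / prod(J)
               for r in range(1, len(T) + 1)
               for J in combinations(T, r))

def min_err_s_q(T):
    """Minimum of err_s over the search interval (n_k, n_k^2), both excluded."""
    n_k = T[-1]
    return min(err_s(T, m) for m in range(n_k + 1, n_k ** 2))

def spaces(T, m):
    """Brute-force count of spaces (columns not divisible by any component)
    among the first m columns, used to cross-check err_s."""
    return sum(1 for x in range(1, m + 1) if all(x % n for n in T))
```

For example, with T_4 = (2, 3, 5, 7) the search runs over 7 < m < 49 and, under this reconstruction, the minimum turns out to be negative, consistent with the earlier observations.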
An interesting aspect of this function, not very evident from the definition, is that some terms of \mathrm{err\_s} can be simplified, due to the narrow interval over which the minimum is calculated. We can see this effect in the fourth order, where the function \mathrm{err\_s} is defined in general by the expression (1). If we set T := T_4 = (2, 3, 5, 7), from which (n_1, n_2, n_3, n_4) = (2, 3, 5, 7), we’ll have that:
The interval in which we want to look for the minimum of the function, by (2), goes from n_4 = 7 to n_4^2 = 7^2 = 49 excluded:
We can observe that in (3) some moduli are larger than 49: in such cases the modulo operations can be cancelled. For example, in the expression m \mathrm{\ mod\ } 70, since 7 \lt m \lt 49, m is certainly smaller than 70, so m \mathrm{\ mod\ } 70 = m. The same thing is true for m \mathrm{\ mod\ } 105 and m \mathrm{\ mod\ } 210, so (3) becomes:
from which
We have thus gone from an expression that contained 15 moduli to an expression that contains 12. This may not seem like a great simplification, but the effect becomes more and more marked as the order increases, because the moduli greater than n_k^2 (and therefore certainly greater than m) gradually become more and more numerous, so that more and more of them are eliminated. In fact, we have to remember that the moduli are given by all possible products of non-repeating components of the dashed line (in the fourth order: n_1, n_2, n_3, n_4, n_1 n_2, n_1 n_3, n_1 n_4, n_2 n_3, n_2 n_4, n_3 n_4, n_1 n_2 n_3, n_1 n_2 n_4, \ldots), so the more components the dashed line has, the easier it is to find a product of components that exceeds the square of the largest one. To see this, just consider the dashed line T_9 = (2, 3, 5, 7, 11, 13, 17, 19, 23): the square of the last component is 529, which is already abundantly exceeded by the product of the first five components (2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 = 2310), which is the smallest product that can be formed with five components; therefore all products of at least 5 components are greater than 529, and so are many products of fewer, but larger, components (for example 7 \cdot 11 \cdot 13 = 1001).
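These counts are easy to verify by enumeration. A small sketch (the helper name `moduli` is illustrative, not from the original articles):

```python
from itertools import combinations
from math import prod

def moduli(T):
    """All 2^k - 1 products of non-repeating components of the dashed line."""
    return [prod(J) for r in range(1, len(T) + 1)
                    for J in combinations(T, r)]

T4 = (2, 3, 5, 7)
T9 = (2, 3, 5, 7, 11, 13, 17, 19, 23)

# Moduli that survive the simplification (those below the square of the
# last component) versus those that can be cancelled because they exceed it.
surviving_T4 = [N for N in moduli(T4) if N < 7 ** 2]
cancelled_T9 = [N for N in moduli(T9) if N > 23 ** 2]
```

For T_4 exactly three moduli (70, 105 and 210) are cancelled, leaving the 12 mentioned above; for T_9 every product of at least five components exceeds 529, as does, for instance, 7 · 11 · 13 = 1001.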
Not only do the moduli larger than n_k^2 become more and more numerous, but they gradually become almost all the moduli. To realize this, just make some simple considerations:
- the total number of moduli for the order k is 2^k - 1;
- moduli smaller than n_k^2 certainly cannot be more than n_k^2 (more precisely, their number is at most the number of square-free integers smaller than n_k^2, since every modulus is a product of distinct primes);
- therefore, the fraction of moduli smaller than n_k^2 on the total cannot be greater than \frac{n_k^2}{2^k - 1};
- as is known from mathematical analysis, the ratio \frac{n_k^2}{2^k - 1} tends to zero as k increases;
- thus, moduli smaller than n_k^2 are an infinitesimal portion of the totality (equivalently, moduli larger than n_k^2 tend to be the totality).
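The bound \frac{n_k^2}{2^k - 1} in the argument above is easy to evaluate numerically; the helper names below are illustrative:

```python
def first_primes(k):
    """First k primes by trial division (adequate for small k)."""
    primes, n = [], 2
    while len(primes) < k:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def small_moduli_fraction_bound(k):
    """Upper bound n_k^2 / (2^k - 1) on the fraction of moduli of T_k
    that are smaller than n_k^2."""
    n_k = first_primes(k)[-1]
    return n_k ** 2 / (2 ** k - 1)
```

For k = 9 the bound is still above 1 (hence vacuous), but it drops below 0.5% at k = 20 and below 10^{-7} at k = 40, illustrating how quickly the small moduli become an infinitesimal portion of the total.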
So, as the order increases, the expression of \mathrm{err\_s}(T, m) contains relatively few moduli (all smaller than the square of the last component), plus, after the simplifications, a term given by m multiplied by a constant, as in (4). Two aspects could be investigated further:
- The number of moduli smaller than n_k^2 is actually much smaller than n_k^2; for example in (4) we have only 12 moduli compared to n_k^2 = 49. We could investigate the order of magnitude of the number of moduli.
- The coefficient of m, generated by the simplifications, is obtained through a sum of both positive and negative terms: we could study, for example, how the obtained number varies in function of the order of the dashed line, for dashed lines of the kind T_k.
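Regarding the second point, the coefficient of m can be computed exactly with rational arithmetic. The sketch below again assumes the inclusion–exclusion form of \mathrm{err\_s} (each modulus N_J appearing with sign (-1)^{|J|+1}); under that assumption, for T_4 the cancelled moduli 70, 105 and 210 contribute m \left( \frac{1}{70} + \frac{1}{105} - \frac{1}{210} \right) = \frac{2}{105} m:

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def signed_moduli(T):
    """(modulus, sign) for every non-empty subset of components, with the
    sign assumed by the inclusion-exclusion form of err_s."""
    return [(prod(J), (-1) ** (r + 1))
            for r in range(1, len(T) + 1)
            for J in combinations(T, r)]

def err_s(T, m):
    """Full expression with all 2^k - 1 moduli (assumed form)."""
    return sum(s * (m % N) / N for N, s in signed_moduli(T))

def m_coefficient(T):
    """Exact coefficient of m produced by cancelling the moduli larger than
    n_k^2: for those moduli, m mod N = m whenever m < n_k^2 <= N."""
    n_k = T[-1]
    return sum((Fraction(s, N) for N, s in signed_moduli(T) if N > n_k ** 2),
               Fraction(0))

def err_s_simplified(T, m):
    """err_s for n_k < m < n_k^2: small moduli kept, large moduli replaced
    by the linear term m_coefficient(T) * m."""
    n_k = T[-1]
    small = sum(s * (m % N) / N for N, s in signed_moduli(T) if N <= n_k ** 2)
    return small + float(m_coefficient(T)) * m
```

By construction the simplified expression agrees with the full one over the whole search interval, which is a useful sanity check when studying how the coefficient varies with the order.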
Limiting the search for the minimum, however, does not by itself guarantee that the initial problem has been solved; that is, it does not guarantee that values of \mathrm{err\_s}(T, m) very large in absolute value cannot be obtained, despite the limitation on m. In fact, as we saw in the previous article, where we had not introduced limitations on m, there exist values of m that generate values of \mathrm{err\_s}(T, m) very large in absolute value; if such values of m were smaller than n_k^2, the limitation we introduced would be irrelevant and we would not have solved the initial problem. To investigate this aspect further, we did some experimental verifications.
We started by generalizing the formula from the previous article where \mathrm{err\_s} appears, this way:
where T = (n_1, \ldots, n_k) is a linear dashed line of order k.
For the moment we have limited ourselves to the study of the case T = T_k, for which we obtain:
from which:
Now, we are interested in the minimum value that this formula can assume when m varies between p_k and its square excluded, given by (2). By (5), it can be expressed as:
Since \mathrm{min\_err\_s\_q}(T_k) becomes very large in absolute value for large orders, it is convenient to relate it to the length of the period, studying the ratio:
This ratio is important to estimate how many spaces there can be in the validity interval: let’s see why.
Considering that the validity interval extends from p_k excluded to p_k^2 excluded, to calculate the number of spaces within it, we can subtract the number of spaces in the first p_k columns from the number of spaces up to and not including p_k^2:
The term S(T_k, p_k) is equal to 1 because we know that in the dashed lines T_k, where the components are consecutive primes starting from 2, the only space smaller than the last component is the first column.
The term S(T_k, p_k^2 - 1) can be underestimated using (5) and the definition of \mathrm{min\_err\_s\_q}(T_k):
hence, by (6):
We calculated the ratio \frac{\mathrm{min\_err\_s\_q}(T_k)}{p_1 \cdot \ldots \cdot p_k} from k = 2 to k = 100 and observed that:
- The ratio initially decreases slowly, then its rate of descent increases sharply after order 45 (Figure 1);
- Although the ratio decreases faster and faster, the sum of it and the function (p_k^2 - 1) \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) - 1 remains positive, a sign that the latter function grows faster (Figures 2 and 3).
If the second point were proved, by (7) we would have that the number of spaces in the validity interval is positive, which is precisely the purpose of the proof strategy. For dashed lines of the kind T_k this property is already known, because by Bertrand’s Postulate there is a prime number, and therefore a space, between p_k + 1 and 2 p_k, which is inside the validity interval, since 2 p_k \lt p_k^2 for k \geq 2. However, if (7) were proved by studying the function \mathrm{min\_err\_s\_q}(T_k), the techniques employed could perhaps also be applied to double dashed lines (as well as to linear dashed lines not of type T_k), leading to the proof of Hypothesis H.3.
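Inequality (7) can be checked numerically at small orders. The sketch below uses the same assumed inclusion–exclusion form of \mathrm{err\_s} as above (a reconstruction, not the original displayed formula) and counts the spaces by brute force:

```python
from itertools import combinations
from math import prod

def err_s(T, m):
    """Assumed inclusion-exclusion error of the space-count approximation."""
    return sum((-1) ** (r + 1) * (m % prod(J)) / prod(J)
               for r in range(1, len(T) + 1)
               for J in combinations(T, r))

def min_err_s_q(T):
    """Minimum of err_s over the search interval (n_k, n_k^2)."""
    n_k = T[-1]
    return min(err_s(T, m) for m in range(n_k + 1, n_k ** 2))

def spaces(T, m):
    """Exact number of spaces among the first m columns (brute force)."""
    return sum(1 for x in range(1, m + 1) if all(x % n for n in T))

def spaces_in_validity(T):
    """S(T, n_k^2 - 1) - S(T, n_k): spaces strictly between n_k and n_k^2."""
    n_k = T[-1]
    return spaces(T, n_k ** 2 - 1) - spaces(T, n_k)

def lower_bound(T):
    """Right-hand side of (7):
    (n_k^2 - 1) * prod(1 - 1/n_i) + min_err_s_q(T) - 1."""
    n_k = T[-1]
    return (n_k ** 2 - 1) * prod(1 - 1 / n for n in T) + min_err_s_q(T) - 1
```

For T_4, under this reconstruction, the bound evaluates to about 8.7, against 11 actual spaces in the interval (7, 49): positive, as the strategy requires.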



During this study, we also asked ourselves what is the minimum point of the error function, that is, what is the value m_{\textrm{min}}, included in the validity interval, such that \mathrm{min\_err\_s\_q}(T_k) = \mathrm{err\_s}(T_k, m_{\textrm{min}}). We discovered that this value tends to be positioned near the extremes of the validity interval; in particular:
- Up to order 40, m_{\textrm{min}} is close to p_k;
- From order 41 to order 46, m_{\textrm{min}} is close to p_k for some orders, close to p_k^2 for others;
- From order 47 onwards, m_{\textrm{min}} is very close to p_k^2.
At the moment we have no hypotheses that can explain this behavior.
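The minimum point can be found by direct search. The sketch below uses the same assumed inclusion–exclusion form of \mathrm{err\_s}; at small orders this reconstruction reproduces the values of m_{\textrm{min}} reported in the table below (4, 10, 10, 12 for orders 2 to 5):

```python
from itertools import combinations
from math import prod

def err_s(T, m):
    """Assumed inclusion-exclusion error term of the space-count approximation."""
    return sum((-1) ** (r + 1) * (m % prod(J)) / prod(J)
               for r in range(1, len(T) + 1)
               for J in combinations(T, r))

def m_min(T):
    """Minimum point of err_s over the search interval (n_k, n_k^2)."""
    n_k = T[-1]
    return min(range(n_k + 1, n_k ** 2), key=lambda m: err_s(T, m))
```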

As you can see, in many points the graph of m_{\textrm{min}} is indistinguishable from that of p_k^2, but actually there are small numerical differences, which interested readers can view in detail below.
From the following data, you can see how close m_{\mathrm{min}} is, depending on the case, to p_k or to p_k^2. Another interesting thing is that, when it is close to p_k, it’s always of the form p - 1 where p is a prime number slightly larger than p_k.
order | p_k | m_{\mathrm{min}} | p_k^2 |
---|---|---|---|
2 | 3 | 4 | 9 |
3 | 5 | 10 | 25 |
4 | 7 | 10 | 49 |
5 | 11 | 12 | 121 |
6 | 13 | 16 | 169 |
7 | 17 | 18 | 289 |
8 | 19 | 28 | 361 |
9 | 23 | 28 | 529 |
10 | 29 | 30 | 841 |
11 | 31 | 36 | 961 |
12 | 37 | 40 | 1369 |
13 | 41 | 42 | 1681 |
14 | 43 | 46 | 1849 |
15 | 47 | 52 | 2209 |
16 | 53 | 58 | 2809 |
17 | 59 | 60 | 3481 |
18 | 61 | 66 | 3721 |
19 | 67 | 70 | 4489 |
20 | 71 | 72 | 5041 |
21 | 73 | 78 | 5329 |
22 | 79 | 82 | 6241 |
23 | 83 | 88 | 6889 |
24 | 89 | 96 | 7921 |
25 | 97 | 100 | 9409 |
26 | 101 | 102 | 10201 |
27 | 103 | 106 | 10609 |
28 | 107 | 126 | 11449 |
29 | 109 | 126 | 11881 |
30 | 113 | 126 | 12769 |
31 | 127 | 130 | 16129 |
32 | 131 | 136 | 17161 |
33 | 137 | 148 | 18769 |
34 | 139 | 148 | 19321 |
35 | 149 | 150 | 22201 |
36 | 151 | 156 | 22801 |
37 | 157 | 162 | 24649 |
38 | 163 | 166 | 26569 |
39 | 167 | 172 | 27889 |
40 | 173 | 178 | 29929 |
41 | 179 | 32026 | 32041 |
42 | 181 | 190 | 32761 |
43 | 191 | 36450 | 36481 |
44 | 193 | 196 | 37249 |
45 | 197 | 222 | 38809 |
46 | 199 | 222 | 39601 |
47 | 211 | 44482 | 44521 |
48 | 223 | 49726 | 49729 |
49 | 227 | 51406 | 51529 |
50 | 229 | 52432 | 52441 |
51 | 233 | 54268 | 54289 |
52 | 239 | 56430 | 57121 |
53 | 241 | 58026 | 58081 |
54 | 251 | 62968 | 63001 |
55 | 257 | 65518 | 66049 |
56 | 263 | 69142 | 69169 |
57 | 269 | 72210 | 72361 |
58 | 271 | 73416 | 73441 |
59 | 277 | 76729 | 76729 |
60 | 281 | 78778 | 78961 |
61 | 283 | 79800 | 80089 |
62 | 293 | 85816 | 85849 |
63 | 307 | 94249 | 94249 |
64 | 311 | 96721 | 96721 |
65 | 313 | 97840 | 97969 |
66 | 317 | 100482 | 100489 |
67 | 331 | 108862 | 109561 |
68 | 337 | 112900 | 113569 |
69 | 347 | 120382 | 120409 |
70 | 349 | 121786 | 121801 |
71 | 353 | 124600 | 124609 |
72 | 359 | 128812 | 128881 |
73 | 367 | 134668 | 134689 |
74 | 373 | 139120 | 139129 |
75 | 379 | 143460 | 143641 |
76 | 383 | 146668 | 146689 |
77 | 389 | 151152 | 151321 |
78 | 397 | 157176 | 157609 |
79 | 401 | 160578 | 160801 |
80 | 409 | 166596 | 167281 |
81 | 419 | 175561 | 175561 |
82 | 421 | 177208 | 177241 |
83 | 431 | 185518 | 185761 |
84 | 433 | 187066 | 187489 |
85 | 439 | 192528 | 192721 |
86 | 443 | 196246 | 196249 |
87 | 449 | 201490 | 201601 |
88 | 457 | 208836 | 208849 |
89 | 461 | 212410 | 212521 |
90 | 463 | 214362 | 214369 |
91 | 467 | 218068 | 218089 |
92 | 479 | 229122 | 229441 |
93 | 487 | 237136 | 237169 |
94 | 491 | 241026 | 241081 |
95 | 499 | 247990 | 249001 |
96 | 503 | 252868 | 253009 |
97 | 509 | 259081 | 259081 |
98 | 521 | 271441 | 271441 |
99 | 523 | 273516 | 273529 |
100 | 541 | 292660 | 292681 |
In summary, the points that remain to be proved or investigated are the following:
- As k increases, the function \frac{\mathrm{min\_err\_s\_q}(T_k)}{p_1 \cdot \ldots \cdot p_k} decreases more slowly than the function (p_k^2 - 1) \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) - 1 grows, so that their sum, given by (7), remains positive. This should be proved in such a way that the proof can be adapted to other dashed lines, including double dashed lines.
- As k increases, the value of m_{\textrm{min}} tends to be close to p_k^2.
- In general, both values close to p_k and those close to p_k^2 are good candidates for m_{\textrm{min}} (i.e. they generate negative errors particularly large in absolute value).