Overestimation of the number of spaces up to a column included in the validity interval

Prerequisites:

In the previous article we reduced the space-based proof strategy to the problem of overestimating the number of spaces in the first m columns of a linear dashed line. We saw that one way to tackle this problem is to minimize complex modular expressions, whose number of terms increases exponentially with the order of the dashed line (if the order of the dashed line is k, the corresponding modular expression has 2^k - 1 terms). For example, this is the expression to minimize for the fourth order:

\begin{aligned} \mathrm{err\_s}(T, m) := & n_2 n_3 n_4 (m \mathrm{\ mod\ } n_1) + n_1 n_3 n_4 (m \mathrm{\ mod\ } n_2) + n_1 n_2 n_4 (m \mathrm{\ mod\ } n_3) + n_1 n_2 n_3 (m \mathrm{\ mod\ } n_4) \\ & - n_3 n_4 (m \mathrm{\ mod\ } n_1 n_2) - n_1 n_4 (m \mathrm{\ mod\ } n_2 n_3) - n_2 n_4 (m \mathrm{\ mod\ } n_1 n_3) \\ & - n_2 n_3 (m \mathrm{\ mod\ } n_1 n_4) - n_1 n_3 (m \mathrm{\ mod\ } n_2 n_4) - n_1 n_2 (m \mathrm{\ mod\ } n_3 n_4) \\ & + n_4 (m \mathrm{\ mod\ } n_1 n_2 n_3) + n_3 (m \mathrm{\ mod\ } n_1 n_2 n_4) + n_2 (m \mathrm{\ mod\ } n_1 n_3 n_4) + n_1 (m \mathrm{\ mod\ } n_2 n_3 n_4) \\ & - m \mathrm{\ mod\ } n_1 n_2 n_3 n_4 \end{aligned} \tag{1}

We have seen that, as the order of the dashed line increases, the minimum value that this expression can assume when m varies is negative and seems to grow exponentially in absolute value.
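Expression (1) is the fourth-order instance of a general inclusion-exclusion sum over the non-empty subsets of the components. A minimal sketch in Python (the function name `err_s` is ours, mirroring the article's notation):

```python
from itertools import combinations
from math import prod

def err_s(T, m):
    """Sum, over the non-empty subsets S of the components of T, of
    (-1)^(|S|+1) * (product of the components outside S) * (m mod product of S).
    For T = (n1, n2, n3, n4) this reproduces expression (1) term by term."""
    k = len(T)
    total = 0
    for r in range(1, k + 1):
        for S in combinations(range(k), r):
            modulus = prod(T[i] for i in S)
            coeff = prod(T[i] for i in range(k) if i not in S)
            total += (-1) ** (r + 1) * coeff * (m % modulus)
    return total

# e.g. for T_4 = (2, 3, 5, 7) and m = 10, the fifteen terms sum to -270
print(err_s((2, 3, 5, 7), 10))  # -270
```

For order k the two nested loops range over all 2^k - 1 non-empty subsets, which is exactly why this expression becomes unwieldy as the order grows.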

However, to obtain this minimum value, which we called \mathrm{min\_err\_s}, we varied m considering all the possible values that it can assume in the dashed line, that is, from 1 to the length of the period (going beyond would not make sense because, due to the presence of the moduli, the values of the function are repeated at intervals equal to the length of the period). We looked for the minimum value this way because initially we wanted to study the function \mathrm{err\_s} in itself, regardless of the use we want to make of it in our proof strategy, but, as we have seen, the results were not satisfactory. It is therefore worth studying the function \mathrm{err\_s} in relation to the use we want to make of it, which can be summarized in two points:

  • The dashed lines in which we are most interested in finding spaces are those of type T_k, whose components are the consecutive prime numbers from 2 onwards (later we'll want to analyze double dashed lines);
  • In these dashed lines we need to look for spaces only inside the validity interval; in fact, finding a space outside it would not be useful for the proof of Goldbach’s Conjecture, since we could not know whether the space found is a prime number.

For our purposes, it’s therefore sufficient to define a function similar to \mathrm{min\_err\_s}, which instead of looking for the minimum of \mathrm{err\_s} in an entire period of the dashed line, looks for it only within the validity interval. As we know, this interval extends from the last component of the dashed line, excluded, to the square of the prime number following it, also excluded. To simplify, however, we’ll stop the search for the minimum at the square of the last component of the dashed line (excluded), since it’s complicated to calculate the next prime number exactly. This way, we consider an interval included in the validity interval, so it’s sufficient to find a space in this interval to be able to say that we have also found it in the validity interval. We’ll call the new minimum search function \mathrm{min\_err\_s\_q}, where the “q” stands for “quadratic”, to remember that the search interval width grows quadratically with respect to the last component of the dashed line.

This is the definition of \mathrm{min\_err\_s\_q} for a generic dashed line T:

\mathrm{min\_err\_s\_q}(T) := \min_{n_k \lt m \lt n_k^2} \mathrm{err\_s}(T, m) \tag{2}
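Definition (2) translates directly into code; a sketch reusing the inclusion-exclusion form of err_s (for large orders this brute-force scan is impractical, since the expression has 2^k - 1 terms):

```python
from itertools import combinations
from math import prod

def err_s(T, m):
    # inclusion-exclusion over the non-empty subsets of the components, as in (1)
    k = len(T)
    return sum((-1) ** (r + 1)
               * prod(T[i] for i in range(k) if i not in S)
               * (m % prod(T[i] for i in S))
               for r in range(1, k + 1)
               for S in combinations(range(k), r))

def min_err_s_q(T):
    """Definition (2): minimum of err_s(T, m) for n_k < m < n_k^2."""
    n_k = T[-1]
    return min(err_s(T, m) for m in range(n_k + 1, n_k ** 2))

print(min_err_s_q((2, 3, 5, 7)))  # -270, attained at m = 10
```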

An interesting aspect of this function, not very evident from the definition, is that some terms of \mathrm{err\_s} can be simplified, due to the narrow interval over which the minimum is calculated. We can see this effect in the fourth order, where the function \mathrm{err\_s} is defined in general by the expression (1). If we set T := T_4 = (2, 3, 5, 7), from which (n_1, n_2, n_3, n_4) = (2, 3, 5, 7), we’ll have that:

\begin{aligned} \mathrm{err\_s}(T, m) & := \\ 105 (m \mathrm{\ mod\ } 2) + 70 (m \mathrm{\ mod\ } 3) + 42 (m \mathrm{\ mod\ } 5) + 30 (m \mathrm{\ mod\ } 7) \\ - 35 (m \mathrm{\ mod\ } 6) - 14 (m \mathrm{\ mod\ } 15) - 21 (m \mathrm{\ mod\ } 10) & \\ - 15 (m \mathrm{\ mod\ } 14) - 10 (m \mathrm{\ mod\ } 21) - 6 (m \mathrm{\ mod\ } 35) & \\ + 7 (m \mathrm{\ mod\ } 30) + 5 (m \mathrm{\ mod\ } 42) + 3 (m \mathrm{\ mod\ } 70) + 2 (m \mathrm{\ mod\ } 105) & \\ - m \mathrm{\ mod\ } 210 \end{aligned} \tag{3}

The interval in which we want to look for the minimum of the function, by (2), goes from n_4 = 7 to n_4^2 = 7^2 = 49 excluded:

\mathrm{min\_err\_s\_q}(T_4) := \min_{7 \lt m \lt 49} \mathrm{err\_s}(T_4, m)

We can observe that in (3) some moduli are larger than 49: in such cases the modulo operations can be cancelled. For example, in the expression m \mathrm{\ mod\ } 70, since 7 \lt m \lt 49, m is certainly smaller than 70, so m \mathrm{\ mod\ } 70 = m. The same thing is true for m \mathrm{\ mod\ } 105 and m \mathrm{\ mod\ } 210, so (3) becomes:

\begin{aligned} \mathrm{err\_s}(T, m) & := \\ 105 (m \mathrm{\ mod\ } 2) + 70 (m \mathrm{\ mod\ } 3) + 42 (m \mathrm{\ mod\ } 5) + 30 (m \mathrm{\ mod\ } 7) \\ - 35 (m \mathrm{\ mod\ } 6) - 14 (m \mathrm{\ mod\ } 15) - 21 (m \mathrm{\ mod\ } 10) & \\ - 15 (m \mathrm{\ mod\ } 14) - 10 (m \mathrm{\ mod\ } 21) - 6 (m \mathrm{\ mod\ } 35) & \\ + 7 (m \mathrm{\ mod\ } 30) + 5 (m \mathrm{\ mod\ } 42) + 3 m + 2 m & \\ - m \end{aligned}

from which

\begin{aligned} \mathrm{err\_s}(T, m) & := \\ 105 (m \mathrm{\ mod\ } 2) + 70 (m \mathrm{\ mod\ } 3) + 42 (m \mathrm{\ mod\ } 5) + 30 (m \mathrm{\ mod\ } 7) \\ - 35 (m \mathrm{\ mod\ } 6) - 14 (m \mathrm{\ mod\ } 15) - 21 (m \mathrm{\ mod\ } 10) & \\ - 15 (m \mathrm{\ mod\ } 14) - 10 (m \mathrm{\ mod\ } 21) - 6 (m \mathrm{\ mod\ } 35) & \\ + 7 (m \mathrm{\ mod\ } 30) + 5 (m \mathrm{\ mod\ } 42) + 4 m \end{aligned} \tag{4}

We have thus gone from an expression that contained 15 moduli to an expression that contains 12. This may not seem like a great simplification, but the effect becomes more and more marked as the order increases, because the moduli greater than n_k^2 (and therefore certainly greater than m) gradually become more and more numerous, so that more and more of them are eliminated. In fact, we have to remember that the moduli are given by all possible products of non-repeating components of the dashed line (in the fourth order: n_1, n_2, n_3, n_4, n_1 n_2, n_1 n_3, n_1 n_4, n_2 n_3, n_2 n_4, n_3 n_4, n_1 n_2 n_3, n_1 n_2 n_4, \ldots), so the more components the dashed line has, the easier it is to find a product of components that exceeds the square of the largest one. To see this, just consider the dashed line T_9 = (2, 3, 5, 7, 11, 13, 17, 19, 23): the square of the last component is 529, which is already abundantly exceeded by the product of the first five components (2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 = 2310), which is the smallest product that can be formed with five components; therefore all products of at least 5 components are greater than 529, and so are many products of fewer, but larger, components (for example 7 \cdot 11 \cdot 13 = 1001).
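The claims about T_9 can be checked mechanically; the sketch below also counts how many of the 2^9 - 1 = 511 moduli stay below 23^2 = 529 (the exact count, 88, is our own computation, not stated in the article):

```python
from itertools import combinations
from math import prod

T9 = (2, 3, 5, 7, 11, 13, 17, 19, 23)
limit = T9[-1] ** 2  # 23^2 = 529

# the moduli are all products of distinct components
moduli = [prod(S) for r in range(1, len(T9) + 1)
          for S in combinations(T9, r)]
assert len(moduli) == 2 ** len(T9) - 1  # 511 moduli in total

# every product of at least five components exceeds 529,
# since the smallest such product is 2*3*5*7*11 = 2310
assert all(prod(S) > limit for r in range(5, len(T9) + 1)
           for S in combinations(T9, r))

# products of fewer, larger components can exceed it too, e.g. 7*11*13 = 1001
assert 7 * 11 * 13 > limit

small = sum(1 for n in moduli if n < limit)
print(small, len(moduli))  # 88 of the 511 moduli are smaller than 529
```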

Not only do the moduli larger than n_k^2 become more and more numerous, but they gradually become almost all the moduli. To realize this, just make some simple considerations:

  • the total number of moduli for the order k is 2^k - 1;
  • moduli smaller than n_k^2 certainly cannot be more than n_k^2 (more precisely, since each modulus is a product of distinct prime components and is therefore square-free, there cannot be more of them than there are square-free integers smaller than n_k^2);
  • therefore, the fraction of moduli smaller than n_k^2 on the total cannot be greater than \frac{n_k^2}{2^k - 1};
  • as is known from analysis, the ratio \frac{n_k^2}{2^k - 1} tends to zero as k increases, because n_k = p_k grows far more slowly than the exponential 2^k - 1;
  • thus, moduli smaller than n_k^2 are an infinitesimal portion of the totality (equivalently, moduli larger than n_k^2 tend to be the totality).
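The bound \frac{n_k^2}{2^k - 1} from the bullets above is easy to tabulate for the dashed lines T_k. A sketch (`nth_primes` is our own trial-division helper, adequate for these small sizes):

```python
def nth_primes(n):
    """First n primes by trial division (adequate for small n)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

primes = nth_primes(100)
for k in (5, 10, 25, 50, 100):
    p_k = primes[k - 1]
    print(k, p_k, p_k ** 2 / (2 ** k - 1))
```

The ratio drops from about 3.9 at k = 5 to below 10^{-24} at k = 100: already at moderate orders, almost all moduli exceed n_k^2.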

So, as the order increases, the expression of \mathrm{err\_s}(T, m) contains relatively few modulo operations (only those whose moduli are smaller than the square of the last component), plus, after the simplifications, a term given by m multiplied by a constant, as in (4).

The simplified expression of \mathrm{err\_s}(T, m) is worth exploring. For example:

  • The number of moduli smaller than n_k^2 is actually much smaller than n_k^2; for example in (4) we have only 12 moduli compared to n_k^2 = 49. We could investigate the order of magnitude of the number of moduli.
  • The coefficient of m, generated by the simplifications, is obtained through a sum of both positive and negative terms: we could study, for example, how the resulting number varies as a function of the order of the dashed line, for dashed lines of the kind T_k.
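Both quantities in the bullets above can be computed programmatically: over n_k \lt m \lt n_k^2, every modulus not smaller than n_k^2 reduces its term to a multiple of m, so we can collect those terms into a single linear coefficient. A sketch (the function name `simplify` is ours):

```python
from itertools import combinations
from math import prod

def simplify(T):
    """Count the modulo terms of expression (1) that survive over
    n_k < m < n_k^2, and the coefficient of m obtained by collecting
    the reduced terms, as in (4)."""
    k, limit = len(T), T[-1] ** 2
    surviving, linear = 0, 0
    for r in range(1, k + 1):
        for S in combinations(range(k), r):
            modulus = prod(T[i] for i in S)
            coeff = (-1) ** (r + 1) * prod(T[i] for i in range(k) if i not in S)
            if modulus >= limit:
                linear += coeff  # here m mod modulus = m on the whole interval
            else:
                surviving += 1
    return surviving, linear

print(simplify((2, 3, 5, 7)))  # (12, 4): 12 moduli plus the term 4m, as in (4)
```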

The fact of having limited the search for the minimum, however, does not in itself guarantee that the initial problem has been solved; that is, it does not guarantee that values of \mathrm{err\_s}(T, m) very large in absolute value cannot be obtained, despite the limitation on m. In fact, as we saw in the previous article, in which we had not introduced limitations on m, there exist values of m that generate values of \mathrm{err\_s}(T, m) very large in absolute value; if such values of m were smaller than n_k^2, the limitation we introduced would be irrelevant and we would not have solved the initial problem. To further investigate this aspect, we performed some experimental verifications.

We started by generalizing the formula from the previous article where \mathrm{err\_s} appears, this way:

s(T, m) = m \left(1 - \frac{1}{n_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{n_k} \right) + \frac{\mathrm{err\_s}(T, m)}{n_1 \cdot \ldots \cdot n_k}

where T = (n_1, \ldots, n_k) is a linear dashed line of order k.
For the moment we have limited ourselves to the study of the case T = T_k, for which we obtain:

s(T_k, m) = m \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) + \frac{\mathrm{err\_s}(T_k, m)}{p_1 \cdot \ldots \cdot p_k}

from which:

\mathrm{err\_s}(T_k, m) = \left(s(T_k, m) - m \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) \right) p_1 \cdot \ldots \cdot p_k \tag{5}

Now, we are interested in the minimum value that this formula can assume when m varies between p_k and its square, both excluded, as in (2). By (5), it can be expressed as:

\mathrm{min\_err\_s\_q}(T_k) = \min_{p_k \lt m \lt p_k^2} \left(s(T_k, m) - m \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) \right) p_1 \cdot \ldots \cdot p_k

Since \mathrm{min\_err\_s\_q}(T_k) is too large in absolute value for large orders, it is convenient to relate it to the length of the period, studying the ratio:

\frac{\mathrm{min\_err\_s\_q}(T_k)}{p_1 \cdot \ldots \cdot p_k} = \min_{p_k \lt m \lt p_k^2} \left(s(T_k, m) - m \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) \right)

This ratio is important to estimate how many spaces there can be in the validity interval: let’s see why.
Considering that the validity interval extends from p_k to p_k^2, both excluded, to calculate the number of spaces within it we can subtract the number of spaces in the first p_k columns from the number of spaces up to p_k^2 excluded:

\begin{aligned} \textrm{number of spaces in validity interval} &= \\ S(T_k, p_k^2 - 1) - S(T_k, p_k) &= \\ S(T_k, p_k^2 - 1) - 1 \end{aligned} \tag{6}

The term S(T_k, p_k) is equal to 1 because we know that in the dashed lines T_k, where the components are consecutive primes starting from 2, the only space smaller than the last component is the first column.

In general, if we consider a dashed line T = (n_1, \ldots, n_k) not of type T_k, we can observe, for example, that S(T, n_k) cannot be larger than n_k - k, because among the first n_k columns at least k are not spaces, namely the columns numbered with the components of the dashed line; therefore there cannot be more than n_k - k spaces. So, in general, S(T, n_k^2 - 1) - S(T, n_k) \geq S(T, n_k^2 - 1) + k - n_k.
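This counting argument is easy to check on examples; a sketch with two hypothetical dashed lines not of type T_k:

```python
def S(T, m):
    """Number of spaces among the first m columns, i.e. columns not
    divisible by any component of the dashed line T."""
    return sum(1 for c in range(1, m + 1) if all(c % n for n in T))

# among the first n_k columns, the k columns numbered with the
# components are struck out, hence S(T, n_k) <= n_k - k
for T in [(3, 4, 5), (2, 5, 7, 9)]:
    assert S(T, T[-1]) <= T[-1] - len(T)

print(S((3, 4, 5), 5))  # 2: only columns 1 and 2 are spaces
```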

The term S(T_k, p_k^2 - 1) can be underestimated using (5) and the definition of \mathrm{min\_err\_s\_q}(T_k):

\begin{aligned} S(T_k, p_k^2 - 1) & = \\ (p_k^2 - 1) \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) + \frac{\mathrm{err\_s}(T_k, p_k^2 - 1)}{p_1 \cdot \ldots \cdot p_k} & \geq \\ (p_k^2 - 1) \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) + \frac{\mathrm{min\_err\_s\_q}(T_k)}{p_1 \cdot \ldots \cdot p_k} \end{aligned}

hence, by (6):

\begin{aligned} \textrm{number of spaces in the validity interval} & \geq \\ (p_k^2 - 1) \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) - 1 + \frac{\mathrm{min\_err\_s\_q}(T_k)}{p_1 \cdot \ldots \cdot p_k} \end{aligned} \tag{7}

We calculated the ratio \frac{\mathrm{min\_err\_s\_q}(T_k)}{p_1 \cdot \ldots \cdot p_k} from k = 2 to k = 100 and observed that:

  • The ratio initially decreases slowly, then it sharply increases the rate of descent after order 45 (Figure 1);
  • Although the ratio decreases faster and faster, the sum of it and the function (p_k^2 - 1) \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) - 1 remains positive, a sign that the latter function grows faster (Figures 2 and 3).
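For small orders the positivity of the lower bound in (7) can be verified directly. The sketch below computes min_err_s_q through identity (5), counting spaces incrementally instead of expanding the 2^k - 1 modular terms (`nth_primes` is our own helper; the identity requires pairwise coprime components, which holds for T_k):

```python
from math import prod

def nth_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def min_err_s_q(T):
    """Minimum over n_k < m < n_k^2 of err_s(T, m), computed through (5):
    err_s(T, m) = s(T, m) * n_1...n_k - m * (n_1 - 1)...(n_k - 1),
    where s(T, m) counts the spaces among the first m columns."""
    N = prod(T)
    phi = prod(n - 1 for n in T)
    spaces = sum(1 for c in range(1, T[-1] + 1) if all(c % n for n in T))
    best = None
    for m in range(T[-1] + 1, T[-1] ** 2):
        if all(m % n for n in T):
            spaces += 1  # column m is a space
        err = spaces * N - m * phi
        if best is None or err < best:
            best = err
    return best

for k in range(2, 9):
    T = tuple(nth_primes(k))
    bound = ((T[-1] ** 2 - 1) * prod(1 - 1 / p for p in T) - 1
             + min_err_s_q(T) / prod(T))
    assert bound > 0  # the estimate (7) stays positive for these orders
```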

After proving the second point, by (7) we would have that the number of spaces in the validity interval is positive, which is precisely the purpose of the proof strategy. For the dashed lines of the kind T_k this property is already known, because by Bertrand’s Postulate there is a prime number, and therefore a space, between p_k + 1 and 2 p_k, which lies inside the validity interval, since 2 p_k \lt p_k^2 for k \geq 2. However, if (7) were to be proved by studying the function \mathrm{min\_err\_s\_q}(T_k), the techniques employed could perhaps also be applied to double dashed lines (as well as to linear dashed lines not of the T_k type), leading to the proof of Hypothesis H.3.

Figure 1: graph of min_err_s_q as a function of the order k of the dashed line T_k
Figure 2: estimate of the number of spaces in the validity interval of the dashed lines T_k, as k varies
Figure 3: Comparison between min_err_s_q and the estimate of spaces in the validity interval, for dashed lines T_k with increasing k

During this study, we also asked ourselves what is the minimum point of the error function, that is, what is the value m_{\textrm{min}}, included in the validity interval, such that \mathrm{min\_err\_s\_q}(T_k) = \mathrm{err\_s}(T_k, m_{\textrm{min}}). We discovered that this value tends to be positioned near the extremes of the validity interval; in particular:

  • Up to order 40, m_{\textrm{min}} is close to p_k;
  • From order 41 to order 46, m_{\textrm{min}} is close to p_k for some orders, close to p_k^2 for others;
  • From order 47 onwards, m_{\textrm{min}} is very close to p_k^2.

At the moment we have no hypotheses that can explain this behavior.
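The position of m_{\textrm{min}} can be reproduced with the same identity-based computation; the sketch below recovers the first rows of the table reported further on, and checks the "p - 1 with p prime" observation for them (`nth_primes` and `argmin_err_s` are our own names):

```python
from math import prod

def nth_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def argmin_err_s(T):
    """First m in n_k < m < n_k^2 minimizing err_s(T, m), via identity (5)."""
    N, phi = prod(T), prod(n - 1 for n in T)
    spaces = sum(1 for c in range(1, T[-1] + 1) if all(c % n for n in T))
    best_m, best = None, None
    for m in range(T[-1] + 1, T[-1] ** 2):
        if all(m % n for n in T):
            spaces += 1
        err = spaces * N - m * phi
        if best is None or err < best:
            best_m, best = m, err
    return best_m

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# first rows of the table: order k -> m_min
expected = {2: 4, 3: 10, 4: 10, 5: 12, 6: 16, 7: 18}
for k, m_min in expected.items():
    assert argmin_err_s(tuple(nth_primes(k))) == m_min

# for these small orders m_min is close to p_k, and m_min + 1 is prime
assert all(is_prime(m + 1) for m in expected.values())
```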

Figure 4: value of m which, among all the values included between the last component of the dashed line and its square excluded, makes err_s minimum for the dashed lines T_k, as k varies

As you can see, in many points the graph of m_{\textrm{min}} is indistinguishable from that of p_k^2, but actually there are small numerical differences, which interested readers can view in detail below.

From the following data, you can see how close m_{\mathrm{min}} is, depending on the case, to p_k or to p_k^2. Another interesting thing is that, when it is close to p_k, it’s always of the form p - 1 where p is a prime number slightly larger than p_k.

order p_k m_{\mathrm{min}} p_k^2
2 3 4 9
3 5 10 25
4 7 10 49
5 11 12 121
6 13 16 169
7 17 18 289
8 19 28 361
9 23 28 529
10 29 30 841
11 31 36 961
12 37 40 1369
13 41 42 1681
14 43 46 1849
15 47 52 2209
16 53 58 2809
17 59 60 3481
18 61 66 3721
19 67 70 4489
20 71 72 5041
21 73 78 5329
22 79 82 6241
23 83 88 6889
24 89 96 7921
25 97 100 9409
26 101 102 10201
27 103 106 10609
28 107 126 11449
29 109 126 11881
30 113 126 12769
31 127 130 16129
32 131 136 17161
33 137 148 18769
34 139 148 19321
35 149 150 22201
36 151 156 22801
37 157 162 24649
38 163 166 26569
39 167 172 27889
40 173 178 29929
41 179 32026 32041
42 181 190 32761
43 191 36450 36481
44 193 196 37249
45 197 222 38809
46 199 222 39601
47 211 44482 44521
48 223 49726 49729
49 227 51406 51529
50 229 52432 52441
51 233 54268 54289
52 239 56430 57121
53 241 58026 58081
54 251 62968 63001
55 257 65518 66049
56 263 69142 69169
57 269 72210 72361
58 271 73416 73441
59 277 76729 76729
60 281 78778 78961
61 283 79800 80089
62 293 85816 85849
63 307 94249 94249
64 311 96721 96721
65 313 97840 97969
66 317 100482 100489
67 331 108862 109561
68 337 112900 113569
69 347 120382 120409
70 349 121786 121801
71 353 124600 124609
72 359 128812 128881
73 367 134668 134689
74 373 139120 139129
75 379 143460 143641
76 383 146668 146689
77 389 151152 151321
78 397 157176 157609
79 401 160578 160801
80 409 166596 167281
81 419 175561 175561
82 421 177208 177241
83 431 185518 185761
84 433 187066 187489
85 439 192528 192721
86 443 196246 196249
87 449 201490 201601
88 457 208836 208849
89 461 212410 212521
90 463 214362 214369
91 467 218068 218089
92 479 229122 229441
93 487 237136 237169
94 491 241026 241081
95 499 247990 249001
96 503 252868 253009
97 509 259081 259081
98 521 271441 271441
99 523 273516 273529
100 541 292660 292681
To recap, it remains to be proved that:

  • As k increases, the function \frac{\mathrm{min\_err\_s\_q}(T_k)}{p_1 \cdot \ldots \cdot p_k} decreases less quickly than the function (p_k^2 - 1) \left(1 - \frac{1}{p_1} \right) \cdot \ldots \cdot \left(1 - \frac{1}{p_k} \right) - 1 grows, and their sum, given by (7), is positive. This proof should be done in such a way that it can be adapted to other dashed lines, including double dashed lines.
  • As k increases, the value of m_{\textrm{min}} tends to be close to p_k^2.
  • In general, both values close to p_k and those close to p_k^2 are good candidates for m_{\textrm{min}} (i.e. they generate negative errors particularly large in absolute value).

