Prerequisites:

- Selberg’s sieve: statement and beginning of the proof
- The Möbius function and its connection with the function Λ
- The Möbius inversion formula
- Two properties of the divisors of natural numbers

As we have seen, some parameters \lambda_d appear in the Selberg’s sieve formula, which are used both in the estimation of the sieve function and in the related error term. In this article we’ll calculate these parameters in order to obtain the lowest possible estimate.

In the previous article we saw that, if A is a finite set of natural numbers, z \gt 1 is a real number and P(z) is the product of the prime numbers smaller than z, then:

where:

Recall that the \lambda_ds (where d represents a divisor of P(z)) are real numbers that can be chosen as desired, with the only constraint that \lambda_1 = 1. In this article we’ll focus on formula (1) and we’ll calculate what values the \lambda_d should have to make the T summation as low as possible. The calculation can be structured in the steps that we’ll analyze below.

## Step 1: from least common multiple to greatest common divisor

The least common multiple is a bit annoying in number theory, because almost all formulas operate in terms of divisors rather than multiples: it is a purely conventional choice which, however, should be adapted to. We’ll transform the least common multiple into the greatest common divisor using the well-known property \mathrm{lcm}(d_1, d_2) = \frac{d_1 d_2}{\mathrm{GCD}(d_1, d_2)}, applying which from formula (1) we’ll obtain:

## Step 2: introduction of the divisors of the greatest common divisor, using the Möbius inversion formula

Now we’ll introduce a further level of divisors using the Möbius inversion formula. Applying Property N.18 (including the Remark that follows it) with n := \mathrm{GCD}(d_1, d_2) and f := \mathrm{id} (identity function f(x) = x), we’ll get:

Now let’s define:

From which:

Substituting into (3), we’ll have:

Suppose that P(z) = 2 \cdot 3 \cdot 5 = 30 and we want to calculate the possible values of \phi(d). In this regard it must be noted that d is a divisor of \mathrm{GCD}(d_1, d_2), which in turn is a divisor of d_1 and d_2 which in turn are divisors of P(z); therefore, by transitivity, d is also a divisor of P(z). To make sure to compute all possible values, we’ll then compute \phi(d) for all divisors of 30:

We have called \phi this function because, for the examples that interest us, it coincides with Euler’s function, which is usually denoted by the same symbol. The two functions coincide on all square-free numbers, i.e. the ones which are the product of distinct prime factors. This coincidence is quite surprising, given that the two functions are defined in two very different ways.

## Step 3: Collect repeating \phi(d) values

In (4), we can observe that multiple pairs of integers d_1 and d_2, divisors of P(z), can have the same greatest common divisor; in this case the divisors d of the greatest common divisor will be the same, and therefore the values of \phi(d) of the internal summation will be repeated for each pair. For example, if P(z) = 2 \cdot 3 \cdot 5, the value \phi(2) will occur for both the pair (d_1, d_2) = (2, 2) and the pair (d_1, d_2) = (10, 6), because both pairs of numbers have a least common multiple equal to 2. More generally, \phi(2) will be repeated for all pairs (d_1, d_2) which have a multiple of 2 as their greatest common divisor, as can be seen from the internal summation by substituting d = 2.

Collecting these repeated \phi(d) values, we’ll obtain:

Suppose that P(z) = 2 \cdot 3. Then formula (4) can be expanded as follows:

We can observe that \phi(1) is multiplied by all the terms \frac{\lambda_{d_1} \lambda_{d_2}}{d_1 d_2}, because 1 \mid \mathrm{GCD}(d_1, d_2) whatever d_1 and d_2 are, while, on the contrary, \phi(6) appears only once. Collecting \phi(1), \phi(2), \phi(3) and \phi(6), we’ll obtain:

## Step 4: Locate the square

The innermost summation of (5) can be rewritten as the square of a simpler summation:

The reason is that the condition d \mid \mathrm{GCD}(d_1, d_2) is equivalent to d \mid d_1 \land d \mid d_2, where we used the logical \land symbol to indicate that both conditions must be satisfied. In fact, by definition of greatest common divisor, all the divisors of the greatest common divisor between two numbers divide the numbers themselves and, vice versa, two numbers that divide them both are common divisors, therefore they are divisors of the greatest common divisor. So the left part of (6) can be rewritten as:

which in turn can be written differently as follows:

From this last summation we can obtain the right part of (6) using the same considerations as the “first step” (formula 5) of the previous article.

Let’s take the previous example again, where P(z) = 2 \cdot 3 and therefore d \mid P(z) \Leftrightarrow d \mid 6 \Leftrightarrow d \in \{1, 2, 3, 6\}. Let’s analyze the individual cases.

For d = 6 we’ll have that:

From the condition 6 \mid \mathrm{GCD}(d_1, d_2) it follows that d_1 and d_2 are multiples of 6, but by the condition d_1, d_2 \mid 6 they divide 6, so it must be d_1 = d_2 = 6. So:

For d = 3 we’ll have that:

From the condition 3 \mid \mathrm{GCD}(d_1, d_2) it follows that d_1 and d_2 are multiples of 3, but by the condition d_1, d_2 \mid 6 they divide 6, so each of them can be either 3 or 6. So:

The case of d = 2 is analogous to that of d = 3, so let’s see directly what happens when d = 1:

The condition 1 \mid \mathrm{GCD}(d_1, d_2) is always true, so this condition does not impose any further restrictions on d_1 and d_2, which according to the condition d_1, d_2 \mid 6 can be 1, 2, 3 or 6. So:

It can be observed that the summations (a), (b) and (c) are those that multiply respectively \phi(6), \phi(3) and \phi(1) in the previous example.

Substituting (6) into (5), we’ll obtain:

## Step 5: introduce the new variable w_d

Now we can simplify formula (7) by putting:

hence:

We need such an expression of T in order to minimize it; in fact we can proceed in two phases:

- Find the values of w_d that minimize T in formula (9)
- Calculate the values of \lambda_d starting from the values of w_d, using (8)

These will be the next steps.

Let’s look at formula (8) more explicitly, assuming that P(z) = 2 \cdot 3 \cdot 5. The variable d, as it can be seen from formula (7), is a divisor of P(z), therefore the possible w_d are w_1, w_2, w_3, w_5, w_6, w_{10}, w_{15}, w_{30}. Let’s look at them one at a time:

## Step 6: calculate \lambda_d starting from w_d

For reasons that will become clear shortly, let’s analyze the two remaining steps in reverse order, starting with the second. So let’s ask ourselves: how can we calculate the values of \lambda_d, once we have calculated those of w_d?

Let’s see an example.

As we saw in the previous example, formula (8) corresponds to the following system of equations:

Assuming we have already calculated the w_ds, we therefore have a system of 8 equations in the 8 unknowns \lambda_1, \lambda_2, \lambda_3, \lambda_5, \lambda_6, \lambda_{10}, \lambda_{15}, \lambda_{30}. We could solve it with any general solution method, but in this particular case there is a simpler method.

First of all, from the last equation it is possible to directly obtain \lambda_{30} = 30 \cdot w_{30}.

To calculate \lambda_{15} we can instead start from w_{15} and proceed this way:

Similarly, \lambda_{10} and \lambda_6 can be calculated. The calculation of \lambda_5 is a little more complex:

Similarly, \lambda_2 and \lambda_3 can be calculated.

The calculation of \lambda_1 is even more complex; in fact, as it can be verified, the final formula is the following:

In general, the calculation of \lambda_d can be performed according to the following procedure:

- Start from w_d
- Add/subtract all w_{d^{\prime}} where d^{\prime} is a multiple of d
- Multiply everything by d

The signs of the w_{d^{\prime}} remain to be determined. To be able to visualize them we use the Hasse diagram, by which all the divisors of a given number can be represented, in this case P(z) = 30:

The indices d^{\prime} of the w_{d^{\prime}}s to add/subtract to obtain \lambda_d correspond to all numbers reachable from d in the Hasse diagram, going upwards. For example, we saw that the formula for d = 5 is 5 \left(w_5 - w_{10} - w_{15} + w_{30} \right), and the numbers reachable from 5 going upwards are the same indices that appear in the formula:

Once the w_{d^{\prime}}s have been identified, the signs are given by the following rule:

- The sign of w_{d^{\prime}} is positive if d^{\prime} is reachable from d in an even number of steps
- The sign of w_{d^{\prime}} is negative if d^{\prime} is reachable from d in an odd number of steps

For example, for d = 5 the signs are as follows:

Indeed:

- 5 is reachable from 5 in 0 steps; 0 is an even number and therefore w_5 has a positive sign.
- 10 and 15 are reachable from 5 in 1 step; 1 is an odd number and therefore w_{10} and w_{15} have a negative sign.
- 30 is reachable from 5 in 2 steps; 2 is an even number and therefore w_{30} has a positive sign. We observe that 30 can be reached from 5 via two different paths but both composed of two steps, so the sign does not depend on the path considered; this is a general property of Hasse diagrams that allows the calculation of the sign to be unambiguous.

In general the following formula applies:

In fact \frac{d^{\prime}}{d} has as many distinct prime factors as the number of steps that separate d and d^{\prime} in the Hasse diagram (since each step corresponds to multiplication by a different prime factor); furthermore the function \mu returns 1 if this number is even, -1 if it’s odd.

Starting from the example and using the generic P(z) instead of 30, we’ll obtain the formula:

The proof is an application of a theorem called the dual Möbius inversion formula. We’ll leave the details to interested readers.

What does the dual Möbius inversion formula consist of and how is it applied to prove (10)?

For convenience, we report the statement of the dual Möbius inversion formula (by following the previous link you can also find the proof and an example):

Dual Möbius inversion formula

Let n be a positive natural number and let f be a function defined on the divisors of n.

Let g be the following function defined on the divisors of n:

Then, for every k \mid n:

In our case we must apply the formula by setting:

- n := P(z)
- f(d^{\prime}) = \frac{\lambda_{d^{\prime}}}{d^{\prime}}

Hence g(d) = \sum_{d \mid d^{\prime} \mid n} f(d^{\prime}) = \sum_{d \mid d^{\prime} \mid P(z)} \frac{\lambda_{d^{\prime}}}{d^{\prime}} = w_d by (8), so we can replace w_d instead of g(d).

Let’s now start from the formula of the theorem:

Making the substitutions, we’ll obtain:

Multiplying by k and changing the names of the variables, we’ll obtain formula (10).

## Step 7: Find the values of w_d that minimize T

This passage has more to do with algebraic calculus than with number theory, so we’ll leave it to interested readers. The important thing for the sequel is only the final result:

We can observe that, by (9), if h is any function defined on the divisors of P(z):

If we set w_d := h(d), the left-hand side vanishes, so:

Our aim is to find a suitable function h in order to make this value of T, obtained by substituting w_d := h(d), as small as possible.

We said that \lambda_1 must be 1, but we can also get \lambda_1 from (10). Incidentally, the reason we saw how to calculate \lambda_ds from w_ds before actually calculating w_ds is that now we need formula (10), by which:

We have thus obtained a first constraint that must be satisfied by our w_ds. Comparing with (d), we can try to reduce the summation \sum_{d \mid P(z)} \phi(d) w_d h(d) to \sum_{d \mid P(z)} \mu(d) w_{d}; the simplest way to do this is to set h(d) := \frac{\mu(d)}{\phi(d)}. Let’s try to put this value into (d):

where in the last step we applied (e), furthermore we observed that \mu(d)^2 = 1 because \mu(d) = \pm 1, being d square free.

The value of T that we found is not good because for z \to +\infty it tends to -\infty, while clearly the quantity to estimate (the numbers that remain after the sieve) is positive.

To solve the problem we’ll call V(z) the last summation obtained and, instead of setting h(d) := \frac{\mu(d)}{\phi(d)} as before, we’ll divide by V(z) obtaining h(d) := \frac{\mu(d)}{\phi(d) V(z)}. Substituting this value into (d), with few changes compared to the steps seen before, we’ll obtain:

This value of T is satisfactory because it is positive. We do not exclude that other choices of w_d could generate even smaller T values, but Selberg’s sieve makes this choice, and we invite our readers to let us know if there are better choices (i.e. that minimize T without however increasing too much the error term R, which we’ll study in the next article).

## Final example

Let’s calculate T and the various \lambda_ds when P(z) = 2 \cdot 3 \cdot 5 = 30.

After various simplifications, we have seen that T can be expressed in the form of formula (9):

where in the last step we replaced the values of the \phi function calculated in the example of step 2.

Now we’ll calculate V(z), given by formula (12):

We can thus calculate the values of w_d, given by (10):

Substituting T into the formula, we’ll have:

So we can say that, according to Selberg’s sieve, the number of elements of the set A that remain after eliminating the multiples of 2, those of 3 and those of 5, is approximately \frac{4}{15} of the total (without taking into account the error term).

To conclude, we’ll calculate the values of \lambda_d using formula (10).

First let’s check that \lambda_1 = 1:

Let’s now calculate \lambda_2:

With similar steps we can get the values of the other \lambda_ds. Here is the complete list:

These values will be used in the next article to overestimate the error term R given by formula (2).

The values obtained for \lambda(d) correspond, in this case, to \mu(d). This always happens for Selberg’s sieve in the simplified form, as we’ll prove in a detail part of the next article.

Shows the calculation details of \lambda_3, \lambda_5, \lambda_6, \lambda_{10}, \lambda_{15}, \lambda_{30}

I’m glad to see that you’ve come around to the usefulness of a sieve function for proving Goldbach’s. It’s also good that you’ve searched out and found related theorems that apply. Do you still have the opinion that proof by a negative (reductio ad absurdum) is not valid? (that is, proof of Goldbach’s via sieve function).

Generally speaking, we believe in sieve theory as a good candidate for proving Goldbach’s Conjecture. Unfortunately, simplistic approaches are not valid, like those based on Erathostenes’ sieve, which shows up its limits when applied to Goldbach’s Conjecture, and generally to problems which involve counting prime numbers, as we explained in the following articles: Why don’t algorithmic approaches work well in sieve theory? (part I), Why don’t algorithmic approaches work well in sieve theory? (part II). However, we think that more sophisticated sieving techniques may be successful in proving the Conjecture. Regarding a reductio ad absurdum, out attempts move mostly towards a direct proof, but… who knows?