Prerequisites:

- Why don’t algorithmic approaches work well in sieve theory? (part I)
- The Möbius function and its connection with the function Λ
- Elements of asymptotic analysis
- The Prime Number Theorem: history and statement

In the previous article we calculated the sieve function of Erathostenes’ sieve starting from the algorithm, obtaining a formula with a number of terms that grows exponentially with respect to the number of primes used, therefore too complex to be used in practice. But ultimately it is not necessary to calculate a sieve function exactly: to be able to study it (for example to understand its asymptotic behavior) it would be sufficient to find a good approximation. Therefore, if we were content to approximate the sieve function, in the case of Eratosthenes’ sieve, would we obtain a less complicated formula? How good would the approximation be? In this article we will answer these questions.

## From Eratosthenes’ sieve algorithm to the calculation of the corresponding sieve function: a heuristic approach

As in the previous article, given an integer N \gt 1, we’ll start from the set A := \{1, 2, 3, \ldots, N\} and suppose we have run the sieve of Eratosthenes algorithm, in the version that erases all multiples of a prime p, including p itself.

We have seen that in the first step, i.e. when the algorithm eliminates the multiples of 2, in the set A exactly N - \left\lfloor \frac{N}{2} \right \rfloor elements remain. To approximate this function with a simpler one, the idea could be to eliminate the integer part, obtaining the formula N - \frac{N}{2}. By doing so, we would make an error of \frac{1}{2} at most, \frac{1}{2} being the maximum possible difference between \left\lfloor \frac{N}{2} \right\rfloor and \frac{N}{2}, which is obtained in the case of odd N (if N is even there is no difference).

Continuing the algorithm, i.e. eliminating also the multiples of 3, we saw that the exact formula becomes more complicated, because we must take into account the fact that some multiples of 3 have already been eliminated previously because they are also multiples of 2. In particular, we have seen that the number of elements left in A after eliminating the multiples of 2 and 3 is N - \left\lfloor \frac{N}{2} \right\rfloor - \left\lfloor \frac{N}{3} \right\rfloor + \left\lfloor \frac{N}{6} \right\rfloor. Removing the whole parts also in this case, we’ll obtain the formula N - \frac{N}{2} - \frac{N}{3} + \frac{N}{6}. Let’s try to simplify it:

Now we’ll eliminate the multiples of 5 too. The exact formula that calculates how many numbers remain is:

With the same technique as before, we’ll obtain the following approximate formula:

What do the approximate formulas \frac{N}{2}, \frac{N}{3} and \frac{4N}{15} have in common? The question becomes clearer if we write them differently:

So, for example, the fraction \frac{4}{15} of (1) is given by \left( 1 - \frac{1}{2} \right) \left( 1 - \frac{1}{3} \right) \left( 1 - \frac{1}{5} \right) = \frac{1}{2}\frac{2}{3} \frac{4}{5} = \frac{4}{15}.

We can interpret formulas (2), (3) and (4) as follows:

- In the first step, for formula (2), we eliminate approximately \frac{1}{2} elements (the multiples of 2) from the set.
- In the second step, by formula (3), we eliminate from the set approximately \frac{1}{3} of the elements remaining after the first step. In fact, formula (3) is obtained from (2) by multiplying by 1 - \frac{1}{3}, which means removing \frac{1}{3} compared to the previous quantity.
- In the third step, by formula (4), we eliminate from the set approximately \frac{1}{5} of the elements remaining after the second step (which corresponds to multiplying by 1 - \frac{1}{5}).

But \frac{1}{2}, \frac{1}{3} and \frac{1}{5} are nothing other than the approximate frequencies respectively of the multiples of 2, of the multiples of 3 and of the multiples of 5 in the initial set A. In fact, if we removed the multiples of 3 from A without first removing the multiples of 2, we would remove \left\lfloor \frac{N}{3} \right\rfloor elements, therefore approximately \frac{1}{3} of the total N. Similarly, if we removed the multiples of 5 from A without first removing the multiples of 2 and 3, we would remove \left\lfloor \frac{N}{5} \right\rfloor elements, so about \frac{1}{5} of the total. Therefore, if formulas (2), (3) and (4) were good approximations of the corresponding exact formulas, it would mean that: *the fraction of elements eliminated in each step of the sieve of Eratosthenes, when multiples of p are eliminated, is approximately equal to the fraction of multiples of p calculated on the initial set*. In other words, the multiples of 3 eliminated would be roughly \frac{1}{3} of the numbers not divisible by 2; the multiples of 5 eliminated would be roughly \frac{1}{5} of the numbers that are not divisible by either 2 or 3, and so on.

Now let’s imagine applying this approximation in *all* steps of the sieve of Eratosthenes, that is for all prime numbers from 2 to the largest prime number less than or equal to \sqrt{N}. We would obtain the following approximation of the sieve function:

But how good is this approximation?

We can understand this starting from numerical data. The following Figure shows the graph of S(A, \mathbb{P}, \sqrt{N}) superimposed on that of N \prod_{p \leq \sqrt{N}} \left( 1 - \frac{1}{p} \right) up to N = 10,000 (the two graphs look like continuous lines but in reality they are made up of a sequence of points, one for each N). The overlap is evident:

So, at least up to 10000, approximation (5) is almost perfect. In reality, however, formula (5) has two big problems:

- It’s difficult to estimate the error;
- From an asymptotic point of view, the estimate is incorrect.

Let’s analyze the two points in detail.

## Estimation of error

From the Legendre’s identity, we know that the number of elements left in A after applying the sieve of Eratosthenes is given by the formula:

Depending on the value of N (more precisely, depending on how many primes are less than or equal to \sqrt{N}), we know that from this general formula we can obtain the specific ones that we used as examples in the previous section: N - \left\lfloor \frac{N}{2} \right\rfloor, N - \left\lfloor \frac{N}{2} \right\rfloor - \left\lfloor \frac{N}{3} \right\rfloor + \left\lfloor \frac{N}{6} \right\rfloor, and so on. Our approximation consisted in removing the whole parts, so, in general terms, the approximate formula is the following:

In fact, the expression on the right of this formula is the generalization of the first line of formulas (2), (3) and (4), i.e. the simple sum of the terms of the general formula of the sieve of Eratosthenes without the whole parts. In formulas (2), (3) and (4), however, we did not stop here but, after having appropriately grouped the terms and highlighted them, in the last line we found some products of terms, which we then generalized with the expression to the right of formula (5). Formulas (6′) and (5) are therefore equivalent: the equality of their expressions on the right is a generalization of the steps seen in formulas (2), (3) and (4).

Equivalences of this type, between formulas with summations and formulas with productions, are quite frequent in number theory, so we believe it useful to propose a more formal proof of the equivalence between (6′) and (5), using a general technique:

Two ways to express the general formula of the sieve of Eratosthenes without the whole parts

For every integer N \gt 1:

where P(\sqrt{N+1}) indicates the product of the prime numbers smaller than \sqrt{N+1}.

This property can be proved by induction, but we prefer to propose a more intuitive proof. We will prove that all terms appearing on the right of the equality also appear on the left, and vice versa; this approach is correct because the number of terms is finite on both sides.

Let’s consider one term of the left expression. It will be of the type \mu(d) \frac{N}{d}, where d is the product of zero or more distinct prime numbers, all less than \sqrt{N+1}, therefore less than or equal to \sqrt{N}.

If d is the product of zero prime numbers, that is, if d = 1, then \mu(d) \frac{N}{d} = \mu (1) \cdot N = N. We also find this term on the right, being N = N \prod_{p \leq \sqrt{N}} 1 (we multiplied the first terms, equal to 1, of each binomial of the production).

Now suppose that d is the product of h \geq 1 prime numbers, that is, d = \prod_{i=1}^h q_i. Now let’s compare the h prime numbers that constitute the factorization of d with the set of *all* prime numbers less than or equal to \sqrt{N}. Let P be the set of prime numbers less than or equal to \sqrt{N} that do not coincide with any q_i:

So, clearly:

In fact, compared to the previous formula for d we are only multiplying several times by 1, once for each prime factor missing in the factorization of d. Now, however, the number of factors is equal to the number of primes less than or equal to \sqrt{N}, because each of them either coincides with a q_i, so it is considered in the first production, or belongs to P, therefore it is considered in the second one.

Doing the reciprocal, we have:

Now let’s consider the product \mu(d) \frac{1}{d}. Since h is the number of prime factors of d, \mu(d) is equal to 1 if h is even and to -1 if h is odd. Another way to calculate this number is to multiply -1 by itself h times; in fact by doing this we’ll obtain 1 every two multiplications and at the end a -1 will advance if h is odd (for example, if h = 5: \prod_{i= 1}^5 (-1) = (-1) \cdot (-1) \cdot (-1) \cdot (-1) \cdot (-1) = \left((-1) \cdot (-1 )\right) \cdot \left((-1) \cdot (-1)\right) \cdot (-1) = 1 \cdot 1 \cdot -1 = -1). So \mu(d) = \prod_{i=1}^h (-1). Substituting into the previous formula, we’ll obtain:

So:

But this, except for the order of the factors, is nothing more than a term of the right side expression. In fact, if we expand the production \prod_{p \leq \sqrt{N}} \left( 1 - \frac{1}{p} \right), we’ll obtain a sum of terms where each of them is a product of factors, one for each prime p \leq \sqrt{N}, each factor being equal to either 1 or - \frac{1}{p}. For example, for N = 9:

Returning to formula (a), every prime p \leq \sqrt{N} being by construction either coinciding with a q_i or belonging to P, the expression \prod_{i=1}^h \left(- \frac{1}{q_i}\right) \cdot \prod_{p \in P} 1 represents a product of factors, one for each prime p \leq \sqrt{N}, where each factor is equal to 1 or - \frac{1}{p}. What changes with respect to the development of the right part of (7) is the order of the factors, because in (a) we first wrote all the terms with the reciprocals of the factors of d, while each term of the right side of (7) uses the order of the production, which is the ascending order of the prime numbers. For example, if N = 9 and d = 3, 3 = q_1 being the only prime factor of d, in the left part of (7) we’ll have the term: \prod_{i=1}^h \left(- \frac{1}{q_i}\right) \cdot \prod_{p \in P} 1 = \left(- \frac{1}{3}\right) \cdot 1, while the corresponding right term will be 1 \cdot \left(- \frac{1}{3} \right), the second of (b). Of course, by the commutative property this is not a problem.

Finally, it is evident that the factor N appears, in (7), both on one side and on the other, therefore all the terms on the left contain the factor N as well as all those on the right. This completes the first part of the proof.

It remains to be shown that each term on the right side of (7) also appears on the left. The reasoning could be the opposite of the previous one, but we can avoid it by starting from the following considerations:

- The left side has 2^k terms, where k is the number of primes less than or equal to \sqrt{N}, by Property C.3.
- For each d of the left summation, the corresponding term \mu(d) \frac{N}{d} is different from those obtained with the other ds, so the terms on the left are all different from each other. This is quite evident: since N is constant, each different d generates a different \frac{N}{d} value. Furthermore, if d and d^{\prime} are two different divisors of P(\sqrt{N + 1}), since \frac{N}{d} and \frac{N}{d^{\prime}} are different, so are \mu(d) \frac {N}{d} and \mu(d^{\prime}) \frac{N}{d^{\prime}}, because multiplication by a value of the function \mu alters the sign but not the absolute value, and the absolute values of \frac{N}{d} and \frac{N}{d^{ \prime}} are different.
- The right side also has 2^k terms, because we have the product of k binomials and a constant.

For the first two points, in the left summation we have 2^k terms all different from each other, each of which, as we proved before, is also found on the right side (in the sense that the numerical value is the same, even if the term would be written differently if the production was expanded). So the right side also has 2^k terms that are all different from each other. But, for the third point, the right side has 2^k terms *in total*, so it is not possible that it contains an additional term that is not present in the left side. This completes the proof.

The equivalence between (6′) and (5) is important because each of the two formulas gives different information. Formula (5) is important because, as we saw in the previous section, it lends itself to an intuitive interpretation; on the other hand (6′) lends itself better to error estimation, as it is structurally identical to (6), that by Legendre’s identity provides the exact value to approximate.

Now let’s try to answer a further question: what absolute error can be made at most with the formula (6′), compared to (6)?

Let us remember that formula (6), by Property C.3 , has 2^k terms, where k is the number of primes less than or equal to \sqrt{N}. It’s easy to prove that, of these 2^k terms, half (therefore 2^{k-1}) are positive and the other half are negative. Clearly the same property applies to the formula (6′), which has the same structure as (6).

Why are the terms of (6) half positive and half negative?

This simple property can be proved in several elementary ways, for example by induction or by constructing a one-to-one correspondence between positive and negative terms. For educational purposes, however, we propose a proof based on the properties of the \mu function.

We have to prove that, among the divisors d of P(\sqrt{N+1}), half have positive \mu(d) (therefore equal to 1, which is the only positive value of the function \mu) and half have negative \mu(d) (therefore equal to -1). Recall that in this case \mu(d) is always different from zero because P(\sqrt{N+1}) is the product of distinct prime factors, so it is also d.

By Property N.8 (second form), we have that \sum_{d \mid P(\sqrt{N+1})} \mu(d) = 0. But in our case \mu(d) can only take on the values 1 and -1, therefore if the sum of all \mu(d) is zero it means that the number of 1s and the number of -1s is the same (if there were more 1s, the sum would be positive; if there were more -1s it would be negative). Therefore the number of positive terms is equal to the number of negative terms, i.e. both are equal to half the total.

Finally, for those who have followed our path on the Prime Number Theorem, we observe that the property we have proved is a direct consequence of Property N.8, according to which, given a positive natural number, among its square-free integer divisors, the quantity of those that are the product of an odd number of prime numbers is equal to the quantity of those that are the product of an even number of prime numbers.

Let’s see what error we make for each term when passing from (6) to (6′):

- if \mu(d) = 1, we are replacing \left\lfloor \frac{N}{d} \right\rfloor with \frac {N}{d}. Therefore, being \left\lfloor \frac{N}{d} \right\rfloor \leq \frac{N}{d} \lt \left\lfloor \frac{N}{d} \right\rfloor + 1, we are adding an amount between 0 (inclusive) and 1 (exclusive).
- if \mu(d) = -1, we are replacing - \left\lfloor \frac{N}{d} \right\rfloor with - \frac{N}{d}. Therefore, being - \left\lfloor \frac{N}{d} \right\rfloor - 1 \lt - \frac{N}{d} \leq - \left\lfloor \frac{N}{d} \right\rfloor, we are
*subtracting*an amount between 0 (inclusive) and 1 (exclusive).

Overall, if we applied this reasoning for all terms separately, considering the worst case both in excess and in deficiency, we would have:

- The maximum possible positive difference between the value of (6′) and that of (6) is less than 2^{k-1}: this would happen if an error less than 1 was made for all positive terms and a zero error for all negative terms.
- The maximum possible negative difference between the value of (6′) and that of (6) is less in absolute value than 2^{k-1}: this would happen if an error were made less than 1 in absolute value for all negative terms and a zero error for all positive terms.

In summary:

So the maximum error in absolute value is equal to:

So the absolute error increases exponentially with respect to k.

Since 2^{k - 1} soon becomes greater than N, which is the initial number of elements of the set A, such an error estimate is clearly unacceptable. In fact, remember that expression (6) calculates the number of elements remaining in A after the sieve of Eratosthenes, so it will obviously return a number between 0 and |A| = N; therefore any error estimate that gives a value greater than N is useless.

On the other hand, it is also true that in our error estimate we have considered the worst cases, which may not occur. In fact, typically the errors on the individual terms will be partly negative (for the negative terms) and partly positive (for the positive terms), so they will tend to compensate each other, lowering the total error in absolute value. The problem is that, although this is true, the terms of (6′) are highly dependent on each other, making it difficult to obtain an error estimate that takes these dependencies into account.

## Asymptotic incorrectness

In the previous section we talked about the absolute error that our approximate formula, (5) or equivalently (6′), makes with respect to (6). Let’s now analyze the relative error, i.e. the quantity:

Ideally this error should approach a limit as N increases, and this limit should be zero. In fact, if a limit did not exist, it would mean that our error does not have a stable trend as N increases, while, if the limit existed but was different from zero, it would mean that our estimate has the tendency to underestimate or overestimate the exact value. We would therefore like:

which is equivalent to:

Since (6) is positive, we have that \frac{|(6') - (6)|}{(6)} = \left|\frac{(6')}{(6) } - \frac{(6)}{(6)}\right| = \left|\frac{(6')}{(6)} - 1\right|. By (8), this quantity tends to zero, so it definitely becomes smaller than a fixed \epsilon \gt 0:

But this relation, by definition of limit, is equivalent to (8′).

So we are requiring formulas (6) and (6′) to be asymptotically equivalent, as it happens for example for the functions \pi(x) and \frac{x}{\log x}, which are asymptotically equivalent by the Prime Number Theorem.

The good news is that we can verify how things are, because, as we’ll see shortly, we know the asymptotic equivalents of both functions; that is, we know that (6) is asymptotically equivalent to a certain function h and that (6′) is asymptotically equivalent to another function k. Based on this, by the transitivity of asymptotic equivalence, we can state that (6) and (6′) are asymptotically equivalent if and only if these functions h and k are asymptotically equivalent.

Let’s see what an asymptotically equivalent function to (6) is.

By equality (6) and by definition of sieve function, we know that it computes the number of primes greater than \sqrt{N} and less than or equal to N. This number can be obtained by subtracting the number of primes less than or equal to \sqrt{N} from the number of primes less than or equal to N:

By the Prime Number Theorem we know that \pi(N) \sim \frac{N}{\log N} and \pi(\sqrt{N}) \sim \frac{\sqrt{N}}{\log \sqrt{N}}. However, it is not possible to automatically conclude, from this and from (9), that (6) \sim \frac{N}{\log N} - \frac{\sqrt{N}}{\log \sqrt {N}}, as if asymptotic equivalence were an equality. This step would require a non-trivial proof (we invite those interested readers to try) and, among other things, there are many cases of functions f, F, g and G such that f \sim F, g \sim G but f - g \nsim F - G (for example: x^2 - 2x \sim x^2, x^2 - x \sim x^2 but x^2 - 2x - (x^2 - x) = -x \nsim x^2 - x^2 = 0). We’ll therefore proceed differently, highlighting \pi(N):

Since we are dealing in this case with a ratio of functions rather than a difference, the definition of asymptotic equivalence makes it legitimate to apply the Prime Number Theorem to the numerator and denominator:

So, substituting into (10):

So, by definition of asymptotic equivalence:

Hence, taking (9) and applying the Prime Number Theorem, we’ll finally obtain:

Let’s now see what we can say about (6′).

To study its asymptotic order this time it is best to return to the equivalent form that we saw at the beginning, namely (5). In this way, in fact, we can apply the following Theorem, of which in this context we are only interested in the statement:

Mertens’ Third Theorem

where \gamma is the Euler’s constant .

Based on this Theorem and Property C.5:

Then we can conclude that:

So (6) and (6′) are *not* asymptotically equivalent, because the limit of their ratio is not 1, but 2 e^{-\gamma} \approx 1.1229. This means that, for sufficiently large numbers, (6′) and (5) tend to overestimate, by about 12%, the number of primes remaining in A after applying the sieve of Eratosthenes.

## Conclusions

In an attempt to approximate the number of elements remaining in A after applying the sieve of Eratosthenes, we obtained formula (5), which initially seemed like a good approximation, because it was supported by experimental data and by a good intuitive justification. However, we have seen that in reality it is far from being so, because it presents major problems both from the theoretical point of view, because it is difficult to estimate the absolute error, and from the practical point of view, because it tends to overestimate the real value by approximately 12%.

The lesson we can draw from this study is that it is better not to trust too much either in one’s intuitions or in experimental data: intuitions can be wrong and experimental data may only apparently confirm them. For the data to have real value, you should be sure that you have considered numbers large enough for the problem you are examining, but the problem is that it is often difficult to understand which numbers are large enough. Therefore, until the proof we seek are complete, it is always better to consider intuitions and data as simple guides for understanding the problem we are studying, always leaving a certain margin of doubt.